Skip to content

'groupby' multiple columns and 'sum' multiple columns with different types #13821

Closed
@pmckelvy1

Description

@pmckelvy1

Code Sample, a copy-pastable example if possible

from decimal import *
import pandas as pd
df = pd.DataFrame(
                  {'name': ['foo', 'bar', 'foo', 'bar'], 
                   'title': ['boo', 'far', 'boo', 'far'], 
                   'id': [123, 456, 123, 456], 
                   'int_column': [1, 2, 3, 4], 
                   'dec_column1': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')], 
                   'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
                  },
                  columns=['name','title','id','int_column','dec_column1','dec_column2']
                 )
df.groupby(['name', 'title', 'id'], as_index=False).sum()

Expected Output

i have dataframe that looks something like this...

| name | title | id | int_column | dec_column1 | dec_column2 |

...that has multiple rows with the same name, title, and id, but different values for the 3 number columns (int_column, dec_column1, dec_column2).
int_column == column of integers
dec_column1 == column of decimals
dec_column2 == column of decimals
I would like to be able to groupby the first three columns, and sum the last 3. I would expect to be able to do the following:

df = df.groupby(['name', 'title', 'id'], as_index=False).sum()

however, the only column that gets summed and ends up in the final dataframe is the int_column.

| name | title | id | int_column |

if i explicitly name the columns, i can get the statement to target the decimal columns either on their own or together....

df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1'].sum()
returns...
| name | title | id | dec_column1 |
and...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'dec_column2'].sum()
returns...
| name | title | id | dec_column1 | dec_column1 |
however...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'dec_column2', 'user_num'].sum()
or...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'user_num', 'dec_column2'].sum()
or...
df = df.groupby(['name', 'title', 'id'], as_index=False)['user_num', 'dec_column1', 'dec_column2'].sum()
returns...
| name | title | id | int_column |

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.11.1
scipy: None
statsmodels: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: 0.7.5.None
psycopg2: 2.5.5 (dt dec pq3 ext)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions