Description
Code Sample, a copy-pastable example if possible
from decimal import *
import pandas as pd
df = pd.DataFrame(
{'name': ['foo', 'bar', 'foo', 'bar'],
'title': ['boo', 'far', 'boo', 'far'],
'id': [123, 456, 123, 456],
'int_column': [1, 2, 3, 4],
'dec_column1': [Decimal('0.50'), Decimal('0.15'), Decimal('0.25'), Decimal('0.40')],
'dec_column2': [Decimal('0.20'), Decimal('0.30'), Decimal('0.55'), Decimal('0.60')]
},
columns=['name','title','id','int_column','dec_column1','dec_column2']
)
df.groupby(['name', 'title', 'id'], as_index=False).sum()
Expected Output
i have dataframe that looks something like this...
| name | title | id | int_column | dec_column1 | dec_column2 |
...that has multiple rows with the same name, title, and id, but different values for the 3 number columns (int_column, dec_column1, dec_column2).
int_column == column of integers
dec_column1 == column of decimals
dec_column2 == column of decimals
I would like to be able to groupby the first three columns, and sum the last 3. I would expect to be able to do the following:
df = df.groupby(['name', 'title', 'id'], as_index=False).sum()
however, the only column that gets summed and ends up in the final dataframe is the int_column.
| name | title | id | int_column |
if i explicitly name the columns, i can get the statement to target the decimal columns either on their own or together....
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1'].sum()
returns...
| name | title | id | dec_column1 |
and...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'dec_column2'].sum()
returns...
| name | title | id | dec_column1 | dec_column1 |
however...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'dec_column2', 'user_num'].sum()
or...
df = df.groupby(['name', 'title', 'id'], as_index=False)['dec_column1', 'user_num', 'dec_column2'].sum()
or...
df = df.groupby(['name', 'title', 'id'], as_index=False)['user_num', 'dec_column1', 'dec_column2'].sum()
returns...
| name | title | id | int_column |
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.15.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.11.1
scipy: None
statsmodels: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.3.5
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: 0.7.5.None
psycopg2: 2.5.5 (dt dec pq3 ext)