
groupby on multiple columns does not preserve (categorical) dtype #13743

Closed
@martijnvermaat

Description


When doing a groupby on more than one column, the resulting MultiIndex does not seem to preserve the original column dtypes. I noticed this when working with Categorical columns: I expected a CategoricalIndex when grouping on them, but that only happens when grouping on a single column.

I did see that the behaviour was discussed in a PR, but it ultimately was not addressed.
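One way to see the difference is to inspect the index of the grouped result before calling `reset_index()`. The sketch below assumes a recent pandas. It builds the columns with `pd.Categorical` because newer versions no longer accept `categories=` in `astype`. It also passes `observed=True`, a keyword that did not exist in 0.18.1. `c` is selected explicitly so that only the value column is summed.

```python
import pandas as pd

cats = list("xyz")
df = pd.DataFrame({
    "a": pd.Categorical(list("xyxxyz"), categories=cats),
    "b": pd.Categorical(list("yzzyxz"), categories=cats),
    "c": [1, 2, 3, 4, 5, 6],
})

# Grouping on one categorical column: the result index is expected to be a CategoricalIndex.
by_a = df.groupby("a", observed=True)["c"].sum()
print(type(by_a.index).__name__)

# Grouping on two columns: print the dtype of each MultiIndex level.
by_ab = df.groupby(["a", "b"], observed=True)["c"].sum()
for name in by_ab.index.names:
    print(name, by_ab.index.get_level_values(name).dtype)
```

If the multi-column case still drops the dtype, the loop prints `object` for the levels instead of `category`.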

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({
   ...:     'a': pd.Series(list('xyxxyz')).astype('category', categories=list('xyz')),
   ...:     'b': pd.Series(list('yzzyxz')).astype('category', categories=list('xyz')),
   ...:     'c': [1,2,3,4,5,6]
   ...: })

In [3]: df.groupby('a').sum().reset_index().dtypes
Out[3]: 
a    category
c       int64
dtype: object

In [4]: df.groupby(['a', 'b']).sum().reset_index().dtypes
Out[4]: 
a     object
b     object
c    float64
dtype: object
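Until the dtypes are preserved, one possible workaround is to re-apply the original categories after `reset_index()`. `restore_categoricals` below is a hypothetical helper written for this example and is not a pandas function. It assumes that every value in the result's key columns appears in the source categories, since any other value would become NaN. It only restores the key columns. It does not change the `float64` dtype of `c`. The usage part assumes a recent pandas. It recreates object-dtype key columns like those in Out[4] by grouping on `object` copies of `a` and `b`.

```python
import pandas as pd

def restore_categoricals(result, source, columns):
    """Hypothetical helper (not a pandas API): re-apply the categories and
    ordering of source[col] to result[col] for each column in `columns`."""
    out = result.copy()
    for col in columns:
        out[col] = pd.Categorical(
            out[col],
            categories=source[col].cat.categories,
            ordered=source[col].cat.ordered,
        )
    return out

cats = list("xyz")
df = pd.DataFrame({
    "a": pd.Categorical(list("xyxxyz"), categories=cats),
    "b": pd.Categorical(list("yzzyxz"), categories=cats),
    "c": [1, 2, 3, 4, 5, 6],
})

# Produce non-categorical key columns by grouping on object copies of a and b.
flat = df.astype({"a": object, "b": object}).groupby(["a", "b"])["c"].sum().reset_index()
# Put the original categorical dtypes back on the key columns.
fixed = restore_categoricals(flat, df, ["a", "b"])
print(flat.dtypes)
print(fixed.dtypes)
```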

Expected Output

In [4]: df.groupby(['a', 'b']).sum().reset_index().dtypes
Out[4]: 
a    category
b    category
c       int64
dtype: object
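To check whether a given pandas version produces the expected output, the sketch below repeats the two-column groupby and inspects the dtypes after `reset_index()`. It assumes a recent pandas. The columns are built with `pd.Categorical`, and `observed` is passed explicitly. That keyword was added after 0.18.1 and controls whether category combinations that never occur in the data appear in the result. Passing it explicitly avoids depending on its default.

```python
import pandas as pd

cats = list("xyz")
df = pd.DataFrame({
    "a": pd.Categorical(list("xyxxyz"), categories=cats),
    "b": pd.Categorical(list("yzzyxz"), categories=cats),
    "c": [1, 2, 3, 4, 5, 6],
})

# observed=False: one row per combination of categories (3 x 3 = 9 rows).
all_combos = df.groupby(["a", "b"], observed=False)["c"].sum().reset_index()
# observed=True: only the (a, b) pairs that actually occur in the data.
seen_only = df.groupby(["a", "b"], observed=True)["c"].sum().reset_index()

print(all_combos.dtypes)
print(seen_only.dtypes)
```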

output of pd.show_versions()

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.13
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.18.1+240.gbb6b5e5
nose: None
pip: 8.1.2
setuptools: 19.4
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.14
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None
