When running set_index on a categorical to a MultiIndex, it gets coerced to a string.  #15058

Closed
@thequackdaddy

Hello!

I apologize if this is expected behavior. It is similar to this StackOverflow question.

Code Sample, a copy-pastable example if possible

import pandas as pd

x = pd.Categorical(['apples', 'dairy', 'chicken', 'beef', 'apples', 'dairy', 'chicken'], categories=['apples', 'dairy', 'beef', 'chicken'])
y = pd.Series([1, 2, 1, 2, 1, 2, 1])
z = pd.Series([3, 4, 2, 1, 3, 2, 1])

df = pd.DataFrame({'z': z, 'x': x, 'y':y})
df.set_index(['x', 'y']).sort_index()
df.sort_values('x')

Problem description

I would like to sort and group by a column in a custom order. In the example above, I've given a categorical (whose values could otherwise be plain strings) a category order that makes intuitive sense: fruits first, followed by dairy, followed by meats.
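One possible workaround, offered as a sketch and not as the library's recommended approach: sort while `x` is still a categorical column, then move the keys into the index without calling `sort_index()` afterwards. The name `presorted` is made up for this illustration, and the behavior may differ on older pandas versions.

```python
import pandas as pd

# Same data as the code sample above; categories are listed in the desired order.
x = pd.Categorical(
    ['apples', 'dairy', 'chicken', 'beef', 'apples', 'dairy', 'chicken'],
    categories=['apples', 'dairy', 'beef', 'chicken'],
)
df = pd.DataFrame({'x': x, 'y': [1, 2, 1, 2, 1, 2, 1], 'z': [3, 4, 2, 1, 3, 2, 1]})

# Sort by the categorical column first (this follows the category order),
# then build the MultiIndex. No sort_index() afterwards, so the row order
# produced by sort_values is kept.
presorted = df.sort_values(['x', 'y']).set_index(['x', 'y'])

# Collect the first index level as plain strings for inspection.
x_order = [str(v) for v in presorted.index.get_level_values('x')]
print(x_order)
```

The trade-off is that the index is only positionally ordered: a later `sort_index()` may reorder the rows again if the level has lost its categorical dtype.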

Expected Output

When the categorical becomes a level of a MultiIndex, set_index seems to coerce it to strings before adding it to the index, so the custom category order is lost. It would be nicer if pandas kept the categorical ordering for the index.
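To check whether a given pandas version shows the coercion described here, the level can be inspected directly. The helper `level_is_categorical` below is hypothetical (it is not part of pandas) and relies on a duck-typing heuristic: a level stored as a `CategoricalIndex` exposes a `categories` attribute, while a plain `Index` does not. The result is expected to vary by version, so no particular output is claimed.

```python
import pandas as pd


def level_is_categorical(index, level_name):
    """Heuristic: True if the named MultiIndex level exposes `categories`."""
    position = list(index.names).index(level_name)
    return hasattr(index.levels[position], 'categories')


x = pd.Categorical(
    ['apples', 'dairy', 'chicken', 'beef', 'apples', 'dairy', 'chicken'],
    categories=['apples', 'dairy', 'beef', 'chicken'],
)
df = pd.DataFrame({'x': x, 'y': [1, 2, 1, 2, 1, 2, 1], 'z': [3, 4, 2, 1, 3, 2, 1]})

indexed = df.set_index(['x', 'y'])

# True means the level kept its categorical dtype; False means it was coerced.
kept = level_is_categorical(indexed.index, 'x')
print(kept)
```

If this prints `False`, the level was coerced as reported, and `sort_index()` cannot be expected to follow the category order.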

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.8.0.dev0+7e6b94b
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None
