
Maintaining the order of the categorical variable that is passed into pd.crosstab #8860

Closed
@nsriram13

Description


xref #8731; the solution might be the same.

Currently, when we do a crosstab, the distinct values in each column are reported in lexical order. But crosstabs are most useful for categorical data, which may have an inherent ordering.

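import pandas as pd

# Toy data: vehicle make and body style
d = {'MAKE' : pd.Series(['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura']),
     'MODEL' : pd.Series(['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'])}
data = pd.DataFrame(d)
# 1) MODEL as plain strings (object dtype)
pd.crosstab(data['MAKE'], data['MODEL'])

# 2) MODEL as a categorical with a custom, non-lexical category order
#    (note: these categories are not flagged as ordered; cat.ordered is False)
data['MODEL'] = data['MODEL'].astype('category')
data['MODEL'] = data['MODEL'].cat.set_categories(['Sedan', 'Electric', 'Pickup'])
pd.crosstab(data['MAKE'], data['MODEL'])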

Both crosstab calls above produce the same output, shown below. As far as I can tell, crosstab is performing a lexical sort on the values of the Series passed in and ignoring the category order.

Output:
MODEL  Electric  Pickup  Sedan
MAKE                          
Acura         0       0      2
Honda         0       1      2
Tesla         1       0      0
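A minimal workaround sketch, not part of crosstab's own API: compute the table as usual, then reorder its columns explicitly with `DataFrame.reindex`. The list `model_order` is an assumption copied from the `set_categories` call above, and `fill_value=0` is a choice so that a requested label that never occurs shows up as a zero count instead of NaN. Because the reorder is explicit, it does not depend on how a given pandas version sorts categorical labels.

```python
import pandas as pd

# Same toy data as in the report above
data = pd.DataFrame({
    "MAKE": ["Honda", "Acura", "Tesla", "Honda", "Honda", "Acura"],
    "MODEL": ["Sedan", "Sedan", "Electric", "Pickup", "Sedan", "Sedan"],
})

# Desired (non-lexical) column order for MODEL
model_order = ["Sedan", "Electric", "Pickup"]

table = pd.crosstab(data["MAKE"], data["MODEL"])
# With plain string labels, crosstab sorts the columns lexically
lexical_cols = list(table.columns)

# Workaround: reorder the result explicitly; fill_value=0 keeps counts
# integer if a requested label never occurs in the data
ordered = table.reindex(columns=model_order, fill_value=0)
print(ordered)
```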

Would it be possible for crosstab to maintain the ordering of the categorical variable if column.cat.ordered on the passed column is True? Thanks!
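A rough sketch of the requested behaviour as a user-side wrapper, assuming "maintain the ordering" means "follow `.cat.categories` whenever `.cat.ordered` is True". `ordered_crosstab` is a hypothetical name, not a pandas function: it only reorders the table after calling `pd.crosstab`, and inputs that are not ordered categoricals are left in whatever order `pd.crosstab` returns.

```python
import pandas as pd


def ordered_crosstab(index: pd.Series, columns: pd.Series) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): pd.crosstab, then reorder
    rows/columns by category order for inputs that are *ordered* categoricals."""
    table = pd.crosstab(index, columns)
    if isinstance(index.dtype, pd.CategoricalDtype) and index.cat.ordered:
        # Follow the declared category order for the rows
        table = table.reindex(index=index.cat.categories, fill_value=0)
    if isinstance(columns.dtype, pd.CategoricalDtype) and columns.cat.ordered:
        # Follow the declared category order for the columns
        table = table.reindex(columns=columns.cat.categories, fill_value=0)
    return table


data = pd.DataFrame({
    "MAKE": ["Honda", "Acura", "Tesla", "Honda", "Honda", "Acura"],
    "MODEL": ["Sedan", "Sedan", "Electric", "Pickup", "Sedan", "Sedan"],
})
# Declare MODEL as an *ordered* categorical with a non-lexical order
data["MODEL"] = pd.Categorical(
    data["MODEL"], categories=["Sedan", "Electric", "Pickup"], ordered=True
)

result = ordered_crosstab(data["MAKE"], data["MODEL"])
print(result)
```

Here MAKE is a plain string column, so its rows keep crosstab's lexical order, while the MODEL columns come out as Sedan, Electric, Pickup.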

Metadata

Labels: Categorical (Categorical Data Type), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
