Closed
Description
xref #8731, the solution might be the same
Currently, when we do a crosstab, the distinct values in each column are reported in lexical order. But crosstabs are usually useful when we have categorical data (that may have an inherent ordering).
import pandas as pd

d = {'MAKE': pd.Series(['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura']),
     'MODEL': pd.Series(['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'])}
data = pd.DataFrame(d)

# Crosstab on the plain string columns
pd.crosstab(data['MAKE'], data['MODEL'])

# Crosstab after converting MODEL to a categorical with an explicit category order
data['MODEL'] = data['MODEL'].astype('category')
data['MODEL'] = data['MODEL'].cat.set_categories(['Sedan', 'Electric', 'Pickup'])
pd.crosstab(data['MAKE'], data['MODEL'])
Both crosstab calls above produce the same output, shown below. I believe the code is essentially performing a lexical sort on the contents of the Series being passed.
Output:
MODEL  Electric  Pickup  Sedan
MAKE
Acura         0       0      2
Honda         0       1      2
Tesla         1       0      0
Would it be possible for crosstab to maintain the ordering of the categorical variable if column.cat.ordered on the passed column is True? Thanks!
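In the meantime, a possible workaround (only a sketch, not a change to crosstab itself) is to reorder the columns of the result explicitly with DataFrame.reindex. The name desired_order is my own and simply repeats the category list from the example above; fill_value=0 is an assumption so that any category with no rows would still appear as a zero column.

```python
import pandas as pd

data = pd.DataFrame({
    'MAKE': ['Honda', 'Acura', 'Tesla', 'Honda', 'Honda', 'Acura'],
    'MODEL': ['Sedan', 'Sedan', 'Electric', 'Pickup', 'Sedan', 'Sedan'],
})

# Hypothetical name: the column order we want instead of the lexical one
desired_order = ['Sedan', 'Electric', 'Pickup']

# Build the crosstab as usual, then reorder its columns explicitly
table = pd.crosstab(data['MAKE'], data['MODEL'])
table = table.reindex(columns=desired_order, fill_value=0)

print(table)
```

This only rearranges the finished table, so the order has to be passed by hand each time, which is why having crosstab pick it up from the categorical would be nicer.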