Skip to content

Error with pivot_table and categorical data when add dropna args in version 0.23 #21133

Closed
@diegogarcilazo

Description

@diegogarcilazo
pd.__version__
'0.23.0'

labels = ["< 1 mes","1 a 11 meses","1 a 5 años","6 a 11 años","12 a 17 años","18 a + años","alli"]

df = pd.DataFrame(
    {'a': pd.cut(range(0,50), bins = [0,5,10,20,30,40,50,60], labels = labels, right = True),
     'b':(' '.join(['One'] * 20) + ' ' +' '.join(['Two'] * 30)).split(), 'c': range(10,60)})

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

output
               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50


df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

output

               c
a               
NaN            5
< 1 mes        5
1 a 11 meses  10
1 a 5 años    10
6 a 11 años   10
12 a 17 años   9
18 a + años    0
Total         49

pd.__version__
'0.22.0'

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = False)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         50

df.pivot_table(index='a', aggfunc = 'count', fill_value = 0\
            , margins = True, margins_name = 'Total', values = 'c',dropna = True)

               c
a               
< 1 mes        5
1 a 11 meses   5
1 a 5 años    10
6 a 11 años   10
12 a 17 años  10
18 a + años    9
alli           0
Total         49

When entering the argument dropna = False with the function pivot_table the value and order of the categories is correct. But when dropna = True both the order and the value of the categories do not perform as expected, add a NaN category and drop last category. With other dataset w/o NaN values dropna = True causes categories lose coherence with the values.

The same goes for the pd.crosstab () function

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: Spanish_Argentina.1252

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.3.1
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

Labels

CategoricalCategorical Data TypeRegressionFunctionality that used to work in a prior pandas versionReshapingConcat, Merge/Join, Stack/Unstack, Explode

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions