
GroupBySeries Quantile fails when there are 3 or more categories #28312

Closed
@josesho

Description


Code Sample

Using pandas v0.25.1 (the latest release at the time of writing):

import numpy as np
import pandas as pd

np.random.seed(12345)

df1 = pd.DataFrame({
    'category': ['A', 'A', 'A', 'A', 
                 'B', 'B', 'B', 'B', 
                 ],
    'value': np.random.randint(1, 10, 8)
})

df1.groupby("category").value.quantile([0.25, 0.75])

produces

category      
A         0.25    2.75
          0.75    5.25
B         0.25    2.75
          0.75    6.25
Name: value, dtype: float64

as expected. However, running this

np.random.seed(12345)

df2 = pd.DataFrame({
    'category': ['A', 'A', 'A', 'A', 
                 'B', 'B', 'B', 'B', 
                 'C', 'C', 'C', 'C', 
                 ],
    'value': np.random.randint(1, 10, 12)
})

df2.groupby("category").value.quantile([0.25, 0.75])

produces this error instead:

IndexError                                Traceback (most recent call last)
<ipython-input-60-12c4dbb665fc> in <module>
      8 })
      9 
---> 10 df2.groupby("category").value.quantile([0.25, 0.75])

~/anaconda3/envs/dabest-dev-py3.7/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1951             indices = np.concatenate(arrays)
   1952             assert len(indices) == len(result)
-> 1953             return result.take(indices)
   1954 
   1955     @Substitution(name="groupby")

~/anaconda3/envs/dabest-dev-py3.7/lib/python3.7/site-packages/pandas/core/series.py in take(self, indices, axis, is_copy, **kwargs)
   4430 
   4431         indices = ensure_platform_int(indices)
-> 4432         new_index = self.index.take(indices)
   4433 
   4434         if is_categorical_dtype(self):

~/anaconda3/envs/dabest-dev-py3.7/lib/python3.7/site-packages/pandas/core/indexes/multi.py in take(self, indices, axis, allow_fill, fill_value, **kwargs)
   2030             allow_fill=allow_fill,
   2031             fill_value=fill_value,
-> 2032             na_value=-1,
   2033         )
   2034         return MultiIndex(

~/anaconda3/envs/dabest-dev-py3.7/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _assert_take_fillable(self, values, indices, allow_fill, fill_value, na_value)
   2058                 taken = masked
   2059         else:
-> 2060             taken = [lab.take(indices) for lab in self.codes]
   2061         return taken
   2062 

~/anaconda3/envs/dabest-dev-py3.7/lib/python3.7/site-packages/pandas/core/indexes/multi.py in <listcomp>(.0)
   2058                 taken = masked
   2059         else:
-> 2060             taken = [lab.take(indices) for lab in self.codes]
   2061         return taken
   2062 

IndexError: index 6 is out of bounds for size 6
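The last frames suggest that `quantile` builds an array of positions (`indices = np.concatenate(arrays)`) and then calls `result.take(indices)` to reorder its output. The sketch below is a hypothetical reconstruction of that kind of reordering, not pandas' actual code. It assumes the intermediate result is laid out quantile-major (every group for the first quantile, then every group for the second) and computes the positions that put it into the group-major order shown in the expected output. The helper name `group_major_order` is made up for illustration.

```python
import numpy as np


def group_major_order(ngroups: int, nq: int) -> np.ndarray:
    """Hypothetical helper: positions that turn a quantile-major result
    into group-major order.

    Assumed input layout: the value for group g and quantile j sits at
    position j * ngroups + g.
    """
    # Positions of group 0 across every quantile: 0, ngroups, 2 * ngroups, ...
    base = np.arange(0, nq * ngroups, ngroups)
    # Shift that stride by g for each group and concatenate group by group.
    return np.concatenate([base + g for g in range(ngroups)])


# Three groups (A, B, C) and two quantiles (0.25, 0.75), as in df2.
quantile_major = np.array(["A-0.25", "B-0.25", "C-0.25",
                           "A-0.75", "B-0.75", "C-0.75"])
indices = group_major_order(ngroups=3, nq=2)
print(indices)  # [0 3 1 4 2 5]
print(quantile_major[indices])

# Every position must be smaller than the length of the array being taken from.
assert indices.max() < len(quantile_major)
```

Under this assumption every position stays below the result's length. The IndexError above shows a position of 6 being taken from an array of size 6, so the indices computed in 0.25.1 apparently ran past the end for this three-group input.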

With pandas 0.24, the same code produces the expected output:

df2.groupby("category").value.quantile([0.25, 0.75])
category      
A         0.25    2.75
          0.75    5.25
B         0.25    2.75
          0.75    6.25
C         0.25    1.75
          0.75    7.25
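To check the expected numbers without going through groupby at all, the quantiles can be computed per category with NumPy. This is a sketch assuming NumPy >= 1.15 (for `np.quantile`); its default linear interpolation should match the default `interpolation="linear"` of pandas' `quantile`.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
df2 = pd.DataFrame({
    "category": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "value": np.random.randint(1, 10, 12),
})

# Per-category quantiles computed directly on each subset, bypassing groupby.
expected = {
    cat: np.quantile(df2.loc[df2["category"] == cat, "value"], [0.25, 0.75])
    for cat in ["A", "B", "C"]
}
for cat, (q25, q75) in expected.items():
    print(cat, q25, q75)
```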

I'm not sure how to mitigate this; is there a recommended workaround?
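One possible workaround, offered as a sketch rather than an official fix, is to call `Series.quantile` inside `apply`. Each group is then handled independently and the per-group results are concatenated, so the reordering step inside `SeriesGroupBy.quantile` that appears in the traceback should not be involved.

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
df2 = pd.DataFrame({
    "category": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "value": np.random.randint(1, 10, 12),
})

# Compute the quantiles separately for each group; apply stitches the
# per-group Series together under a (category, quantile) MultiIndex.
result = df2.groupby("category")["value"].apply(
    lambda s: s.quantile([0.25, 0.75])
)
print(result)
```

The result has the same (category, quantile) layout as the 0.24 output, at the cost of a Python-level call per group.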

I understand a related bug was patched with #28285 and #27526.
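A regression check along the following lines could confirm whether a given pandas version handles any number of groups. It is a sketch that compares `SeriesGroupBy.quantile` with the per-group `apply` approach; the helper name `check_groupby_quantile` and the `g0`, `g1`, ... group labels are made up for illustration. On an affected version such as 0.25.1, it would be expected to raise for three or more groups.

```python
import numpy as np
import pandas as pd


def check_groupby_quantile(n_groups: int, per_group: int = 4, seed: int = 12345) -> None:
    """Hypothetical check: SeriesGroupBy.quantile vs. a per-group apply."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame({
        "category": np.repeat([f"g{i}" for i in range(n_groups)], per_group),
        "value": rng.randint(1, 10, n_groups * per_group),
    })
    qs = [0.25, 0.75]
    grouped = df.groupby("category")["value"]
    got = grouped.quantile(qs)
    expected = grouped.apply(lambda s: s.quantile(qs))
    # Compare values positionally; index level names may differ between the two.
    np.testing.assert_allclose(got.to_numpy(), expected.to_numpy())


for n in range(1, 6):
    check_groupby_quantile(n)
print("ok")
```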

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None

pandas : 0.25.1
numpy : 1.16.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 4.3.0
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.2.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
