Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({"col": list(range(100))})
quantiles = pd.qcut(df["col"], 11)
print(quantiles.groupby(quantiles).size())
This prints:
col
(-0.001, 9.0] 10
(9.0, 18.0] 9
(18.0, 27.0] 8
(27.0, 36.0] 10
(36.0, 45.0] 9
(45.0, 54.0] 8
(54.0, 63.0] 10
(63.0, 72.0] 9
(72.0, 81.0] 9
(81.0, 90.0] 9
(90.0, 99.0] 9
Name: col, dtype: int64
Problem description
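qcut isn't distributing values across bins correctly in this case. The exact 11-quantiles of range(100) fall on 0, 9, 18, ..., 99, so there should be 10 values in the interval (-0.001, 9.0] and 9 in each of the others. Instead, some bins such as (18.0, 27.0] get only 8 values, and the bin that follows gets 10.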
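One way to look at where the miscounts might come from is to ask qcut for the bin edges it actually computed, via its retbins=True parameter, and compare them at full precision with the ideal integer edges 0, 9, ..., 99. This is a diagnostic sketch that assumes the problem is floating-point residue in the computed edges, which the interval labels (rounded according to qcut's precision parameter, 3 by default) would hide. The printed differences will depend on the pandas and NumPy versions in use.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": list(range(100))})

# retbins=True makes qcut also return the bin edges it computed
cats, edges = pd.qcut(df["col"], 11, retbins=True)

# The mathematically exact 11-quantile edges of 0..99: 0, 9, 18, ..., 99
ideal = np.arange(0, 100, 9)

# Print each computed edge at full precision next to its deviation from the
# ideal integer; a nonzero deviation is invisible in the rounded labels
for got, want in zip(edges, ideal):
    print(repr(float(got)), float(got) - float(want))
```

If an edge comes out slightly below its integer (for example just under 27), the value sitting exactly on that integer falls into the next bin, which would produce the 8/10 pattern seen above.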
Expected Output
col
(-0.001, 9.0] 10
(9.0, 18.0] 9
(18.0, 27.0] 9
(27.0, 36.0] 9
(36.0, 45.0] 9
(45.0, 54.0] 9
(54.0, 63.0] 9
(63.0, 72.0] 9
(72.0, 81.0] 9
(81.0, 90.0] 9
(90.0, 99.0] 9
Name: col, dtype: int64
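For this particular evenly spaced input, the expected counts can be reproduced by passing the exact integer edges to pd.cut instead of letting qcut compute them in floating point. This is a hypothetical workaround sketch for this example only, not a general fix for qcut; the edge list is an assumption that holds because 99 is divisible by 11. The printed labels and the name of the counts Series may differ from the output above depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"col": list(range(100))})

# Hypothetical workaround: exact integer edges 0, 9, ..., 99 (12 edges -> 11 bins)
edges = list(range(0, 100, 9))

# include_lowest=True puts 0 into the first bin, as qcut does
binned = pd.cut(df["col"], bins=edges, include_lowest=True)

# Count values per bin and order the result by bin
counts = binned.value_counts().sort_index()
print(counts)
```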
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
Apologies if this has been posted already, but I didn't see anything from searching around.