qcut: Option to return -inf/inf as lower/upper bound #17282

Closed
dberenbaum/pandas
#4
@prcastro

Code Sample

>>> x = pd.qcut([2,3,4,5,6,7,8,9], q=5, duplicates='drop')
>>> x.categories
IntervalIndex([(1.999, 3.4], (3.4, 4.8], (4.8, 6.2], (6.2, 7.6], (7.6, 9.0]]
              closed='right',
              dtype='interval[float64]')

>>> pd.cut([1,5,6,10], x.categories)
[NaN, (4.8, 6.2], (4.8, 6.2], NaN]
Categories (5, interval[float64]): [(1.999, 3.4] < (3.4, 4.8] < (4.8, 6.2] < (6.2, 7.6] < (7.6, 9.0]]

Problem description

I'm currently using qcut and cut together in a machine learning workflow: qcut bins the training data into quantiles, and cut then assigns the test data to those same bins.

However, if a value in the test data lies above or below the range of the training data, it falls outside every category created by qcut and the resulting category is NaN. One solution would be an option that makes qcut return -inf/inf as the lower/upper bounds of the outermost categories.

Expected Output

>>> x = pd.qcut([2,3,4,5,6,7,8,9], q=5, duplicates='drop', inf_bounds=True)
>>> x.categories
IntervalIndex([(-inf, 3.4], (3.4, 4.8], (4.8, 6.2], (6.2, 7.6], (7.6, inf]]
              closed='right',
              dtype='interval[float64]')
>>> pd.cut([1,5,6,10], x.categories)
[(-inf, 3.4], (4.8, 6.2], (4.8, 6.2], (7.6, inf]]
Categories (5, interval[float64]): [(-inf, 3.4] < (3.4, 4.8] < (4.8, 6.2] < (6.2, 7.6] < (7.6, inf]]
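
For reference, a similar result seems reachable today without a new option. This is only a sketch of a possible workaround, not the proposed feature: it asks qcut for the bin edges via retbins=True, widens the two outermost edges to -inf/inf by hand, and passes the edited edges to cut. The names train, test, edges and open_edges are illustrative only, and the sketch assumes that unbounded outer bins are acceptable for the model.

```python
import numpy as np
import pandas as pd

train = [2, 3, 4, 5, 6, 7, 8, 9]
test = [1, 5, 6, 10]

# retbins=True makes qcut also return the quantile edges as a NumPy array
_, edges = pd.qcut(train, q=5, duplicates='drop', retbins=True)

# Widen the outermost edges so that no test value can fall outside the bins
# (assumption: unbounded first and last bins are acceptable here)
open_edges = edges.copy()
open_edges[0] = -np.inf
open_edges[-1] = np.inf

# Bin the test data with the widened edges; intervals stay right-closed
binned = pd.cut(test, bins=open_edges)
print(binned)
```

With this approach 1 and 10 land in the first and last bins instead of becoming NaN. The drawback is that the edges have to be edited manually, which is what an option on qcut would avoid.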

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.26.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
