
value_counts unexpected behaviour - bins and dropna #25970

Open
@krolikowskib

Description


Code Sample, a copy-pastable example if possible

1st example:

Input:

import numpy as np
import pandas as pd
series = pd.Series([1, 1, 2, 0, 1, np.nan, 4, 4, np.nan, 3])
series.value_counts(normalize=True, bins=2)

Output:

(-0.005, 2.0]    0.5
(2.0, 4.0]       0.3
dtype: float64

Expected output:

(-0.005, 2.0]    0.625
(2.0, 4.0]       0.375
dtype: float64
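As a sanity check on these numbers, the shares can be recomputed with plain NumPy, without going through value_counts. This sketch assumes the bin edge at 2.0 shown in the output above. 8 of the 10 entries are non-null: 5 fall in the first bin and 3 in the second.

```python
import numpy as np

# Same data as in the examples, as a plain float array.
values = np.array([1, 1, 2, 0, 1, np.nan, 4, 4, np.nan, 3])
non_null = values[~np.isnan(values)]

low = np.count_nonzero(non_null <= 2.0)   # values in (-0.005, 2.0]
high = np.count_nonzero(non_null > 2.0)   # values in (2.0, 4.0]

# Dividing by all 10 entries (NaNs included) reproduces the reported output.
print(low / values.size, high / values.size)      # 0.5 0.3
# Dividing by the 8 non-null entries gives the expected output.
print(low / non_null.size, high / non_null.size)  # 0.625 0.375
```

So the reported output uses the full length of the series as the denominator, even though the NaNs are not counted in any bin.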

2nd example:

Input:

series = pd.Series([1, 1, 2, 0, 1, np.nan, 4, 4, np.nan, 3])
series.value_counts(normalize=True, bins=2, dropna=False)

Output:

(-0.005, 2.0]    0.5
(2.0, 4.0]       0.3
dtype: float64

Expected output:

(-0.005, 2.0]    0.5
(2.0, 4.0]       0.3
NaN              0.2
dtype: float64

Problem description

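The dropna argument of value_counts() seems to have no effect when bins is not None. In the 1st example, dropna=True drops the NaNs from the bins, but the shares are still divided by all 10 values (including the 2 NaNs), so they sum to 0.8 instead of 1. In the 2nd example, dropna=False does not add a NaN row at all. The expected behaviour is better: the shares sum to 1 when dropna=True, and NaNs are shown as a separate row when dropna=False.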
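Until this is fixed, a possible workaround is to bin with pd.cut and count the bins yourself. The helper below, binned_value_counts, is hypothetical (it is not part of the pandas API) and is only a sketch of the expected behaviour: NaNs are counted separately, and normalization is done over whatever was actually counted.

```python
import numpy as np
import pandas as pd


def binned_value_counts(series, bins, normalize=False, dropna=True):
    """Hypothetical helper: binned counts that honour dropna.

    Bin edges come from pd.cut; NaN values are never put in a bin and are
    only reported (as their own row) when dropna=False.
    """
    # include_lowest=True so the minimum value lands in the first bin.
    binned = pd.cut(series, bins=bins, include_lowest=True)
    codes = binned.cat.codes.to_numpy()  # code -1 marks NaN inputs
    categories = list(binned.cat.categories)

    # Count the non-null values falling into each bin, in bin order.
    counts = np.bincount(codes[codes >= 0].astype(np.intp),
                         minlength=len(categories))
    index = categories
    if not dropna:
        # Report missing values as an extra NaN row.
        counts = np.append(counts, int(series.isna().sum()))
        index = categories + [np.nan]

    # Divide by the total that was counted, so the shares sum to 1.
    result = counts / counts.sum() if normalize else counts
    return pd.Series(result, index=pd.Index(index, dtype=object))
```

For the series above, binned_value_counts(series, bins=2, normalize=True) gives shares of 0.625 and 0.375, and adding dropna=False gives 0.5, 0.3 and 0.2 (the last one for NaN), matching the expected outputs in both examples.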

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Labels

Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug
