Quantile fails when only NaNs on some rows/columns #15460

Closed
@jluttine

Description

The quantile method on DataFrame fails for certain patterns of NaNs, although it handles other patterns properly. I guess it is related to cases where some rows and/or columns contain only NaNs.

First, some working examples:

>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    2.0
b    NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0    1.0
1    2.0
2    3.0
Name: 0.5, dtype: float64

Code Sample, a copy-pastable example if possible

If I change the second element of the first column to NaN, it all breaks. Now there is one row and one column containing only NaNs:

>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    1.0
b    3.0
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-a62c307ef5d1> in <module>()
----> 1 pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)

/.../lib/python3.5/site-packages/pandas/core/frame.py in quantile(self, q, axis, numeric_only, interpolation)
   5153                                      axis=1,
   5154                                      interpolation=interpolation,
-> 5155                                      transposed=is_transposed)
   5156 
   5157         if result.ndim == 2:

/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, **kwargs)
   3142 
   3143     def quantile(self, **kwargs):
-> 3144         return self.reduction('quantile', **kwargs)
   3145 
   3146     def setitem(self, **kwargs):

/.../lib/python3.5/site-packages/pandas/core/internals.py in reduction(self, f, axis, consolidate, transposed, **kwargs)
   3071         for b in self.blocks:
   3072             kwargs['mgr'] = self
-> 3073             axe, block = getattr(b, f)(axis=axis, **kwargs)
   3074 
   3075             axes.append(axe)

/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, qs, interpolation, axis, mgr)
   1325             values = _block_shape(values[~mask], ndim=self.ndim)
   1326             if self.ndim > 1:
-> 1327                 values = values.reshape(result_shape)
   1328 
   1329         from pandas import Float64Index

ValueError: total size of new array must be unchanged
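
The last frame shows the values being filtered with `values[~mask]` and then reshaped to `result_shape`. A minimal NumPy sketch of how that combination can fail is given here. The array, the mask, and the target shape are hypothetical stand-ins, because the traceback does not show the actual shapes inside pandas:

```python
import numpy as np

# Hypothetical stand-in for the block values; the real shapes inside
# pandas internals are not shown in the traceback.
values = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan]])
mask = np.isnan(values)

# Boolean indexing on a 2-D array returns a flattened 1-D array that
# holds only the selected elements.
kept = values[~mask]

# Assumed target shape: one result slot per column of `values`.
result_shape = (values.shape[1], 1)


def reshape_fails(arr, shape):
    """Return True if `arr` cannot be reshaped to `shape`."""
    try:
        arr.reshape(shape)
    except ValueError:
        return True
    return False


# Two surviving elements cannot fill three slots, so this reshape raises.
print(reshape_fails(kept, result_shape))

# A target shape that happens to match the number of survivors succeeds
# silently, even though the positions of the values are lost.
print(reshape_fails(kept, (2, 1)))
```

If something like this happens internally, it might also explain the `axis=0` result above, where the two surviving values 1.0 and 3.0 appear as the quantiles of `a` and `b`. That is a guess and not a confirmed diagnosis.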

Problem description

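In the first case (axis=0), the quantiles are incorrect: the median of column a should be 2.0 and the median of the all-NaN column b should be NaN. In the second case (axis=1), the quantile computation should not raise an error.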

Expected Output

>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a    2.0
b    NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0    1.0
1    NaN
2    3.0
Name: 0.5, dtype: float64
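
A possible workaround sketch is to compute the quantile one column or row at a time with `Series.quantile`, which sidesteps the 2-D code path that appears in the traceback. The helper name `quantile_per_series` is made up for illustration, and this has not been checked against pandas 0.19.0 specifically:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]})


def quantile_per_series(frame, q=0.5, axis=0):
    """Compute one quantile per column (axis=0) or per row (axis=1)
    by calling Series.quantile on each slice separately."""
    return frame.apply(lambda s: s.quantile(q), axis=axis)


by_column = quantile_per_series(df, axis=0)
by_row = quantile_per_series(df, axis=1)

print(by_column)
print(by_row)
```

This is slower than a vectorized `DataFrame.quantile` on large frames, since it calls `Series.quantile` once per column or row, and the resulting Series is not named after the quantile.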

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.9-gnu-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0.post20161106
Cython: None
numpy: 1.11.1
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None

Labels: Bug, Duplicate Report, Numeric Operations, Reshaping