Description
DataFrame.quantile mishandles NaNs in some specific cases, while handling others correctly. The failures seem to be related to frames in which some rows and/or columns contain only NaNs.
First, some working examples:
>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a 2.0
b NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, 2, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0 1.0
1 2.0
2 3.0
Name: 0.5, dtype: float64
Code Sample, a copy-pastable example if possible
If I change the second element of column "a" to NaN, it breaks. Now there is one row and one column containing only NaNs. With axis=0 the result is wrong, and with axis=1 a ValueError is raised:
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a 1.0
b 3.0
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-a62c307ef5d1> in <module>()
----> 1 pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
/.../lib/python3.5/site-packages/pandas/core/frame.py in quantile(self, q, axis, numeric_only, interpolation)
5153 axis=1,
5154 interpolation=interpolation,
-> 5155 transposed=is_transposed)
5156
5157 if result.ndim == 2:
/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, **kwargs)
3142
3143 def quantile(self, **kwargs):
-> 3144 return self.reduction('quantile', **kwargs)
3145
3146 def setitem(self, **kwargs):
/.../lib/python3.5/site-packages/pandas/core/internals.py in reduction(self, f, axis, consolidate, transposed, **kwargs)
3071 for b in self.blocks:
3072 kwargs['mgr'] = self
-> 3073 axe, block = getattr(b, f)(axis=axis, **kwargs)
3074
3075 axes.append(axe)
/.../lib/python3.5/site-packages/pandas/core/internals.py in quantile(self, qs, interpolation, axis, mgr)
1325 values = _block_shape(values[~mask], ndim=self.ndim)
1326 if self.ndim > 1:
-> 1327 values = values.reshape(result_shape)
1328
1329 from pandas import Float64Index
ValueError: total size of new array must be unchanged
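The two symptoms may share one cause. The traceback shows the block dropping NaNs with `values[~mask]` and then calling `values.reshape(result_shape)`. The sketch below mimics those two steps with plain NumPy. It is not the pandas source: the helper name `drop_nans_and_reshape` is hypothetical, and the choice of `result_shape = (values.shape[0], -1)` is an assumption, since the traceback does not show how `result_shape` is built.

```python
import numpy as np


def drop_nans_and_reshape(values):
    """Hypothetical helper: drop NaNs from a 2-D block, then reshape it back."""
    flat = values.reshape(-1)
    mask = np.isnan(flat)
    kept = flat[~mask]  # 1-D array holding only the non-NaN values
    # ASSUMPTION: keep the first dimension and let NumPy infer the second.
    result_shape = (values.shape[0], -1)
    return kept.reshape(result_shape)


# One row per DataFrame column: "a" = [1, NaN, 3], "b" = [NaN, NaN, NaN].
block = np.array([[1.0, np.nan, 3.0],
                  [np.nan, np.nan, np.nan]])

# axis=0 case: the two surviving values are spread over both rows,
# so the 3.0 that belongs to "a" ends up in the row for "b".
shifted = drop_nans_and_reshape(block)
medians_axis0 = np.median(shifted, axis=1)

# axis=1 case: the frame is transposed first, giving a (3, 2) block.
# Two surviving values cannot fill three rows, so reshape raises.
try:
    drop_nans_and_reshape(block.T)
    reshape_error = None
except ValueError as exc:
    reshape_error = exc
```

Under these assumptions, `medians_axis0` holds 1.0 and 3.0, matching the wrong axis=0 output above, and the transposed call raises a ValueError, as in the axis=1 traceback. The underlying point is that filtering a 2-D array with a boolean mask loses track of which row each value came from.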
Problem description
With axis=0, the returned quantiles are incorrect: the all-NaN column "b" gets a value, and "a" gets 1.0 instead of 2.0. With axis=1, the quantile computation should not raise an error.
Expected Output
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=0)
a 2.0
b NaN
Name: 0.5, dtype: float64
>>> pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, np.nan]}).quantile(axis=1)
0 1.0
1 NaN
2 3.0
Name: 0.5, dtype: float64
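To cross-check the expected values independently of pandas, here is a minimal pure-Python sketch of a NaN-skipping quantile with linear interpolation. The function name `nan_quantile` is made up for this illustration. It assumes that NaNs are ignored and that an all-NaN input yields NaN, which is the behaviour the working examples above already show.

```python
import math


def nan_quantile(values, q=0.5):
    """Quantile with linear interpolation, ignoring NaNs (illustrative only)."""
    kept = sorted(v for v in values if not math.isnan(v))
    if not kept:
        return math.nan  # nothing left to summarise
    pos = q * (len(kept) - 1)  # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    return kept[lo] + (kept[hi] - kept[lo]) * (pos - lo)


frame = {"a": [1.0, math.nan, 3.0], "b": [math.nan, math.nan, math.nan]}

# axis=0: one quantile per column
by_column = {name: nan_quantile(col) for name, col in frame.items()}

# axis=1: one quantile per row
by_row = [nan_quantile(row) for row in zip(*frame.values())]
```

Here `by_column` gives 2.0 for "a" and NaN for "b", and `by_row` gives 1.0, NaN, 3.0, which agrees with the expected output listed above.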
Output of pd.show_versions()
pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0.post20161106
Cython: None
numpy: 1.11.1
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.9.4
boto: None
pandas_datareader: None