Skip to content

describe() returns RuntimeWarning: Invalid value encountered in median RuntimeWarning #13146

Closed
@msure

Description

@msure

I just upgraded to 18.1 w/ conda. I started noticing this problem in some notebooks I created before the upgrade but recently revisited for further analysis.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
df = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78]})

df.value.describe() returns a RuntimeWarning from numpy, which then gives this unexpected result for the quantiles:

In [5]: df.value.describe()
/Users/adrianpalacios/anaconda/lib/python3.4/site-packages/numpy/lib/function_base.py:3403: RuntimeWarning: Invalid value encountered in median
  RuntimeWarning)
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%           NaN
50%           NaN
75%           NaN
max      6.700000
Name: value, dtype: float64

Expected Output

I got this using a different conda environment that has not been upgraded to latest pandas version:

In [5]: df.value.describe()
Out[5]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

Dropping the NaN's works in pandas 18.1:

In [9]: df.value.dropna().describe()
Out[9]:
count    5.000000
mean     4.736000
std      1.480703
min      3.000000
25%      3.780000
50%      4.500000
75%      5.700000
max      6.700000
Name: value, dtype: float64

However, this work-around is not a great option when multiple columns w/ NaNs are present:

df2 = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'],
    'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78],
    'more_values':[8.2,np.nan,np.nan,np.nan,9.4,np.nan,np.nan]
})

In [17]: df2[['value','more_values']].dropna().describe()
Out[17]:
       value  more_values
count    0.0          0.0
mean     NaN          NaN
std      NaN          NaN
min      NaN          NaN
25%      NaN          NaN
50%      NaN          NaN
75%      NaN          NaN
max      NaN          NaN

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugDuplicate ReportDuplicate issue or pull requestNumeric OperationsArithmetic, Comparison, and Logical operations

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions