List required for percentiles kwarg in DataFrame.describe when median is not present as opposed to array-like #14908

Closed
@pbreach

Code Sample, a copy-pastable example if possible


In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.random((1000, 4)))

In [4]: percentiles = np.linspace(0, 0.99, 10)

In [5]: df.describe(percentiles=percentiles)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-83616b318eba> in <module>()
----> 1 df.describe(percentiles=percentiles)

C:\Users\pbreach\Anaconda3\lib\site-packages\pandas\core\generic.py in describe(self, percentiles, include, exclude)
   5194             # median should always be included
   5195             if 0.5 not in percentiles:
-> 5196                 percentiles.append(0.5)
   5197             percentiles = np.asarray(percentiles)
   5198         else:

AttributeError: 'numpy.ndarray' object has no attribute 'append'

Problem description

The documentation says the percentiles kwarg accepts an array-like, but passing a numpy array that does not contain 0.5 raises an AttributeError: describe calls percentiles.append(0.5) to add the median, and that only works on a list. If a list is needed in order to append the median, should the input be converted to a list explicitly before the append?
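To make the suggestion concrete, here is a minimal sketch of the explicit conversion. The helper name `normalize_percentiles` is hypothetical and is not part of pandas; it only illustrates copying the input into a list before the median is appended, so that any array-like (list, tuple, ndarray) takes the same path.

```python
import numpy as np


def normalize_percentiles(percentiles):
    """Hypothetical helper (not a pandas function): ensure the median is present."""
    # Copy into a plain list so .append exists whatever the input type was.
    percentiles = list(percentiles)
    # The median should always be included.
    if 0.5 not in percentiles:
        percentiles.append(0.5)
    # Hand back a sorted float array for the later quantile computation.
    return np.sort(np.asarray(percentiles, dtype=float))


# The ndarray from the code sample no longer trips over .append.
result = normalize_percentiles(np.linspace(0, 0.99, 10))
```

Copying with list() also means the caller's object is not mutated, which the in-place append on a user-supplied list would otherwise do.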

Expected Output

In [6]: df.describe(percentiles=list(percentiles))
Out[6]:
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean      0.500730     0.501185     0.498594     0.498648
std       0.289616     0.286023     0.290509     0.292264
min       0.001290     0.000822     0.000459     0.001975
0%        0.001290     0.000822     0.000459     0.001975
11%       0.119319     0.124990     0.107683     0.114136
22%       0.211321     0.232740     0.209913     0.227046
33%       0.331405     0.325820     0.336409     0.311294
44%       0.439314     0.446085     0.443036     0.431923
50%       0.500374     0.505759     0.499125     0.491579
55.0%     0.553634     0.552899     0.552896     0.544990
66%       0.666159     0.647926     0.661797     0.661387
77%       0.777984     0.774892     0.776067     0.773342
88%       0.883761     0.874293     0.872860     0.884350
99%       0.985795     0.989234     0.991083     0.993646
max       0.998623     0.999924     0.999723     0.999185
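Besides wrapping the array in list(), a caller-side workaround that keeps the input as an ndarray might be to add the median before calling describe. Judging from the traceback, the append only runs when 0.5 is missing, so an array that already contains it should skip that branch. This is a sketch under that assumption, and the variable name `with_median` is only illustrative.

```python
import numpy as np

percentiles = np.linspace(0, 0.99, 10)

# np.union1d returns the sorted, de-duplicated union as a new ndarray,
# so 0.5 is added once and the original array is left untouched.
with_median = np.union1d(percentiles, [0.5])

# With df as defined in the code sample:
# df.describe(percentiles=with_median)
```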

Output of pd.show_versions()

In [7]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
