Skip to content

DataFrame aggregation methods inconsistent when object columns have missing values #16832

Closed
@tdpetrou

Description

@tdpetrou

Code Sample, a copy-pastable example if possible

>>> df = pd.DataFrame({'A':['a', None], 'B':['a', 'b']})
>>> df.min()
Series([], dtype: float64)

>>> df[['B', 'A']].min()
B    a
dtype: object

Problem description

Both DataFrames should return the same thing. Order should not matter here. I suppose this is happening based on whether or not the first column has any missing values?

This happens for the other aggregation methods max and sum

Expected Output

For consistency, both should output the same thing.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Darwin OS-release: 15.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions