DataFrame.describe excludes top and freq for empty DataFrame #26397

Closed
@TomAugspurger

Right now, the output shape and dtype of DataFrame.describe for object columns depend on whether the DataFrame is empty.

Code Sample, a copy-pastable example if possible

In [75]: x = pd.DataFrame({"A": ['a', np.nan, np.nan]})

In [76]: x.describe()
Out[76]:
        A
count   1
unique  1
top     a
freq    1

In [77]: x.iloc[:0].describe()
Out[77]:
        A
count   0
unique  0

Problem description

This leads to instability in the output dtypes and shape.

Would people prefer that we use np.nan or None for top and freq in the empty case? I believe either would be unambiguous, since we drop missing values before computing, so a missing top can only mean there was nothing to count.

While the output consistency would be nice, it's not clear to me what's actually best for users here.
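
For anyone who needs a stable shape in the meantime, here is a minimal sketch of a workaround. It assumes NaN is an acceptable placeholder for the missing rows, and describe_stable is a hypothetical helper name, not part of pandas. It only reindexes the result of describe to the four rows shown in the non-empty output above.

```python
import numpy as np
import pandas as pd

# Rows that describe() reports for object columns in the non-empty case.
OBJECT_STATS = ["count", "unique", "top", "freq"]


def describe_stable(df):
    """Hypothetical helper: describe() with a fixed set of rows.

    Assumes df holds only object (string-like) columns, as in the
    example above. Rows that describe() leaves out are added as NaN.
    """
    return df.describe().reindex(OBJECT_STATS)


x = pd.DataFrame({"A": ["a", np.nan, np.nan]})

full = describe_stable(x)
empty = describe_stable(x.iloc[:0])

print(full)
print(empty)
```

With this, both results carry the same four rows regardless of whether the input is empty. It does not settle the np.nan versus None question; it just picks NaN because that is what reindex fills in by default.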

Labels: Numeric Operations (Arithmetic, Comparison, and Logical operations)
