Skip to content

DataFrame.describe() output is not deterministic #32528

Open
@ShaharNaveh

Description

@ShaharNaveh

xref #31472

Running this:

pd.DataFrame({
'categorical': pd.Categorical(['d', 'e', 'f']),
'numeric': [1, 2, 3],
'object': ['a', 'b', 'c']
})

df.describe(include="all")

Will sometimes have the output of:

     categorical  numeric object
        count            3      3.0      3
        unique           3      NaN      3
        top              f      NaN      a
        freq             1      NaN      1
        mean           NaN      2.0    NaN
        std            NaN      1.0    NaN
        min            NaN      1.0    NaN
        25%            NaN      1.5    NaN
        50%            NaN      2.0    NaN
        75%            NaN      2.5    NaN
        max            NaN      3.0    NaN

And sometimes the output of:

     categorical  numeric object
        count            3      3.0      3
        unique           3      NaN      3
        top              f      NaN      c
        freq             1      NaN      1
        mean           NaN      2.0    NaN
        std            NaN      1.0    NaN
        min            NaN      1.0    NaN
        25%            NaN      1.5    NaN
        50%            NaN      2.0    NaN
        75%            NaN      2.5    NaN
        max            NaN      3.0    NaN

(Changes at the row of top)

This is making the doctests fail. We should remove the SKIP in the tests in the describe docstring when we identify the problem: https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L9649

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions