
BUG: Different initialization methods lead to different dtypes (DataFrame) #42971

Open
@RileyLazarou

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

df1 = pd.DataFrame(columns=["a", "b", "c"])
print(df1.groupby("a").sum().columns)
# => Index([], dtype='object')
df2 = pd.DataFrame({"a": [], "b": [], "c": []})
print(df2.groupby("a").sum().columns)
# => Index(['b', 'c'], dtype='object')

Problem description

Grouping and summing an empty DataFrame drops its value columns (df1 above); this doesn't happen with non-empty DataFrames. Having a DataFrame's columns change depending on its contents is counter-intuitive and leads to KeyErrors in downstream code. The expected behaviour is the one shown by df2. The fact that two empty DataFrames behave differently when grouped and summed suggests that this isn't intended.
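The title points at the likely cause: the two constructors produce columns of different dtypes, and `sum()` appears to drop the non-numeric `object` columns of df1. The sketch below checks the dtypes and tries two possible workarounds. It assumes a pandas version that behaves as reported. `df1_typed` and `df1_cast` are illustrative names, and giving the empty frame an explicit numeric dtype is a suggested workaround, not an official fix.

```python
import pandas as pd

# The same two constructions as in the report.
df1 = pd.DataFrame(columns=["a", "b", "c"])
df2 = pd.DataFrame({"a": [], "b": [], "c": []})

# Compare the column dtypes each constructor produces.
# df1's columns are object; df2's depend on the pandas version
# (float64 on the 1.x releases current when this was reported).
print(df1.dtypes.to_dict())
print(df2.dtypes.to_dict())

# Workaround 1 (assumption): give the empty frame a numeric dtype up front,
# so sum() treats "b" and "c" as numeric columns and keeps them.
df1_typed = pd.DataFrame(columns=["a", "b", "c"], dtype="float64")
print(df1_typed.groupby("a").sum().columns)

# Workaround 2 (assumption): cast an existing empty frame before grouping.
df1_cast = df1.astype("float64")
print(df1_cast.groupby("a").sum().columns)
```

With a numeric dtype in place, both grouped results should keep `b` and `c` as columns.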

Expected Output

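Actual output of the above snippet (the expected behaviour is for both calls to return what df2 returns, Index(['b', 'c'], dtype='object')):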

Index([], dtype='object')
Index(['b', 'c'], dtype='object')
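Downstream code fails with a KeyError when the expected columns are missing. Until the two constructors agree, one defensive pattern is to reindex the grouped result against the input's value columns. This is a minimal sketch: `grouped_sum` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def grouped_sum(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: group by `key`, sum, and guarantee that every
    non-key column of `df` appears in the result."""
    value_cols = [c for c in df.columns if c != key]
    result = df.groupby(key).sum()
    # reindex adds back any column that sum() dropped (filled with NaN),
    # so later lookups such as result["b"] do not raise KeyError.
    return result.reindex(columns=value_cols)


empty = pd.DataFrame(columns=["a", "b", "c"])
print(grouped_sum(empty, "a").columns.tolist())  # ['b', 'c']

filled = pd.DataFrame({"a": [1, 1, 2], "b": [1.0, 2.0, 3.0], "c": [4, 5, 6]})
print(grouped_sum(filled, "a"))
```

The trade-off is that a silently dropped column comes back as all-NaN rather than raising, so this guards lookups but does not fix the underlying dtype difference.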

Metadata

Labels: API - Consistency, Bug, Constructors, DataFrame, Dtype Conversions
