BUG?: getitem with a MultiIndex returns a Series only when the lower level is "" #50805

Closed
@rhshadrach

xref #46944

import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)
print(df1["a"])
#    a2
# 0   1

columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)
print(df2["a"])
# 0    1
# Name: a, dtype: int64

The first case produces a DataFrame, whereas the second case produces a Series. I don't think this is intentional. This gives rise to a difference in DataFrameGroupBy._selected_obj and DataFrameGroupBy._obj_with_exclusions which can lead to erroneous results (#50804 is one example).
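To make the difference easy to check, here is a minimal sketch that inspects the return types directly instead of relying on the printed output. The variable names (`res1`, `res2`, `full1`, `full2`) are only illustrative, and the type of `res2` depends on the pandas version in use, so it is printed rather than asserted.

```python
import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)

columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)

# Select by the top-level label only.
res1 = df1["a"]
res2 = df2["a"]
print(type(res1).__name__)  # DataFrame
print(type(res2).__name__)  # Series on versions showing the behavior reported here

# Selecting with the full column tuple returns a single column in both cases.
full1 = df1[("a", "a2")]
full2 = df2[("a", "")]
print(type(full1).__name__, type(full2).__name__)
```

Selecting with the full tuple is unambiguous in both frames; only the partial key `"a"` behaves differently between them.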

Currently, df2.groupby("a") is allowed whereas df1.groupby("a") raises. So returning a DataFrame in the second case would resolve the groupby inconsistency as well.

One can do df1.groupby(("a", "a2")) successfully, so I don't think this change would make any operation impossible.
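A short sketch of the groupby side, using the same `df1` as above. The exact exception raised by the partial key is not stated in this issue, so the sketch catches a generic `Exception` and prints its type rather than assuming one; `partial_key_outcome` is just an illustrative name.

```python
import pandas as pd

columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)

# Grouping by the top-level label alone; reported above as raising.
try:
    df1.groupby("a").sum()
    partial_key_outcome = "no error"
except Exception as exc:  # exception type is not given in the issue
    partial_key_outcome = type(exc).__name__
print(partial_key_outcome)

# Grouping by the full column tuple uses a single 1-D column as the key.
result = df1.groupby(("a", "a2")).sum()
print(result)
```

With the full tuple as the key, the grouping column is excluded from the result and only the ("b", "c") column is aggregated.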

cc @phofl for any thoughts

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex