xref #46944
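import pandas as pd

# Second-level label is non-empty: selecting "a" returns a DataFrame
columns1 = pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
df1 = pd.DataFrame([[1, 2]], columns=columns1)
print(df1["a"])
#    a2
# 0   1

# Second-level label is the empty string: selecting "a" returns a Series
columns2 = pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
df2 = pd.DataFrame([[1, 2]], columns=columns2)
print(df2["a"])
# 0    1
# Name: a, dtype: int64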
The first case produces a DataFrame, whereas the second case produces a Series. I don't think this is intentional. This gives rise to a difference between DataFrameGroupBy._selected_obj and DataFrameGroupBy._obj_with_exclusions, which can lead to erroneous results (#50804 is one example).
Currently, df2.groupby("a") is allowed whereas df1.groupby("a") raises. So returning a DataFrame in the second case will resolve the groupby inconsistency as well.
One can do df1.groupby(("a", "a2")) successfully, so I don't think there is a worry about making certain operations impossible.
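To make the groupby side of this easier to check on a given pandas version, here is a minimal sketch that probes the three calls mentioned above. The helper try_groupby is a hypothetical name introduced only for this illustration, and the sketch deliberately reports what happens rather than assuming a particular exception type or return type for the "a" lookups, since those may differ between versions.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", "a2"), ("b", "c")])
)
df2 = pd.DataFrame(
    [[1, 2]], columns=pd.MultiIndex.from_tuples([("a", ""), ("b", "c")])
)

# Report the type returned when selecting the top-level key "a".
sub1 = df1["a"]
sub2 = df2["a"]
print(type(sub1).__name__, type(sub2).__name__)


def try_groupby(df, key):
    """Hypothetical helper: report whether grouping by `key` works or raises."""
    try:
        df.groupby(key).sum()
        return "ok"
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return f"raises {type(exc).__name__}"


print('df1.groupby("a"):', try_groupby(df1, "a"))
print('df2.groupby("a"):', try_groupby(df2, "a"))

# Grouping by the full column tuple addresses exactly one column.
grouped = df1.groupby(("a", "a2")).sum()
print(grouped)
```

On a version showing the behavior described in this issue, the first print would show a DataFrame and a Series, and only the df1.groupby("a") probe would report an exception.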
cc @phofl for any thoughts