Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "temp_playlist": [0, 0, 0, 0],
        "objId": ["o1", np.nan, "o1", np.nan],
        "x": [1, 2, 3, 4],
    }
)
agg_df = df.groupby(by=['temp_playlist', 'objId'], dropna=False)["x"].agg(list)
print(agg_df.loc[agg_df.index[-1]]) # KeyError: because it is (0, np.nan), wanted to get [2, 4]
Issue Description
This issue is a follow-up to the discussion in this SO question.
It appears to be a bug; if it is instead the intended behavior, it should be documented.
As shown in the Reproducible Example, after grouping the x data on the temp_playlist and objId columns, the result has a MultiIndex entry (0, nan). This entry is meaningful, and I wanted to access its data with agg_df.loc[<index_pos>], as I can with any other entry of agg_df.index. This is not possible for the entry containing nan (agg_df.loc[agg_df.index[-1]] raises a KeyError). However, it does work if that same entry is passed inside a list of index entries. So this seems at least inconsistent, if not an outright bug.
For more info, please consult the SO question, especially this answer.
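Until label-based lookup works for this entry, the group can be reached without matching on the NaN label at all. The following is only a sketch of two such workarounds, reusing the data from the Reproducible Example; the variable names nan_mask, nan_group_values and last_by_position are illustrative, and the positional variant assumes the NaN group is sorted last, as it is in the output reported here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "temp_playlist": [0, 0, 0, 0],
        "objId": ["o1", np.nan, "o1", np.nan],
        "x": [1, 2, 3, 4],
    }
)
agg_df = df.groupby(by=["temp_playlist", "objId"], dropna=False)["x"].agg(list)

# Workaround 1: select by a boolean mask built from the index level,
# so no equality comparison against NaN is needed.
nan_mask = agg_df.index.get_level_values("objId").isna()
nan_group_values = agg_df[nan_mask].iloc[0]
print(nan_group_values)  # [2, 4]

# Workaround 2: select by position instead of by label.
# Assumes the (0, nan) group is the last entry of the result.
last_by_position = agg_df.iloc[-1]
print(last_by_position)
```

The mask-based variant does not depend on where the NaN group ends up in the result, so it is probably the safer of the two.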
Expected Behavior
agg_df.loc[(0, np.nan)] should return [2, 4].
Installed Versions
python 3.8.5, pandas 1.3.1, numpy 1.20.3