
Multiindex slicing with NaNs, unexpected results #25154

Closed
@tunnij

Description


Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.rand(2, 3),
    columns=pd.MultiIndex.from_tuples(
        [('a', 'foo'), ('b', 'bar'), ('b', np.nan)], names=['first', 'second']
    ),
)

# EXPECTED: selecting both labels on the first level
df.loc[:, (['a', 'b'])]
Out[35]:
first          a         b          
second       foo       bar       NaN
0       0.678021  0.383672  0.074164
1       0.738492  0.992545  0.661247

# EXPECTED: selecting a single label on the first level
df.loc[:, (['b'])]
Out[29]:
first          b          
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247

# EXPECTED: selecting (b, bar) with list keys
df.loc[:, (['b'], ['bar'])]
Out[33]:
first          b
second       bar
0       0.383672
1       0.992545

# UNEXPECTED: selecting (b, nan) with list keys returns an empty DataFrame
df.loc[:, (['b'], [np.nan])]
Out[36]:
Empty DataFrame
Columns: []
Index: [0, 1]

# UNEXPECTED: selecting b with ['bar', nan] silently drops the NaN column
df.loc[:, (['b'], ['bar', np.nan])]
Out[39]:
first          b
second       bar
0       0.383672
1       0.992545

# EXPECTED: selecting (b, nan) with a scalar tuple key works (returns a Series)
df.loc[:, ('b', np.nan)]
Out[37]:
0    0.074164
1    0.661247
Name: (b, nan), dtype: float64

Problem description

When a list of second-level labels contains NaN, the NaN entries are silently ignored and the columns whose second-level label is NaN are not returned. This happens whether NaN is the only entry ([np.nan] returns an empty DataFrame) or is combined with other labels (['bar', np.nan] returns only the bar column), even though selecting the same column with the scalar key ('b', np.nan) works.

Expected Output

I expect both of these selections to include the NaN column:

df.loc[:, (['b'], ['bar', np.nan])]
Out[40]:
first          b          
second       bar       NaN
0       0.383672  0.074164
1       0.992545  0.661247

df.loc[:, (['b'], [np.nan])]
Out[40]:
first          b          
second       NaN
0       0.074164
1       0.661247
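As a stopgap, the NaN column can be selected without going through list-based label lookup by building a boolean mask from the level values. This is a sketch written against a recent pandas release (an assumption; it has not been checked against the 0.22.0 listed below), and `select_columns_nan_aware` is a hypothetical helper, not a pandas API. Because NaN never compares equal to itself, the helper detects NaN entries with `pd.isna` and `Index.isna` instead of matching them by value.

```python
import numpy as np
import pandas as pd

# Same frame as in the report (values are random).
df = pd.DataFrame(
    np.random.rand(2, 3),
    columns=pd.MultiIndex.from_tuples(
        [("a", "foo"), ("b", "bar"), ("b", np.nan)], names=["first", "second"]
    ),
)


def select_columns_nan_aware(frame, first_labels, second_labels):
    """Hypothetical helper (not part of pandas): keep columns whose 'first'
    label is in first_labels and whose 'second' label is in second_labels,
    where a NaN entry in second_labels matches a missing 'second' label."""
    first = frame.columns.get_level_values("first")
    second = frame.columns.get_level_values("second")

    # Split the requested second-level labels into a NaN flag and the
    # concrete (non-NaN) labels, since NaN cannot be matched by equality.
    want_nan = any(pd.isna(label) for label in second_labels)
    concrete = [label for label in second_labels if not pd.isna(label)]

    # Boolean masks over the columns; isna() picks up the NaN label explicitly.
    second_match = second.isin(concrete) | (want_nan & second.isna())
    mask = first.isin(first_labels) & second_match
    return frame.loc[:, mask]


print(select_columns_nan_aware(df, ["b"], ["bar", np.nan]))
print(select_columns_nan_aware(df, ["b"], [np.nan]))
```

The first call keeps the (b, bar) and (b, NaN) columns and the second keeps only (b, NaN), which is the shape of the expected output above; the numbers will differ from the report because the data is random.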

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.10.0
pip: 18.1
setuptools: 40.5.0
Cython: 0.28.5
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: 0.10.9
IPython: 5.8.0
sphinx: 1.8.1
patsy: 0.5.1
dateutil: 2.7.2
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.7
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
