MultiIndex Bug Copying Values Incorrectly When Adding Values To Index #22247

Closed
@JonahJ

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)
pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Fails: value of 1.23 from the first row in the df is copied. As of v0.22.0 this was np.nan

Problem description

When a MultiIndex contains np.nan in one of its levels and a new row is added with df.at, the columns that were not assigned are not left as np.nan. Instead, they are filled with the values from the existing row whose index entry is np.nan.

This behavior appears in all v0.23.x releases; v0.22.0 behaves correctly.
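For anyone who needs the missing rows in the meantime, one possible workaround is to build them explicitly and append them with pd.concat rather than enlarging the frame row by row through df.at. The sketch below is not part of the original report and has not been verified against v0.23.x; it only assumes that pd.concat does not go through the same setting-with-enlargement path. The names existing, missing, new_index, new_rows and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: the MultiIndex contains np.nan in 'pivot_1'.
df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
).set_index(['pivot_0', 'pivot_1'])

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F']

# Collect the 'pivot_1' values that are not in the index yet.
existing = set(df.index.get_level_values('pivot_1'))
missing = [v for v in necessary_pivot_1_values if v not in existing]

# Build the new rows explicitly: col_1 is np.nan, col_2 is 0.0.
new_index = pd.MultiIndex.from_tuples(
    [(pivot_0, v) for v in missing], names=df.index.names
)
new_rows = pd.DataFrame({'col_1': np.nan, 'col_2': 0.0}, index=new_index)

# Append all new rows in one step instead of assigning through df.at.
result = pd.concat([df, new_rows])
print(result)
```

Because every cell of the new rows is spelled out, nothing is left for pandas to fill in, which is the step that appears to go wrong in the report.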

Expected Output

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass, works in v0.22.0

Note that this behavior does not occur when the MultiIndex contains no np.nan, nor with a single-level Index that does contain np.nan. Both cases are shown below.

MultiIndex without np.nan
df = pd.DataFrame(
    [
        #['A', np.nan, 1.23, 4.56],  # Comment out the np.nan
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass

Single Index with np.nan
df = pd.DataFrame(
    [
        [np.nan, 1.23, 4.56],
        ['G', 1.23, 4.56],
        ['D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'col_1', 'col_2'],
)
df.set_index(['pivot_0'], inplace=True)

necessary_pivot_0_values = ['D', 'E', 'F' ]
for necessary_value in necessary_pivot_0_values:
    if necessary_value not in df.index.get_level_values('pivot_0').tolist():
        print("Missing", necessary_value)
        
        df.at[(necessary_value), 'col_2'] = 0.0

assert df.loc[('F')]['col_2'] == 0.0
assert pd.isnull(df.loc[('F')]['col_1'])

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26.1
numpy: 1.12.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
