Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['A', np.nan, 1.23, 4.56],
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F']
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        # Assigning to a missing key adds a new row (setting with enlargement)
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Fails: the value 1.23 from the first (NaN-indexed) row is copied. In v0.22.0 this was np.nan
Problem description
When a MultiIndex contains np.nan and a new row is added by assigning a single cell for a missing key (setting with enlargement via df.at), the other columns of the new row are not set to np.nan. Instead, their values are copied from the row whose index contains np.nan.
This happens in all v0.23.x releases; v0.22.0 behaves as expected.
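Until this is fixed, one possible workaround is to avoid setting with enlargement altogether and append the missing rows explicitly, so that every column other than col_2 is NaN by construction. This is a minimal sketch, assuming the level names pivot_0/pivot_1 and the columns from the reproduction above; add_missing_rows is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd


def add_missing_rows(df, pivot_0, required, fill_col="col_2", fill_value=0.0):
    """Hypothetical helper: append a row for each required pivot_1 value that is absent.

    The new rows are built explicitly, so every column except fill_col is NaN
    instead of relying on setting with enlargement.
    """
    existing = set(df.index.get_level_values("pivot_1"))
    missing = [v for v in required if v not in existing]
    if not missing:
        return df
    new_index = pd.MultiIndex.from_tuples(
        [(pivot_0, v) for v in missing], names=df.index.names
    )
    # Every column starts as NaN; only fill_col receives the fill value.
    new_rows = pd.DataFrame(np.nan, index=new_index, columns=df.columns)
    new_rows[fill_col] = fill_value
    return pd.concat([df, new_rows])


df = pd.DataFrame(
    [
        ["A", np.nan, 1.23, 4.56],
        ["A", "G", 1.23, 4.56],
        ["A", "D", 9.87, 10.54],
    ],
    columns=["pivot_0", "pivot_1", "col_1", "col_2"],
).set_index(["pivot_0", "pivot_1"])

df = add_missing_rows(df, "A", ["D", "E", "F"])
```

Because the new rows never go through the enlargement code path, the result does not depend on how a given pandas version matches NaN labels in the index.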
Expected Output
assert df.loc[('A', 'F')]['col_2'] == 0.0 # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1']) # Pass, works in v0.22.0
Note that this unexpected behavior does not occur when np.nan is absent from the index, nor with a single-level Index that contains np.nan.
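Since the problem only appears when np.nan is part of the MultiIndex, one possible workaround is to hide the NaN label behind a placeholder while running the original df.at loop, then restore it afterwards. This is a sketch under that assumption, using the same level and column names as above; enlarge_without_nan_in_index and the "__nan__" placeholder are hypothetical, and the placeholder must not clash with a real pivot_1 label.

```python
import numpy as np
import pandas as pd

SENTINEL = "__nan__"  # hypothetical placeholder; must not collide with a real label


def enlarge_without_nan_in_index(df, pivot_0, required):
    """Hypothetical helper: run the df.at enlargement loop while pivot_1 holds no NaN."""
    flat = df.reset_index()
    # Replace the NaN label so the MultiIndex contains no missing values.
    flat["pivot_1"] = flat["pivot_1"].fillna(SENTINEL)
    tmp = flat.set_index(["pivot_0", "pivot_1"])
    for value in required:
        if value not in tmp.index.get_level_values("pivot_1"):
            tmp.at[(pivot_0, value), "col_2"] = 0.0
    # Turn the placeholder back into NaN and rebuild the MultiIndex.
    flat = tmp.reset_index()
    flat["pivot_1"] = flat["pivot_1"].mask(flat["pivot_1"] == SENTINEL)
    return flat.set_index(["pivot_0", "pivot_1"])


df = pd.DataFrame(
    [
        ["A", np.nan, 1.23, 4.56],
        ["A", "G", 1.23, 4.56],
        ["A", "D", 9.87, 10.54],
    ],
    columns=["pivot_0", "pivot_1", "col_1", "col_2"],
).set_index(["pivot_0", "pivot_1"])

out = enlarge_without_nan_in_index(df, "A", ["D", "E", "F"])
```

This keeps the reporter's df.at pattern unchanged; the trade-off is an extra round trip through reset_index/set_index.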
MultiIndex without np.nan
df = pd.DataFrame(
    [
        # ['A', np.nan, 1.23, 4.56],  # Comment out the np.nan row
        ['A', 'G', 1.23, 4.56],
        ['A', 'D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'pivot_1', 'col_1', 'col_2'],
)
df.set_index(['pivot_0', 'pivot_1'], inplace=True)

pivot_0 = 'A'
necessary_pivot_1_values = ['D', 'E', 'F']
for necessary_value in necessary_pivot_1_values:
    if necessary_value not in df.index.get_level_values('pivot_1').tolist():
        print("Missing", necessary_value)
        df.at[(pivot_0, necessary_value), 'col_2'] = 0.0

assert df.loc[('A', 'F')]['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc[('A', 'F')]['col_1'])  # Pass
Single Index with np.nan
df = pd.DataFrame(
    [
        [np.nan, 1.23, 4.56],
        ['G', 1.23, 4.56],
        ['D', 9.87, 10.54],
    ],
    columns=['pivot_0', 'col_1', 'col_2'],
)
df.set_index(['pivot_0'], inplace=True)

necessary_pivot_0_values = ['D', 'E', 'F']
for necessary_value in necessary_pivot_0_values:
    if necessary_value not in df.index.get_level_values('pivot_0').tolist():
        print("Missing", necessary_value)
        df.at[necessary_value, 'col_2'] = 0.0

assert df.loc['F']['col_2'] == 0.0  # Pass
assert pd.isnull(df.loc['F']['col_1'])  # Pass
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.26.1
numpy: 1.12.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None