Description
Hello!
I discovered an unexpected behaviour while using pandas.
Code Sample
    import pandas as pd


    def process(df):
        df.index = df["A"]
        df.loc['first', 'A'] = 'first'
        print("Before")
        print(df)
        print()
        df.loc['first', 'B'] = 99
        print("After")
        print(df)


    print("First processing : without csv file :\n")
    df = pd.DataFrame({'A': ["first", "second"], 'B': [3, 4]})
    process(df)

    print("\n\nSecond processing : via a csv file :\n")
    df = pd.DataFrame({'A': ["first", "second"], 'B': [3, 4]})
    df.to_csv("df.csv", index=False)
    df = pd.read_csv("df.csv")
    process(df)
Problem description
The two outputs that process(df) produces differ depending on whether df was first written to a CSV file and read back.
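To narrow down where the two inputs diverge, it may help to compare them before calling process(df). The sketch below is only a diagnostic, not a confirmed explanation: `index_shares_column` is a hypothetical helper (not a pandas API), and `io.StringIO` stands in for the df.csv file. The printed values can vary between pandas versions, so no expected output is given.

```python
import io

import numpy as np
import pandas as pd


def index_shares_column(df, col="A"):
    # Hypothetical helper: assign the column as the index on a copy,
    # then report whether index and column overlap in memory.
    df = df.copy()
    df.index = df[col]
    return bool(np.shares_memory(df.index.to_numpy(), df[col].to_numpy()))


plain = pd.DataFrame({"A": ["first", "second"], "B": [3, 4]})
# Round trip through CSV text instead of a file on disk.
round_tripped = pd.read_csv(io.StringIO(plain.to_csv(index=False)))

same_dtypes = plain.dtypes.equals(round_tripped.dtypes)
print("same dtypes:", same_dtypes)
print("plain shares memory:", index_shares_column(plain))
print("round-tripped shares memory:", index_shares_column(round_tripped))
```

If the dtypes match but the memory-sharing result differs between the two frames, that would point at how the data is stored internally rather than at the values themselves.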
Expected output
    First processing : without csv file :

    Before
                 A  B
    A
    first    first  3
    second  second  4

    After
                 A   B
    A
    first    first  99
    second  second   4


    Second processing : via a csv file :

    Before
                 A  B
    A
    first    first  3
    second  second  4

    After
                 A   B
    A
    first    first  99
    second  second   4
Actual output
    First processing : without csv file :

    Before
                 A  B
    A
    first    first  3
    second  second  4

    After
                 A   B
    A
    first    first  99
    second  second   4


    Second processing : via a csv file :

    Before
                 A  B
    A
    first    first  3
    second  second  4

    After
                 A     B
    A
    first      NaN  99.0
    second  second   4.0
    first      NaN  99.0
Mea culpa
I guess that writing df.index = df["A"] is bad practice because column A is still linked to the index, so when we modify column A, the index is changed as well. It could be good to raise a warning to tell silly people like me not to do that.
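If that guess is right, one way to sidestep the problem is to build the index from an explicit copy of column A, so the two cannot share data. This is a minimal sketch under that assumption (the cause is not confirmed here); `io.StringIO` stands in for the df.csv file, and `process` is rewritten to return the frame instead of printing it.

```python
import io

import pandas as pd


def process(df):
    # Copy the column values first, so the index does not reuse
    # the column's underlying array.
    df.index = pd.Index(df["A"].to_numpy(copy=True), name="A")
    df.loc["first", "A"] = "first"
    df.loc["first", "B"] = 99
    return df


csv_text = "A,B\nfirst,3\nsecond,4\n"
df = pd.read_csv(io.StringIO(csv_text))
result = process(df)
print(result)
```

With the copy in place, the frame read from CSV should keep its two rows, with B updated to 99 for 'first'.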
Thank you, and have a cute day !