
Three or more unnamed fields block loc assignment #13017

Closed
@JakeCowton

Description


Writing a dataframe to CSV with df.to_csv("/path/to/file.csv") creates an "unnamed" field containing the row index. Repeatedly writing to and reading from this file therefore accumulates many "unnamed" fields. Once there are three unnamed fields, you can no longer use loc to replace values; worse, it fails silently.

In [2]: df = DataFrame({'A': [1,2,3],'B': [4,5,501]})

In [3]: df
Out[3]:
   A    B
0  1    4
1  2    5
2  3  501

In [4]: df.B.loc[df.B > 500]
Out[4]:
2    501
Name: B, dtype: int64

In [5]: df.B.loc[df.B > 500] = None

In [6]: df
Out[6]:
   A    B
0  1  4.0
1  2  5.0
2  3  NaN

So far so good: I was able to replace the values in df.B over 500 with NaN. I then write this out and read it back in:


In [7]: df.to_csv("./test.csv")

In [8]: df = read_csv("./test.csv")

In [9]: df.columns
Out[9]: Index([u'Unnamed: 0', u'A', u'B'], dtype='object')

In [10]: df
Out[10]:
   Unnamed: 0  A    B
0           0  1  4.0
1           1  2  5.0
2           2  3  NaN

As you can see, this has created an unnamed field, but let's continue.

In [14]: df.B.fillna(501, inplace=True)

This is just to put 501 back in place of the NaN I created earlier, which I forgot to do before writing out.

In [15]: df.B
Out[15]:
0      4.0
1      5.0
2    501.0
Name: B, dtype: float64

In [16]: df.B.loc[df.B > 500]
Out[16]:
2    501.0
Name: B, dtype: float64

In [17]: df.B.loc[df.B > 500] = None

...SettingWithCopyWarning...

In [18]: df.B
Out[18]:
0    4.0
1    5.0
2    NaN
Name: B, dtype: float64

Everything is still working fine.


In [19]: df.fillna(501, inplace=True)

In [20]: df.to_csv("./test.csv")

In [21]: df = read_csv("./test.csv")

In [22]: df.columns
Out[22]: Index([u'Unnamed: 0', u'Unnamed: 0.1', u'A', u'B'], dtype='object')

In [23]: df
Out[23]:
   Unnamed: 0  Unnamed: 0.1  A      B
0           0             0  1    4.0
1           1             1  2    5.0
2           2             2  3  501.0

Writing and reading again creates a second unnamed field.


In [24]: df.B.loc[df.B > 500]
Out[24]:
2    501.0
Name: B, dtype: float64

In [25]: df.B.loc[df.B > 500] = None

In [26]: df.B
Out[26]:
0    4.0
1    5.0
2    NaN
Name: B, dtype: float64

No problem; everything still works so far. However:


In [27]: df.fillna(501, inplace=True)

In [28]: df
Out[28]:
   Unnamed: 0  Unnamed: 0.1  A      B
0           0             0  1    4.0
1           1             1  2    5.0
2           2             2  3  501.0

In [29]: df.to_csv("./test.csv")

In [30]: df = read_csv("./test.csv")

In [31]: df.columns
Out[31]: Index([u'Unnamed: 0', u'Unnamed: 0.1', u'Unnamed: 0.1', u'A', u'B'], dtype='object')

In [32]: df
Out[32]:
   Unnamed: 0  Unnamed: 0.1  Unnamed: 0.1  A      B
0           0             0             0  1    4.0
1           1             1             1  2    5.0
2           2             2             2  3  501.0

We now have three unnamed fields.
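As an aside, the duplicated 'Unnamed: 0.1' header shown above is what the pandas of that era produced; recent pandas mangles duplicate headers on read by appending a numeric suffix, so the columns come back unique. A small sketch using the header from the transcript:

```python
import pandas as pd
from io import StringIO

# The CSV header from the transcript above, with 'Unnamed: 0.1' duplicated.
csv = ("Unnamed: 0,Unnamed: 0.1,Unnamed: 0.1,A,B\n"
       "0,0,0,1,4.0\n"
       "1,1,1,2,5.0\n"
       "2,2,2,3,501.0\n")

df = pd.read_csv(StringIO(csv))
# Recent pandas de-duplicates repeated headers (the second
# 'Unnamed: 0.1' gets an extra suffix), so df.columns is unique.
print(df.columns.tolist())
print(df.columns.is_unique)
```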


In [33]: df.B.loc[df.B > 500]
Out[33]:
2    501.0
Name: B, dtype: float64

In [34]: df.B.loc[df.B > 500] = None

In [35]: df.B
Out[35]:
0      4.0
1      5.0
2    501.0
Name: B, dtype: float64

The method of replacing all values over 500 with NaN no longer works, yet it throws no errors or warnings.

You CAN work around this using df.loc[df.B > 500, 'B'] = None, but obviously you shouldn't have to.
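The workaround succeeds because it indexes the frame in a single .loc call rather than chaining through df.B, which is an intermediate object that may be a copy. A minimal sketch (column values taken from the transcript):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 501.0]})

# df.B.loc[mask] = value is chained indexing: the write lands on the
# intermediate object df.B, which is not guaranteed to share data
# with df. A single .loc on the frame itself always writes into df:
df.loc[df.B > 500, 'B'] = np.nan
print(df.B.tolist())  # [4.0, 5.0, nan]
```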

Labels: Bug, Indexing, Missing-data
