Asymmetry in corner case for DataFrame __getattr__ and __setattr__ #8994

Closed
@jakevdp

A student of mine ran into a confusing problem that turned out to be caused by an asymmetry between DataFrame.__setattr__ and DataFrame.__getattr__ when an attribute and a column have the same name. Here is a short example session:

import pandas as pd
print(pd.__version__)  # '0.14.1'
data = pd.DataFrame({'x':[1, 2, 3]})

# try to create a new column, making a common mistake
data.y = 2 * data.x

# oops! That didn't create a column.
# we need to do it this way instead
data['y'] = 2 * data.x

# update the attribute, and it updates the column, not the attribute
data.y = 0
print(data['y'])  # [0, 0, 0]

# print the attribute, and it prints the attribute, not the column
print(data.y)  # [2, 4, 6]
print(data['y']) # [0, 0, 0]

The confusion derives from the fact that, in this situation, data.__getattr__('y') refers to the attribute, while data.__setattr__('y', val) refers to the column.
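To make the mechanism concrete without depending on a particular pandas version, here is a minimal sketch using a toy class. `ToyFrame` and its `_columns` dict are hypothetical names invented for this illustration; this is not pandas' actual implementation, only an assumed model of the lookup order that would produce the behavior shown in the session above.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas' implementation."""

    def __init__(self, columns):
        # Bypass our own __setattr__ so _columns lands in the instance dict.
        object.__setattr__(self, "_columns", dict(columns))

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, value):
        self._columns[name] = value

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails, so an
        # instance attribute with the same name always shadows the column.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Python calls this on every assignment; in this toy model an
        # existing column takes priority over an existing attribute.
        if name in self._columns:
            self._columns[name] = value
        else:
            object.__setattr__(self, name, value)


frame = ToyFrame({"x": [1, 2, 3]})
frame.y = [2, 4, 6]      # no column "y" yet: stored as a plain attribute
frame["y"] = [2, 4, 6]   # now a column "y" exists as well
frame.y = [0, 0, 0]      # __setattr__ routes this to the column

attr_value = frame.y       # normal lookup finds the instance attribute
column_value = frame["y"]  # the column is what was overwritten
```

The key point is that `__getattr__` is a fallback that runs only after ordinary lookup fails, whereas `__setattr__` intercepts every assignment, so the two hooks can disagree unless they are written to apply the same priority.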

I understand that using attributes to access columns is not recommended, but the asymmetry in this corner case led to a lot of confusion! It would be better if __getattr__ and __setattr__ always referred to the same object.
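One possible way to get that symmetry, sketched on a toy class, is to have `__setattr__` check for an existing real attribute first and fall back to the column only when none exists, mirroring the order in which reads are resolved. `SymmetricFrame` and `_columns` are hypothetical names for this illustration, and the priority chosen here (attribute before column) is an assumption, not a statement of how pandas does or should resolve it.

```python
class SymmetricFrame:
    """Hypothetical toy class; reads and writes share one lookup order."""

    def __init__(self, columns):
        object.__setattr__(self, "_columns", dict(columns))

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, value):
        self._columns[name] = value

    def __getattr__(self, name):
        # Fallback for reads: reached only if no real attribute exists.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Mirror the read path: a real attribute wins if one exists.
        try:
            object.__getattribute__(self, name)
        except AttributeError:
            pass
        else:
            object.__setattr__(self, name, value)
            return
        # No real attribute: update the column if there is one,
        # otherwise create a new plain attribute.
        if name in self._columns:
            self._columns[name] = value
        else:
            object.__setattr__(self, name, value)


frame = SymmetricFrame({"x": [1, 2, 3]})
frame.y = [2, 4, 6]      # stored as a plain attribute
frame["y"] = [2, 4, 6]   # a column with the same name
frame.y = [0, 0, 0]      # updates the attribute, which is what frame.y reads

read_back = frame.y
column_y = frame["y"]

frame.x = [9, 9, 9]      # no attribute "x", so the column is updated
column_x = frame["x"]
```

With this ordering, whatever `frame.name = value` writes is the same object that `frame.name` reads back, which is the property requested above. It does not address the separate surprise that `frame.y = ...` never creates a column.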
