Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-view-versus-copy
Documentation problem
import pandas as pd
# Creating a DataFrame with some sample data
data = {
    'Name': ['Jason', 'Emma', 'Alex', 'Sarah'],
    'Age': [28, 24, 32, 27],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
    'Salary': [75000, 65000, 85000, 70000]
}
df = pd.DataFrame(data)
# Display the DataFrame
print(df)
I want to update Jason’s age, and I do so with:
df['Age'][df['Name'] == 'Jason'] = 29
With code like the line above, Jason's age may or may not actually be updated to 29, because chained indexing can end up assigning to a temporary copy rather than to df itself.
The documentation recommends .loc/.iloc as the better option, for example:
df.loc[df['Name'] == 'Jason', 'Age'] = 29
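To make the comparison concrete, here is a minimal, self-contained sketch of the .loc form on a trimmed copy of the sample data. The variable names mask and jason_age are mine, added only for illustration.

```python
import pandas as pd

# Trimmed version of the sample DataFrame from this report
df = pd.DataFrame({
    'Name': ['Jason', 'Emma', 'Alex', 'Sarah'],
    'Age': [28, 24, 32, 27],
})

# Boolean mask selecting the rows to change (illustrative name)
mask = df['Name'] == 'Jason'

# A single .loc call: rows come from the mask, the column from its label,
# so the assignment targets df itself rather than an intermediate object
df.loc[mask, 'Age'] = 29

# Read the value back to confirm the update (illustrative name)
jason_age = int(df.loc[mask, 'Age'].iloc[0])
print(jason_age)
```

Because the row and column selection happen in one indexing operation, there is no intermediate result that could be a copy.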
However, the documentation is not clear about best practices for tuple keys, such as the following:
df[('Age', df['Name'] == 'Jason')] = 29
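As a rough way to see what the tuple form does in practice, the sketch below wraps the assignment in a try/except and records the outcome. This is an assumption-laden probe, not a statement of documented behaviour: I expect the assignment to fail because a Series is unhashable and so cannot be part of a column key, but I have not confirmed which exception each pandas version raises, which is why the except clause is deliberately broad. The name tuple_assignment_error is mine.

```python
import pandas as pd

# Trimmed version of the sample DataFrame from this report
df = pd.DataFrame({
    'Name': ['Jason', 'Emma', 'Alex', 'Sarah'],
    'Age': [28, 24, 32, 27],
})

mask = df['Name'] == 'Jason'

# Try the tuple-key assignment and record what happens.
# The broad except is intentional: the exact exception type is an
# unverified detail that may differ between pandas versions.
try:
    df[('Age', mask)] = 29
    tuple_assignment_error = None
except Exception as exc:
    tuple_assignment_error = type(exc).__name__

print(tuple_assignment_error)
print(df)
```

If this probe is representative, the tuple form is not an alternative to .loc for row-and-column assignment at all, which would be worth stating explicitly in the documentation.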
Suggested fix for documentation
The suggested fix is to explain how tuple keys compare with .iloc/.loc and with chained indexing in terms of pandas best practices. Considerations could include time complexity, space complexity, and code readability.