Open
Description
Is your feature request related to a problem?
From what I've seen in Python classes: when handling huge data frames, beginners in particular
don't realize that calling apply on the frame itself loops over the whole frame, even if the function only needs one column.
Example:
Consider the following function and data frame:
import pandas as pd

def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2

df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})
For a larger data frame, compare the performance of
df['col1'] = df['col1'].apply(complex_function)
and the much less efficient
df['col1'] = df.apply(lambda row: complex_function(row['col1']), axis=1)
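To make the difference measurable, here is a rough timing sketch. The frame size (20,000 rows), the random data, and the helper names column_apply and row_apply are my own assumptions for illustration; actual timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2


# Synthetic data; the row count is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
big_df = pd.DataFrame({
    "col1": rng.integers(0, 10, size=20_000),
    "col2": rng.integers(0, 10, size=20_000),
})


def column_apply():
    # Series.apply: the function receives one scalar per element.
    return big_df["col1"].apply(complex_function)


def row_apply():
    # DataFrame.apply with axis=1: the function receives a full row
    # (a Series) each time, although only 'col1' is read.
    return big_df.apply(lambda row: complex_function(row["col1"]), axis=1)


# Both variants compute the same values.
same_values = column_apply().tolist() == row_apply().tolist()

t_col = timeit.timeit(column_apply, number=3)
t_row = timeit.timeit(row_apply, number=3)
print(f"same values: {same_values}")
print(f"Series.apply:            {t_col:.3f} s")
print(f"DataFrame.apply(axis=1): {t_row:.3f} s")
```

The row-wise variant has to hand a whole row to the function on every iteration, which is where I would expect the extra time to go.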
Describe the solution you'd like
Just a small warning when only one column is accessed within the body of apply. Something like:
Warning: Only one column seems to be accessed, causing performance issues
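A minimal sketch of how such a check could work in user code, not a proposal for the actual pandas internals. The names _RecordingRow and apply_with_column_warning are hypothetical and do not exist in pandas. The idea is to call the function once on the first row through a small proxy that records which labels are read.

```python
import warnings

import pandas as pd


class _RecordingRow:
    """Hypothetical proxy: records which labels are read via row[...]."""

    def __init__(self, row):
        self._row = row
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return self._row[key]


def apply_with_column_warning(df, func):
    """Hypothetical wrapper around df.apply(func, axis=1) that warns
    when a probe on the first row reads exactly one column."""
    if len(df) > 0:
        probe = _RecordingRow(df.iloc[0])
        try:
            func(probe)
        except Exception:
            # The probe only supports row[...]; if func needs more,
            # we cannot tell what it accesses, so stay silent.
            probe.accessed.clear()
        if len(probe.accessed) == 1:
            warnings.warn(
                "Only one column seems to be accessed, "
                "causing performance issues",
                UserWarning,
                stacklevel=2,
            )
    return df.apply(func, axis=1)


def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2


df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})

# Reads only 'col1', so this call emits the warning.
result = apply_with_column_warning(
    df, lambda row: complex_function(row['col1'])
)
```

This has obvious limits: it calls the function one extra time, it only sees the columns touched for the first row (data-dependent branches can hide others), and it does not track attribute access such as row.col1.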
Describe alternatives you've considered
Leave it as it is; the difference is not significant for most users.