
ENH: Warning with only one column used in apply causing performance issues #44490

Open
@crosspolar

Description


Is your feature request related to a problem?

From what I've seen in Python classes: when handling a huge data frame, beginners in particular use apply on the whole frame without knowing that, while looping, it builds a row containing every column for each call, even when the function only needs one column.
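To illustrate the point above, here is a minimal sketch, assuming a recent pandas version. The helper name inspect is made up for this example. It records what DataFrame.apply with axis=1 passes to the function: each call receives a whole row as a Series with all columns, although only 'col1' is read.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 4, 6], 'col2': [6, 7, 1]})

# Record the type and labels of whatever the function receives per row.
received = []

def inspect(row):
    received.append((type(row).__name__, list(row.index)))
    return row['col1']

result = df.apply(inspect, axis=1)

# Each call gets a full row Series containing every column,
# even though the function only reads 'col1'.
print(received[0])  # ('Series', ['col1', 'col2'])
```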

Example:
Consider the following function and data frame:

  import pandas as pd

  def complex_function(x, y=0):
      if x > 5 and x > y:
          return 1
      else:
          return 2

  df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})

For a larger data frame, consider the performance difference between

df['col1'] = df['col1'].apply(complex_function)

and the much less efficient

df['col1'] = df.apply(lambda row: complex_function(row['col1']), axis=1)
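Here is a rough way to measure the difference, sketched under stated assumptions. The 20,000-row random frame, the helper names column_apply and row_apply, and the use of timeit are choices made for illustration only. Actual timings depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2

# A larger frame; the size is an arbitrary choice for illustration.
rng = np.random.default_rng(0)
n = 20_000
big = pd.DataFrame({'col1': rng.integers(0, 10, n),
                    'col2': rng.integers(0, 10, n)})

def column_apply():
    # Series.apply: the function receives one value per element of 'col1'.
    return big['col1'].apply(complex_function)

def row_apply():
    # DataFrame.apply with axis=1: a row Series (all columns) is built per call.
    return big.apply(lambda row: complex_function(row['col1']), axis=1)

# Both approaches compute the same values.
assert np.array_equal(column_apply().to_numpy(), row_apply().to_numpy())

t_col = timeit.timeit(column_apply, number=3)
t_row = timeit.timeit(row_apply, number=3)
print(f"Series.apply:    {t_col:.3f} s")
print(f"DataFrame.apply: {t_row:.3f} s")
```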

Describe the solution you'd like

Just a small warning if only one column is accessed within the body of the applied function. Something like:

Warning: Only one column seems to be accessed, causing performance issues
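How such a warning could be implemented is open for discussion. One possible sketch probes the function once on the first row through a wrapper that records which labels are read with row[...], and warns if exactly one column was touched. The names apply_rows_with_warning and _AccessRecorder are hypothetical and are not part of pandas.

```python
import warnings

import pandas as pd


def complex_function(x, y=0):
    if x > 5 and x > y:
        return 1
    else:
        return 2


class _AccessRecorder:
    """Hypothetical helper: wraps a row and records labels read via row[...]."""

    def __init__(self, row):
        self._row = row
        self.accessed = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return self._row[key]


def apply_rows_with_warning(df, func):
    """Hypothetical wrapper around df.apply(func, axis=1) that warns on single-column access."""
    if len(df) > 0 and df.shape[1] > 1:
        # Probe func once on the first row to see which columns it reads.
        probe = _AccessRecorder(df.iloc[0])
        try:
            func(probe)
        except Exception:
            pass  # best effort: unsupported access patterns are simply not detected
        if len(probe.accessed) == 1:
            (col,) = probe.accessed
            warnings.warn(
                f"Only one column ({col!r}) seems to be accessed, causing performance "
                f"issues; consider df[{col!r}].apply(...) instead",
                UserWarning,
                stacklevel=2,
            )
    return df.apply(func, axis=1)


df = pd.DataFrame(data={'col1': [1, 4, 6, 2, 7], 'col2': [6, 7, 1, 2, 8]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = apply_rows_with_warning(df, lambda row: complex_function(row['col1']))

print(caught[0].message)
print(out.tolist())  # [2, 2, 1, 2, 1]
```

The probe calls the function one extra time, so functions with side effects would notice. Attribute access such as row.col1, or methods such as row.get(...), are not tracked and simply produce no warning. A real implementation inside pandas would need a more robust approach.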

Describe alternatives you've considered

Leave it as it is; for most users it makes no great difference.

Metadata

Assignees: No one assigned

Labels: Apply (Apply, Aggregate, Transform, Map), Enhancement, Needs Discussion (Requires discussion from core team before further action), Performance (Memory or execution speed performance), Warnings (Warnings that appear or should be added to pandas)
