
Discuss: transformation vs. aggregation in agg vs. transform #27389

Closed
@ghost

Description

There have been multiple issues regarding

I'd like to prepare a PR to fix this, but I need to know what the consensus is first.

1. Should GroupBy/DataFrame/Series.agg disallow transformations?

From #14741 (comment), about GroupBy.agg:

you need to use .transform. .agg is by definition a reducer.

agg currently accepts transformations as well.
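For example, with a plain Series (a minimal sketch, assuming a recent pandas release; the data is made up for illustration), the same `agg` call reduces for `'sum'` but returns a full-length result for `'cumsum'`:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

total = s.agg("sum")       # reduction: collapses the Series to one scalar
running = s.agg("cumsum")  # transformation: one value per input row

print(total)             # 6
print(running.tolist())  # [1, 3, 6]
```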

DataFrame.agg was merged in #14668 despite repeated objections there to mixing transformations and aggregations: #14668 (comment) and #14668 (comment).
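If the answer to question 1 is yes, one way to picture the stricter rule is a wrapper that runs the aggregation and rejects any non-scalar result. `strict_series_agg` is a hypothetical helper, not a pandas API; it only handles a Series with a single function name (a list such as `['sum', 'mean']` also returns a Series and would be rejected too):

```python
import pandas as pd
from pandas.api.types import is_scalar


def strict_series_agg(s: pd.Series, func: str):
    """Hypothetical helper: behave like s.agg(func) but reject transformations."""
    result = s.agg(func)
    # With a single function, reducing a Series yields one scalar;
    # a non-scalar result means the function transformed the data instead.
    if not is_scalar(result):
        raise ValueError(f"{func!r} is not a reduction; use .transform instead")
    return result


s = pd.Series([1, 2, 3])
print(strict_series_agg(s, "sum"))  # 6
try:
    strict_series_agg(s, "cumsum")
except ValueError as err:
    print(err)  # 'cumsum' is not a reduction; use .transform instead
```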

2. What should transform('rank') return?


g.min()             # one value per group. ok.
g.transform('min')  # one value per group, broadcast to every row. ok.
g.rank()            # like-indexed result. ok.
g.transform('rank') # currently, a nonsense result.
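A runnable version of the lines above, using a small made-up frame (`df`, `key` and `val` are illustrative names, not from the issue). The `transform('rank')` line is left out because its output is the very thing under discussion:

```python
import pandas as pd

# Illustrative data: two groups, "a" and "b".
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [3, 1, 2, 5, 4]})
g = df.groupby("key")["val"]

per_group = g.min()             # indexed by group key: a -> 1, b -> 2
broadcast = g.transform("min")  # indexed like df: [1, 1, 2, 2, 2]
ranked = g.rank()               # indexed like df: [2.0, 1.0, 1.0, 3.0, 2.0]
```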

On the one hand, users have been told that transformations don't belong in transform (!): #22509 (comment), #14274 (comment). This makes sense if you think of the transform('name') form as solely for broadcasting aggregations.
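Under that reading, `transform('name')` is shorthand for aggregating and then broadcasting each group's result back onto that group's rows. A sketch with made-up data, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [3, 1, 2, 5, 4]})
g = df.groupby("key")["val"]

# transform("mean"): one mean per group, repeated on every row of that group.
broadcast = g.transform("mean")

# The same thing by hand: aggregate first, then look up each row's group result.
by_hand = df["key"].map(g.mean())

print(broadcast.tolist() == by_hand.tolist())  # True
```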

On the other hand, both the documentation for transform and Wes McKinney's excellent pandas book portray transform as the dedicated tool for shape-preserving operations, so excluding those operations from the transform('name') form would be a little surprising.
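The shape-preserving use looks like this: a function that returns one value per input row (here, subtracting each group's mean, a common demeaning example) yields a result indexed like the original frame. Made-up data; a sketch, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [3, 1, 2, 5, 4]})
g = df.groupby("key")["val"]

# The lambda receives each group as a Series and returns a Series of the same length.
demeaned = g.transform(lambda x: x - x.mean())

print(demeaned.index.equals(df.index))  # True
print(demeaned.tolist()[:2])            # [1.0, -1.0]
```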

Personally, I'm +0 on deprecating transform('rank') with a warning to use g.rank() instead, and on doing the same for the other transformation ops.
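The proposed deprecation could behave roughly like the wrapper below. `transform_by_name`, the `TRANSFORMATION_NAMES` set and the warning text are all hypothetical, chosen for illustration; they are not pandas internals:

```python
import warnings

import pandas as pd

# Hypothetical list of shape-preserving GroupBy methods (each exists on SeriesGroupBy).
TRANSFORMATION_NAMES = {"rank", "cumsum", "cumprod", "cummax", "cummin", "diff", "shift"}


def transform_by_name(grouped, name):
    """Sketch of the proposal: warn on transformation names, then call the method directly."""
    if name in TRANSFORMATION_NAMES:
        warnings.warn(
            f"transform({name!r}) is deprecated; use .{name}() instead",
            FutureWarning,
            stacklevel=2,
        )
        return getattr(grouped, name)()
    # Aggregation names keep today's meaning: reduce per group, then broadcast.
    return grouped.transform(name)


df = pd.DataFrame({"key": ["a", "a", "b"], "val": [3, 1, 2]})
g = df.groupby("key")["val"]
ranked = transform_by_name(g, "rank")   # emits a FutureWarning, returns g.rank()
smallest = transform_by_name(g, "min")  # no warning, same as g.transform("min")
```

In this sketch, routing through the named method keeps the result identical to calling `g.rank()` directly, so the warning is the only visible change for transformation names.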

Metadata

Labels: Apply (Apply, Aggregate, Transform, Map); Needs Discussion (Requires discussion from core team before further action)