
Description
There have been multiple issues regarding:
- Confusion, in both DataFrame and GroupBy, about the respective roles of `agg` and `transform` (and `apply`): Why do I get strange aggregation result from DataFrame groupby()? #26960, Behavior of new df.agg, df.transform and df.apply is very inconsistent #18103.
- `transform('rank')` and others returning the wrong answer (first issue from 2016): bug when filling missing values with transform? #14274, Shortcut functions in transform are not grouped #19354, Wrong output of GroupBy transform with string input (e.g., transform('rank')) #22509.
I'd like to prepare a PR to fix this, but I need to know what the consensus is first.
1. Should `GroupBy/DataFrame/Series.agg` disallow transformations?
#14741 (comment) about `GroupBy.agg`: "you need to use .transform. .agg is by definition a reducer."
`agg` currently accepts transformations as well.
`DataFrame.agg` was merged in #27389 despite repeated objections to mixing transformations and aggregations, #14668 (comment) and #14668 (comment).
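As a rough illustration of `agg` accepting a transformation, the sketch below passes both a reducer name and a cumulative (shape-preserving) name to `DataFrame.agg`. The frame and column names are made up for the example, and the behavior shown is an assumption that may differ between pandas versions.

```python
import pandas as pd

# Toy data, invented for this sketch.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

reduced = df.agg("sum")        # a reduction: one value per column
same_shape = df.agg("cumsum")  # a transformation passed to agg

# If agg were strictly a reducer, the second call would not return
# a result with the same shape as the input.
print(reduced.shape, same_shape.shape)
```

Nothing in the call signature distinguishes the two cases; only the shape of the result does.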
2. What should `transform('rank')` return?
updated
g.min() # one value per group. ok.
g.transform('min') # one value per group, broadcasted. ok.
g.rank() # like-index result. ok.
g.transform('rank') # Currently, a nonsense result.
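To make the shapes concrete, here is a minimal sketch on a toy frame (the data and the names `key` and `val` are invented for illustration). It shows the three well-behaved cases and then compares `transform('rank')` against `g.rank()` without assuming the outcome, since that depends on the pandas version.

```python
import pandas as pd

# Toy data, invented for this sketch.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [3, 1, 4, 2]})
g = df.groupby("key")["val"]

reduced = g.min()               # one row per group, indexed by key
broadcast = g.transform("min")  # group minimum repeated on the original index
ranked = g.rank()               # within-group rank on the original index

print(reduced.tolist())    # [1, 2]
print(broadcast.tolist())  # [1, 1, 2, 2]
print(ranked.tolist())     # [2.0, 1.0, 2.0, 1.0]

# Whether the string form agrees with g.rank() is version dependent,
# so the result is printed rather than asserted.
print(g.transform("rank").equals(ranked))
```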
On the one hand, users have been told that transformations don't belong in `transform` (!): #22509 (comment), #14274 (comment). This makes sense if you think of the `transform('name')` form as solely for broadcasting aggregations.
On the other hand, the documentation for `transform`, as well as Wes McKinney's excellent pandas book, portrays `transform` as the dedicated tool for shape-preserving operations, so excluding them from the `transform('name')` case would be a little surprising.
Personally, I'm +0 for deprecating `transform('rank')`, with a warning to use `g.rank()` instead, and likewise for the other transformation ops.
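One way the deprecation could look from the user's side is sketched below. This is not pandas code: `transform_by_name` and `TRANSFORMATION_NAMES` are hypothetical names introduced only for illustration, and the set of method names is an assumption about which ops would count as transformations.

```python
import warnings

import pandas as pd

# Hypothetical set of shape-preserving method names (an assumption,
# not a list taken from pandas).
TRANSFORMATION_NAMES = {"rank", "cumsum", "cumprod", "cummin", "cummax", "shift", "diff"}


def transform_by_name(grouped, name):
    """Hypothetical stand-in for GroupBy.transform(name) with a deprecation path."""
    if name in TRANSFORMATION_NAMES:
        warnings.warn(
            f"transform('{name}') is deprecated in this sketch; "
            f"call g.{name}() instead.",
            FutureWarning,
            stacklevel=2,
        )
        # Dispatch to the dedicated method so the result is still correct.
        return getattr(grouped, name)()
    # Reducers keep the existing broadcast behavior.
    return grouped.transform(name)


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [3, 1, 4, 2]})
g = df.groupby("key")["val"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ranked = transform_by_name(g, "rank")   # warns, returns g.rank()
    mins = transform_by_name(g, "min")      # broadcast aggregation
```

Dispatching to the dedicated method while warning keeps existing code working during the deprecation period; raising immediately would be the stricter alternative.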