Description
Currently pandas implements various groupby ops via _make_wrapper
. There are "allow lists" in pandas.core.groupby.base
for both SeriesGroupBy and DataFrameGroupBy, and the class decorator pandas.core.groupby.generic.pin_allowlisted_properties
adds these as properties to the SeriesGroupBy and DataFrameGroupBy classes.
Instead, should we convert _make_wrapper
into a normal method (e.g. _op_via_apply
) and add methods for each op in the allow lists. An example of such an added method would be:
@doc(DataFrame.skew.__doc__)
def skew(
self,
axis=lib.no_default,
skipna=True,
level=None,
numeric_only=None,
**kwargs
):
result = self._op_via_apply(
'skew',
axis=axis,
skipna=skipna,
level=level,
numeric_only=numeric_only,
**kwargs
)
return result
The advantage of the current approach is less boilerplate code (the method definitions) and a consistent API between e.g. DataFrame and DataFrameGroupBy ops. But the disadvantage is that the docs come out as properties (e.g. skew) and anyone using Python's dynamic abilities gets incorrect results:
df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')
print(type(DataFrameGroupBy.sum))
print(type(DataFrameGroupBy.skew))
print(type(gb.sum))
print(type(gb.skew))
print(inspect.signature(df.groupby('a').sum))
print(inspect.signature(df.groupby('a').skew))
# <class 'function'>
# <class 'property'>
# <class 'method'>
# <class 'function'>
# (numeric_only: 'bool | lib.NoDefault' = <no_default>, min_count: 'int' = 0, engine: 'str | None' = None, engine_kwargs: 'dict[str, bool] | None' = None)
# (*args, **kwargs)
I also find the current implementation harder to understand / debug, but that is very much an opinion.
Finally, tests can be added to ensure the consistency of arguments between e.g. DataFrame.skew
and DataFrameGroupBy.skew
using Python's inspect
module.
cc @jreback @jbrockmendel @mroeschke for any thoughts.