Skip to content

CLN: Refactor groupby's _make_wrapper #48028

Closed
@rhshadrach

Description

@rhshadrach

Currently pandas implements various groupby ops via _make_wrapper. There are "allow lists" in pandas.core.groupby.base for both SeriesGroupBy and DataFrameGroupBy, and the class decorator pandas.core.groupby.generic.pin_allowlisted_properties adds these as properties to the SeriesGroupBy and DataFrameGroupBy classes.

Instead, should we convert _make_wrapper into a normal method (e.g. _op_via_apply) and add methods for each op in the allow lists. An example of such an added method would be:

    @doc(DataFrame.skew.__doc__)
    def skew(
        self,
        axis=lib.no_default,
        skipna=True,
        level=None,
        numeric_only=None,
        **kwargs
    ):
        result = self._op_via_apply(
            'skew',
            axis=axis,
            skipna=skipna,
            level=level,
            numeric_only=numeric_only,
            **kwargs
        )
        return result

The advantage of the current approach is less boilerplate code (the method definitions) and a consistent API between e.g. DataFrame and DataFrameGroupBy ops. But the disadvantage is that the docs come out as properties (e.g. skew) and anyone using Python's dynamic abilities gets incorrect results:

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 4, 5]})
gb = df.groupby('a')

print(type(DataFrameGroupBy.sum))
print(type(DataFrameGroupBy.skew))
print(type(gb.sum))
print(type(gb.skew))
print(inspect.signature(df.groupby('a').sum))
print(inspect.signature(df.groupby('a').skew))

# <class 'function'>
# <class 'property'>
# <class 'method'>
# <class 'function'>
# (numeric_only: 'bool | lib.NoDefault' = <no_default>, min_count: 'int' = 0, engine: 'str | None' = None, engine_kwargs: 'dict[str, bool] | None' = None)
# (*args, **kwargs)

I also find the current implementation harder to understand / debug, but that is very much an opinion.

Finally, tests can be added to ensure the consistency of arguments between e.g. DataFrame.skew and DataFrameGroupBy.skew using Python's inspect module.

cc @jreback @jbrockmendel @mroeschke for any thoughts.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions