
DEPR: deprecate element-wise operations in (Series|DataFrame).transform #54906

Open
@topper-123


A discussion has been going on in #54747 (PDEP 13) about making Series.transform and DataFrame.transform always operate on Series. See #54747 (comment) and related comments. Opening a separate issue to separate that discussion from PDEP 13/#54747.

Currently, Series.transform first tries to operate on each element of the series and, if that fails, falls back to operating on the whole series. This fallback mechanism makes the method difficult to use, and the first choice (element-wise operation) is very slow. DataFrame.transform operates on series (i.e. columns/rows) when given a callable, but operates on elements when given a list or dict of callables, which is inconsistent. Examples:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"x": range(100_000)})
>>> %timeit df["x"].transform(lambda x: x + 1) # operates on elements, slow
15.5 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['x'].transform(np.sin) # ufunc, fast
784 µs ± 19.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>> %timeit df['x'].transform(lambda x: np.sin(x))  # non-ufunc, operates on elements, slow
86.6 ms ± 1.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df.transform(lambda x: x + 1)  # operates on the columns/series, fast
142 µs ± 589 ns per loop
>>> %timeit df.transform([lambda x: x + 1])  # lists/dicts operate on the elements, slow
16.7 ms ± 165 µs per loop

All in all, the above is inconsistent and difficult for users to reason about, much like the behavior of apply discussed in #54747/PDEP 13.
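One way to check which path a given call takes, without relying on timings, is to record the type of the argument the callable receives. This is only a diagnostic sketch: the helper name `add_one` and the list `seen_types` are made up for illustration, and what gets printed depends on the pandas version (scalar type names would indicate element-wise calls, `Series` a series-wise call).

```python
import pandas as pd

seen_types = []  # illustrative: collects the type name of every argument passed in


def add_one(x):
    # Works for both a scalar and a whole Series, so neither path raises.
    seen_types.append(type(x).__name__)
    return x + 1


result = pd.Series([1, 2, 3]).transform(add_one)

# Inspect how the callable was invoked on the installed pandas version.
print(sorted(set(seen_types)))
print(result.tolist())
```

Because `x + 1` is valid for scalars and for series, the result is the same either way. Only the number and type of calls differ, which is exactly what makes the current behavior hard to spot.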

I propose to deprecate element-wise operations in (Series|DataFrame).transform, so that in pandas v3.0 callables (and lists/dicts of callables) given to (Series|DataFrame).transform always operate on series. The benefit is that (Series|DataFrame).transform becomes much more predictable and faster. Users who want element-wise operations should be directed to (Series|DataFrame).map. So no functionality is lost, but we get a clearer separation between series-wise and element-wise operations.
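As a rough illustration of the intended split, the sketch below routes a function that needs scalars through Series.map and a function that works on whole series through Series.transform. The helper names `label` and `add_one` are invented for this example. DataFrame.map (available from pandas 2.1) would play the same role for data frames.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="x")


def label(value):
    # Needs a scalar: using `if` on a whole Series would raise.
    return "even" if value % 2 == 0 else "odd"


def add_one(values):
    # Vectorized: valid when `values` is an entire Series.
    return values + 1


elementwise = s.map(label)         # explicit element-wise operation
serieswise = s.transform(add_one)  # intended as a series-wise operation

print(elementwise.tolist())
print(serieswise.tolist())
```

Under the proposal, `label` would not be a valid argument to transform, and `map` is the place for it.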

I propose implementing the deprecation in pandas v2.2 by adding a new keyword parameter series_ops_only to (Series|DataFrame).transform. When set to True, (Series|DataFrame).transform will always operate on the whole series. When False, the old behavior will be kept and a deprecation warning will be emitted. In pandas v3.0, the old behavior will be removed and (Series|DataFrame).transform will only operate on series.
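To make the proposed semantics concrete, here is a minimal stand-alone sketch of how such a switch could behave. It is not pandas' implementation: `transform_compat` is a hypothetical free function, the use of `FutureWarning` and the warning text are assumptions, and the old behavior is approximated by trying `Series.map` first and falling back to calling the function on the whole series.

```python
import warnings

import pandas as pd


def transform_compat(series, func, series_ops_only=False):
    """Hypothetical model of the proposed `series_ops_only` keyword."""
    if series_ops_only:
        # New behavior: always hand the whole Series to the callable.
        return func(series)

    # Old behavior: warn, then try element-wise and fall back to series-wise.
    warnings.warn(
        "element-wise transform is deprecated; pass series_ops_only=True "
        "or use Series.map for element-wise operations",
        FutureWarning,  # assumption: the category is not specified in the proposal
        stacklevel=2,
    )
    try:
        return series.map(func)
    except Exception:
        return func(series)


s = pd.Series([1, 2, 3])
new_style = transform_compat(s, lambda v: v + 1, series_ops_only=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = transform_compat(s, lambda v: v + 1)

print(new_style.tolist(), old_style.tolist(), len(caught))
```

The broad `except Exception` mirrors the fallback described earlier and is the part that makes the old path hard to reason about. Dropping it is the point of the proposal.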

