Why does Series.transform() exist?

This is my first issue on GitHub, so apologies in advance if there's something wrong with the format.

My issue does not have any expected output; I just want to understand whether, and why, the `Series.transform()` method is not redundant. Overall, the `transform()` methods are very similar to the `apply()` methods, and while trying to figure out the difference between them ([this](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) Stack Overflow topic was helpful), I pinpointed three primary differences:

1) When the DataFrame is grouped on several categories, `apply()` passes each entire sub-DataFrame to the function, while `transform()` passes each column of each sub-DataFrame separately. That's why the function can't access values in other columns within `transform()`;
2) When the input passed to the function is an iterable of a certain length, `apply()` can still return output of any length, while `transform()` is limited to returning an iterable of the same length as the input;
3) When the function returns a scalar, `apply()` returns that scalar, while `transform()` broadcasts that scalar to an iterable of the input length.
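A minimal sketch of these differences on a grouped object, reusing the small `State`/`a`/`b` frame from the example further down. The variable names are only illustrative, and the exact behavior may vary between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
grouped = df.groupby('State')

# Difference 1: apply() hands the function a whole sub-DataFrame,
# so the function can combine values from several columns.
cross_column = grouped[['a', 'b']].apply(lambda sub: (sub['a'] * sub['b']).sum())

# Differences 2 and 3: with a reducing function, apply() keeps one
# value per group, while transform() broadcasts that value back to
# every row of the group, preserving the original index and length.
per_group = grouped['a'].apply(lambda s: s.sum())
broadcast = grouped['a'].transform('sum')

print(cross_column)
print(per_group)
print(broadcast)
```

Here `per_group` has one row per state, while `broadcast` has one row per row of `df`.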

I conducted a series of experiments that test these three differences on each applicable pandas object type: Series, DataFrame, SeriesGroupBy, and DataFrameGroupBy. I can send my ipynb with the code and the results if necessary, but it would be sufficient to just look at the conclusion for the Series type:

1 – not applicable. In both cases the function has a scalar input.
2 – not applicable. No matter what the function returns, in both cases the result is assigned to a single cell, even if that means storing entire DataFrames within the cells of a Series.
3 – not applicable. The input length is always "1" (it's considered "1" even when it's an iterable), so there's no need to propagate.

Inapplicability of 1 is self-explanatory. But 2 was a surprise. Below is the code I tried:
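```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})


def return_df(x):
    # Ignores the input and always returns a 2x2 DataFrame.
    return pd.DataFrame([[4, 5], [3, 2]])


def return_series(x):
    # Ignores the input and always returns a length-2 Series.
    return pd.Series([1, 2])


# Both functions return something larger than the scalar they receive.
df['a'].transform(return_df)
df['a'].transform(return_series)
```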

If you try this code, you'll see that it doesn't matter what the function returns. Whatever it is, it will be put inside the single Series cell in its entirety. Is this behavior intentional? It results in the output size being predetermined by the input size, so all the size checks that `Series.transform()` has within itself become redundant. I can't imagine any situation where `Series.transform()` could behave in a different way from `Series.apply()`. And that raises the question I posed: why does `Series.transform()` exist?
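To make the comparison concrete, here is a small sketch under the assumption that the function is applied element-wise; `double` is a hypothetical helper, not something from the experiments above. It also shows one case where, as far as I can tell, the size check is reached: a string name of a reducing method such as `'sum'`. The behavior may depend on the pandas version.

```python
import pandas as pd

s = pd.Series([4, 5, 1, 3], name='a')


def double(x):
    # Hypothetical element-wise helper used only for this illustration.
    return x * 2


# For an element-wise function, both methods give the same Series.
via_transform = s.transform(double)
via_apply = s.apply(double)
same = via_transform.equals(via_apply)

# A reducing method passed by name collapses the Series to a scalar,
# which transform() refuses because the result no longer matches the input.
try:
    s.transform('sum')
    rejected = False
except ValueError:
    rejected = True

print(same, rejected)
```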

Why does Series.transform() exist? #31937