
API: should apply(..., result_type='reduce') be honored for Series return value #19571

Open
@jorisvandenbossche


Follow-up issue on #18577

In that PR we added a result_type='reduce' argument (partly as a replacement for the deprecated reduce keyword).

'reduce' is the default behaviour whenever the function returns a scalar, list, array, dict, etc. (I think basically everything that is not a Series). In those cases you can pass result_type='broadcast' or result_type='expand' to get a different result:

In [32]: df.apply(lambda x: [0, 1, 2], axis=1)
Out[32]: 
0    [0, 1, 2]
1    [0, 1, 2]
2    [0, 1, 2]
3    [0, 1, 2]
dtype: object

In [33]: df.apply(lambda x: [0, 1, 2], axis=1, result_type='reduce')
Out[33]: 
0    [0, 1, 2]
1    [0, 1, 2]
2    [0, 1, 2]
3    [0, 1, 2]
dtype: object

In [34]: df.apply(lambda x: [0, 1, 2], axis=1, result_type='expand')
Out[34]: 
   0  1  2
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2
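
For anyone who wants to re-run the list-returning cases outside an IPython session, here is a minimal standalone sketch. It assumes a pandas version that already has the result_type argument, and it reuses the frame built in In [36] further down. The as_list helper and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in In [36]: four identical rows [1, 2, 3] with columns A, B, C.
df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=["A", "B", "C"]
)


def as_list(row):
    # Illustrative function that returns a plain list for every row.
    return [0, 1, 2]


default = df.apply(as_list, axis=1)                             # no result_type
reduced = df.apply(as_list, axis=1, result_type="reduce")       # explicit reduce
expanded = df.apply(as_list, axis=1, result_type="expand")      # list -> columns
broadcast = df.apply(as_list, axis=1, result_type="broadcast")  # keep A, B, C

print(type(default).__name__, type(reduced).__name__)
print(list(expanded.columns), list(broadcast.columns))
```

The default and the explicit 'reduce' call should both give a Series of lists, while 'expand' and 'broadcast' both give a DataFrame and differ only in the column labels.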

But when the function returns a Series, we do not honour result_type='reduce', even when it is passed explicitly:

In [36]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'])

In [37]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1)
Out[37]: 
   0  1  2
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [38]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='expand')
Out[38]: 
   0  1  2    # <--- default, so same as output above
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [39]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='broadcast')
Out[39]: 
   A  B  C    # <--- with broadcast we preserve the original column labels
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

In [40]: df.apply(lambda x: pd.Series([0, 1, 2]), axis=1, result_type='reduce')
Out[40]: 
   0  1  2    # <--- should this be a Series of Series objects ?
0  0  1  2
1  0  1  2
2  0  1  2
3  0  1  2

So should we follow the result_type='reduce' here and return a Series of Series objects?

I know a Series of Series objects is completely useless (but is it that much more useless than a Series of lists or a Series of arrays? Probably yes, but is that worth the inconsistency?).
I think it would be better to either return a Series anyhow (i.e. a Series of Series objects), or raise an error saying that we cannot reduce this. IMO either would be more useful when somebody tries to do this: it educates the user about what result_type='reduce' is actually meant for, or it signals that the function is doing something different than expected.
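
To make the "raise an error" option concrete, here is a rough user-level sketch of the semantics I have in mind. It is not how pandas implements apply internally. The helper name apply_reduce_strict and the error message are hypothetical, and it loops over rows with iterrows purely for illustration.

```python
import pandas as pd


def apply_reduce_strict(frame, func):
    """Hypothetical helper: row-wise apply that refuses to 'reduce' a Series."""
    results = []
    for label, row in frame.iterrows():
        value = func(row)
        if isinstance(value, pd.Series):
            # Proposed behaviour: fail loudly instead of silently expanding.
            raise TypeError(
                "result_type='reduce' cannot reduce a Series return value "
                f"(got a Series for row {label!r})"
            )
        results.append(value)
    # One element per row, whatever the function returned (scalar, list, ...).
    return pd.Series(results, index=frame.index)


small = pd.DataFrame({"A": [1, 1], "B": [2, 2]})

# A list-returning function reduces to a Series of lists.
print(apply_reduce_strict(small, lambda row: [0, 1, 2]))

# A Series-returning function triggers the error.
try:
    apply_reduce_strict(small, lambda row: pd.Series([0, 1, 2]))
except TypeError as exc:
    print(exc)
```

The other option (returning a Series of Series objects) would amount to dropping the isinstance check in this sketch.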
