Description
Follow-up issue on #18577
In that PR @jreback cleaned up the apply(..., axis=1) result shape inconsistencies, and we added a keyword (result_type) to control this.
For example, when the applied function returns an array or a list, apply now returns a Series of those objects by default, and only expands them into multiple columns (or broadcasts them to the original shape) if you pass result_type explicitly:
In [1]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'], index=pd.date_range("2012-01-01", periods=4))
In [2]: df
Out[2]:
A B C
2012-01-01 1 2 3
2012-01-02 1 2 3
2012-01-03 1 2 3
2012-01-04 1 2 3
In [3]: df.apply(lambda x: np.array([0, 1, 2]), axis=1)
Out[3]:
2012-01-01 [0, 1, 2]
2012-01-02 [0, 1, 2]
2012-01-03 [0, 1, 2]
2012-01-04 [0, 1, 2]
Freq: D, dtype: object
In [4]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='expand')
Out[4]:
0 1 2
2012-01-01 0 1 2
2012-01-02 0 1 2
2012-01-03 0 1 2
2012-01-04 0 1 2
In [5]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='broadcast')
Out[5]:
A B C
2012-01-01 0 1 2
2012-01-02 0 1 2
2012-01-03 0 1 2
2012-01-04 0 1 2
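The axis=1 transcript above can be reproduced as a self-contained script. This is a minimal sketch, assuming numpy and pandas are installed and that the pandas version has the result_type keyword; the function name row_func is only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the transcript: four dated rows, each equal to [1, 2, 3].
df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)


def row_func(row):
    # Ignores the row and returns a fixed 3-element array.
    return np.array([0, 1, 2])


# Default: one array object per row, collected in an object-dtype Series.
default = df.apply(row_func, axis=1)
# 'expand': the array elements become new columns labelled 0, 1, 2.
expanded = df.apply(row_func, axis=1, result_type="expand")
# 'broadcast': same shape as df, with the original index and columns kept.
broadcast = df.apply(row_func, axis=1, result_type="broadcast")

print(default, expanded, broadcast, sep="\n\n")
```

The point of the three calls is that the default never guesses a shape from the length of the returned object; expansion and broadcasting happen only on request.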
However, for axis=0, the default, we don't yet follow the same rules (or respect the keyword) in all cases. Some examples:
- For a list, the result depends on its length (and if the length matches, the original index is preserved instead of a new range index):
In [16]: df.apply(lambda x: [0, 1, 2, 3])
Out[16]:
A B C
2012-01-01 0 0 0
2012-01-02 1 1 1
2012-01-03 2 2 2
2012-01-04 3 3 3
In [17]: df.apply(lambda x: [0, 1, 2, 3, 4])
Out[17]:
A [0, 1, 2, 3, 4]
B [0, 1, 2, 3, 4]
C [0, 1, 2, 3, 4]
dtype: object
(result_type='expand' and result_type='broadcast' do work correctly here.)
- For an array, it expands even when the length does not match (so unlike axis=1, and also unlike a list):
In [23]: df.apply(lambda x: np.array([0, 1, 2, 3]))
Out[23]:
A B C
2012-01-01 0 0 0
2012-01-02 1 1 1
2012-01-03 2 2 2
2012-01-04 3 3 3
In [24]: df.apply(lambda x: np.array([0, 1, 2, 3, 4]))
Out[24]:
A B C
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
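Until the axis=0 rules are unified, one possible workaround is to transpose, so that the axis=1 rules decide the shape of a column-wise apply. This is a sketch: apply_columns_axis1_rules is a hypothetical helper, not part of pandas, and it assumes the frame has a single dtype (transposing a mixed-dtype frame upcasts it to object).

```python
import numpy as np
import pandas as pd


def apply_columns_axis1_rules(df, func, result_type=None):
    """Hypothetical helper (not a pandas API): apply func to each column of
    df, but let the axis=1 result-shape rules decide the output.

    Transposing turns columns into rows, so df.T.apply(..., axis=1) calls
    func once per original column; a DataFrame result is transposed back
    to restore the column-wise orientation.
    """
    result = df.T.apply(func, axis=1, result_type=result_type)
    # A Series result (one object per column) needs no transposing back.
    return result.T if isinstance(result, pd.DataFrame) else result


df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# By default lists and arrays are treated alike, whatever their length:
# the result is one object per column.
as_list = apply_columns_axis1_rules(df, lambda col: [0, 1, 2, 3])
as_array = apply_columns_axis1_rules(df, lambda col: np.array([0, 1, 2, 3, 4]))

# Expansion and broadcasting only happen when asked for.
expanded = apply_columns_axis1_rules(
    df, lambda col: np.array([0, 1, 2, 3, 4]), result_type="expand"
)
broadcast = apply_columns_axis1_rules(
    df, lambda col: [0, 1, 2, 3], result_type="broadcast"
)
```

With this helper, In [16] and In [17] would both return a Series of lists, and In [23] and In [24] would both return a Series of arrays, which is the axis=1 behaviour.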
So the question is: should we follow the same rules for axis=0 as for axis=1?
I would say: ideally yes. But doing so might break existing behaviour (although it might be possible to go through a transition period with warnings first).
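One possible shape for such a warning-based transition, sketched as a hypothetical shim (apply_axis0_transitional and run_and_count are illustrative names, not pandas APIs): it keeps returning what apply(axis=0) returns today, but emits a FutureWarning whenever the axis=1 rules, applied column-wise via a transpose, would give a result of a different type or shape.

```python
import warnings

import numpy as np
import pandas as pd


def apply_axis0_transitional(df, func):
    """Hypothetical shim, not a pandas API: return today's axis=0 result,
    warning if the axis=1 rules would give a different type or shape."""
    current = df.apply(func)  # axis=0, the default
    # Column-wise apply under the axis=1 rules: transpose, apply per row.
    proposed = df.T.apply(func, axis=1)
    if isinstance(proposed, pd.DataFrame):
        proposed = proposed.T
    if type(current) is not type(proposed) or current.shape != proposed.shape:
        warnings.warn(
            "apply(axis=0) may follow the axis=1 result-shape rules in the "
            "future, which would change the shape of this result",
            FutureWarning,
            stacklevel=2,
        )
    return current


def run_and_count(df, func):
    """Call the shim and count only the FutureWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = apply_axis0_transitional(df, func)
    n = sum(
        issubclass(w.category, FutureWarning)
        and "result-shape rules" in str(w.message)
        for w in caught
    )
    return result, n


df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# A reduction gives a Series under both rule sets, so no warning.
totals, n_totals = run_and_count(df, lambda col: col.sum())
# A length-matching list was expanded to a DataFrame in the transcript
# above, but the axis=1 rules would return a Series of lists instead.
listed, n_listed = run_and_count(df, lambda col: [0, 1, 2, 3])
```

Reductions, the most common use of apply(axis=0), would pass through silently; only calls whose result shape would actually change get the warning.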