API: should `apply` also follow `result_type` for `axis=0`? #19570

Follow-up issue on https://github.com/pandas-dev/pandas/pull/18577

In that PR, @jreback cleaned up the result shape inconsistencies of `apply(..., axis=1)`, and we added the `result_type` keyword to control the result shape.

For example, when the applied function returns an array or a list, `apply` now defaults to returning a Series of those objects, and expands them into multiple columns only if you pass `result_type` explicitly:

```
In [1]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'], index=pd.date_range("2012-01-01", periods=4))

In [2]: df
Out[2]: 
            A  B  C
2012-01-01  1  2  3
2012-01-02  1  2  3
2012-01-03  1  2  3
2012-01-04  1  2  3

In [3]: df.apply(lambda x: np.array([0, 1, 2]), axis=1)
Out[3]: 
2012-01-01    [0, 1, 2]
2012-01-02    [0, 1, 2]
2012-01-03    [0, 1, 2]
2012-01-04    [0, 1, 2]
Freq: D, dtype: object

In [4]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='expand')
Out[4]: 
            0  1  2
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2

In [5]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='broadcast')
Out[5]: 
            A  B  C
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2
```
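
For completeness, the keyword also accepts `result_type='reduce'`, which asks for a Series explicitly instead of relying on the default. The snippet below is only a sketch in script form, assuming a pandas version that includes the `result_type` keyword and reusing the same `df` as above:

```python
import numpy as np
import pandas as pd

# Same frame as in the session above.
df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# Explicitly request one object per row rather than relying on the default.
reduced = df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type="reduce")

print(type(reduced).__name__)
print(reduced)
```

For an array-returning function this should give the same Series of arrays as `Out[3]`, only with the intent spelled out.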

However, for `axis=0` (the default), we don't yet follow the same rules, or honour the keyword, in all cases. Some examples:
    
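*  For a list, the result depends on its length. If the length matches the number of rows, the list is expanded into a column and the original index is preserved (instead of a new RangeIndex); otherwise a Series of lists is returned: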
    ```
    In [16]: df.apply(lambda x: [0, 1, 2, 3])
    Out[16]: 
                A  B  C
    2012-01-01  0  0  0
    2012-01-02  1  1  1
    2012-01-03  2  2  2
    2012-01-04  3  3  3

    In [17]: df.apply(lambda x: [0, 1, 2, 3, 4])
    Out[17]: 
    A    [0, 1, 2, 3, 4]
    B    [0, 1, 2, 3, 4]
    C    [0, 1, 2, 3, 4]
    dtype: object
    ```

    (`result_type='expand'` and `result_type='broadcast'` do work correctly here)

*   For an array, the result is expanded even when the length does not match (which differs from `axis=1`, and also from the list case):

    ```
    In [23]: df.apply(lambda x: np.array([0, 1, 2, 3]))
    Out[23]: 
                A  B  C
    2012-01-01  0  0  0
    2012-01-02  1  1  1
    2012-01-03  2  2  2
    2012-01-04  3  3  3

    In [24]: df.apply(lambda x: np.array([0, 1, 2, 3, 4]))
    Out[24]: 
       A  B  C
    0  0  0  0
    1  1  1  1
    2  2  2  2
    3  3  3  3
    4  4  4  4
    ```
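
To check how a given pandas installation behaves across these cases, a small script along the following lines can tabulate the result type and shape for each combination. This is only a sketch: the helper name `describe_result` and the case labels are made up for illustration, and no outputs are listed because they depend on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)


def describe_result(func, **kwargs):
    """Return (type name, shape) of df.apply(func, **kwargs). Hypothetical helper."""
    result = df.apply(func, **kwargs)
    return type(result).__name__, result.shape


# The four return values discussed above: list/array, matching/non-matching length.
cases = {
    "list, matching length": lambda x: [0, 1, 2, 3],
    "list, other length": lambda x: [0, 1, 2, 3, 4],
    "array, matching length": lambda x: np.array([0, 1, 2, 3]),
    "array, other length": lambda x: np.array([0, 1, 2, 3, 4]),
}

summary = {}
for label, func in cases.items():
    for result_type in (None, "expand"):
        summary[(label, result_type)] = describe_result(
            func, axis=0, result_type=result_type
        )

for (label, result_type), (kind, shape) in summary.items():
    print(f"{label:25} result_type={result_type!s:8} -> {kind} {shape}")
```

Comparing the `result_type=None` rows against the `axis=1` behaviour shows where the two axes diverge.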

So the question is: should we follow the same rules for `axis=0` as for `axis=1`?
I would say: ideally, yes. But doing so might break some existing behaviour (although it might be possible to make the change with warnings).
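
As an aside (not something proposed above): code that needs a predictable shape today can have the applied function return a `Series`, which `apply` expands on either axis regardless of length. A minimal sketch, with the variable names `by_col` and `by_row` chosen just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# axis=0: each column maps to a Series; its index becomes the result's index.
by_col = df.apply(lambda col: pd.Series([0, 1, 2, 3, 4]))

# axis=1: each row maps to a Series; its index becomes the result's columns.
by_row = df.apply(lambda row: pd.Series([0, 1, 2]), axis=1)

print(by_col.shape, list(by_col.columns))
print(by_row.shape, list(by_row.columns))
```

This sidesteps the list/array inference entirely, at the cost of constructing a Series per row or column.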
