
ENH: apply np.ufunc.accumulate along the columns/blocks (to preserve dtypes) #39275

Open
@jorisvandenbossche

Description

Follow-up on a comment in #39260.

Currently, a ufunc's `accumulate` method is applied to the full DataFrame at once. As a consequence, dtypes are not preserved when the DataFrame has mixed numeric columns. For example, the integer column "a" below is upcast to float64:

In [4]: df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [0.1, 4.0, 3.0, 2.0]})

In [5]: df
Out[5]: 
   a    b
0  1  0.1
1  3  4.0
2  2  3.0
3  4  2.0

In [6]: np.maximum.accumulate(df)
Out[6]: 
     a    b
0  1.0  0.1
1  3.0  4.0
2  3.0  4.0
3  4.0  4.0

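For the default case (corresponding to `.accumulate(axis=0)`, i.e. accumulating down each column), it should be possible to apply the ufunc to each column or block separately, which would preserve the column dtypes. When `axis=1` is passed to the ufunc, this is not possible: the accumulation then runs across columns within each row, so the values of columns with different dtypes have to be combined into a single common dtype.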
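A minimal sketch of the column-wise idea, using only public pandas/NumPy calls rather than pandas' internal block machinery. `accumulate_columnwise` is a hypothetical helper, not a pandas API. It applies `ufunc.accumulate` to each column's NumPy array separately, so every column keeps its own dtype. For contrast, it also computes the `axis=1` accumulation, which has to work on one common dtype.

```python
import numpy as np
import pandas as pd


def accumulate_columnwise(ufunc, df):
    """Hypothetical helper (not a pandas API): apply ``ufunc.accumulate``
    down each column separately (the axis=0 case), so every column keeps
    its own dtype instead of being upcast to a common one."""
    if df.shape[1] == 0:
        # pd.concat raises on an empty list, so handle "no columns" explicitly.
        return df.copy()
    pieces = [
        pd.Series(ufunc.accumulate(df.iloc[:, i].to_numpy()), index=df.index)
        for i in range(df.shape[1])
    ]
    result = pd.concat(pieces, axis=1)
    # Restore the original column labels (positional, so duplicates are fine).
    result.columns = df.columns
    return result


df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [0.1, 4.0, 3.0, 2.0]})

col_acc = accumulate_columnwise(np.maximum, df)
print(col_acc.dtypes)  # "a" stays int64, "b" stays float64

# axis=1 accumulates across the columns of each row, so all values in a row
# must share one dtype: the int64 and float64 columns are upcast to float64.
row_acc = np.maximum.accumulate(df.to_numpy(), axis=1)
print(row_acc.dtype)  # float64
```

Operating per block instead of per column, as the title suggests, would need fewer NumPy calls when many columns share a dtype; the sketch uses columns only to keep it short.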

See the linked PR discussion above for more details on what implementing this involves.
