ENH: add masked algorithm for mean()

We now have masked implementations of sum, prod, min and max for the nullable integer array (first added in PR https://github.com/pandas-dev/pandas/pull/30982, now living at https://github.com/pandas-dev/pandas/blob/master/pandas/core/array_algos/masked_reductions.py). We can add one for the `mean` reduction as well.

A very rough check suggests a nice speed-up (roughly 3.5x in this run, 7.26 ms vs 2.08 ms):

```
In [27]: arr = pd.array(np.random.randint(0, 1000, 1_000_000), dtype="Int64") 

In [28]: arr[np.random.randint(0, 1_000_000, 1000)] = pd.NA 

In [30]: arr._reduce("mean") 
Out[30]: 499.27095868772903

In [31]: %timeit arr._reduce("mean") 
7.26 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [32]: arr._data.sum(where=~arr._mask, dtype="float64") / (~arr._mask).sum() 
Out[32]: 499.27095868772903

In [33]: %timeit arr._data.sum(where=~arr._mask, dtype="float64") / (~arr._mask).sum()  
2.08 ms ± 6.89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

The current `nanmean` implementation lives here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py#L517
For reference, numpy is also adding a version of `mean` that accepts a mask: https://github.com/numpy/numpy/pull/15852 (this could be used directly in the future, and can serve as inspiration for the implementation now).
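
As a starting point for discussion, here is a minimal sketch of what such a masked `mean` could look like, built on the same `sum(where=...)` expression timed above. The function name `masked_mean`, its signature, and the `skipna` handling are assumptions for illustration, not the existing API of `masked_reductions.py`. It also returns `np.nan` as a placeholder where a real implementation would presumably return `pd.NA`.

```python
import numpy as np


def masked_mean(values, mask, skipna=True):
    """Hypothetical masked mean: average `values` where `mask` is False.

    values : numpy array holding the data (arr._data in the example above)
    mask   : boolean numpy array, True marks a missing value (arr._mask)
    """
    if not skipna and mask.any():
        # Placeholder: a pandas implementation would likely return pd.NA.
        return np.nan

    valid = ~mask
    count = valid.sum()
    if count == 0:
        # No valid values at all; avoid a division by zero.
        return np.nan

    # Sum only the valid entries, accumulating in float64, then divide
    # by the number of valid entries.
    total = values.sum(where=valid, dtype="float64")
    return total / count


values = np.array([1, 2, 3, 4], dtype="int64")
mask = np.array([False, True, False, False])
result = masked_mean(values, mask)  # mean of 1, 3 and 4
```

Using `where=` skips the masked entries directly, so no intermediate float array with NaNs needs to be created. Note that `ndarray.sum(where=...)` needs a reasonably recent numpy (1.17 or later).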

ENH: add masked algorithm for mean() #34754