REF: Reductions #53261

Open
@jbrockmendel

We have reductions implemented in nanops, _libs.groupby, and _libs.window.aggregations. We should refactor these with the following goals in mind:

  1. Have one/fewer distinct implementations
  2. Avoid copies, particularly in the nanops versions where we do something like values[notna(values)]
  3. Chunked-friendliness, so that we can re-write ArrowExtensionArray._groupby_op to operate chunk-by-chunk, avoiding a copy in multi-chunk cases. (This could also be useful for hypothetical distributed EAs)
  4. Avoid casting/inference in nanops
  5. Do axis=1 reductions without transposing/copying, inspired by PERF: axis=1 reductions with EA dtypes #54341
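
To make goal 2 concrete, here is a minimal sketch of the difference between the copying pattern and a mask-aware reduction. It uses plain NumPy rather than the actual nanops code, and `~np.isnan(values)` is only a stand-in for `notna(values)`; treat it as an illustration of the idea, not a proposed implementation.

```python
import numpy as np

values = np.array([1.0, np.nan, 2.5, np.nan, 4.0])
mask = ~np.isnan(values)  # stand-in for notna(values); an assumption for this sketch

# Copying pattern: boolean indexing materializes a new, filtered array.
filtered = values[mask]
copy_sum = filtered.sum()

# Mask-aware pattern: the reduction skips masked-out entries,
# so only the boolean mask is allocated, not a second values array.
masked_sum = np.sum(values, where=mask)

print(copy_sum, masked_sum)
```

Both paths give the same result; the second one just never builds the filtered copy of `values`.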

The implementation of group_skew is derived from https://www.johndcook.com/blog/skewness_kurtosis/, which includes a method for "adding" (merging) multiple RunningStats instances. Something like that could be adapted for goal 3, so that per-chunk results are combined instead of concatenating the chunks first.
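
A rough sketch of what that could look like, in pure Python for readability. The `RunningStats` class below is a hypothetical port of the accumulator idea from the linked post (restricted to the moments needed for skewness), not existing pandas code, and `skewness` here is the biased (population) estimate, so it will not match a bias-corrected result exactly.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical accumulator: count, mean, and 2nd/3rd central moment sums."""

    n: int = 0
    m1: float = 0.0  # running mean
    m2: float = 0.0  # sum of (x - mean) ** 2
    m3: float = 0.0  # sum of (x - mean) ** 3

    def push(self, x: float) -> None:
        if math.isnan(x):
            return  # skip missing values in place, no filtered copy
        n1 = self.n
        self.n += 1
        delta = x - self.m1
        delta_n = delta / self.n
        term1 = delta * delta_n * n1
        self.m1 += delta_n
        # m3 must be updated before m2, since it uses the old m2
        self.m3 += term1 * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term1

    def __add__(self, other: "RunningStats") -> "RunningStats":
        """Merge two accumulators as if all values had been pushed into one."""
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.m1 - self.m1
        m1 = (self.n * self.m1 + other.n * other.m1) / n
        m2 = self.m2 + other.m2 + delta**2 * self.n * other.n / n
        m3 = (
            self.m3
            + other.m3
            + delta**3 * self.n * other.n * (self.n - other.n) / n**2
            + 3.0 * delta * (self.n * other.m2 - other.n * self.m2) / n
        )
        return RunningStats(n, m1, m2, m3)

    def skewness(self) -> float:
        if self.n < 3 or self.m2 == 0:
            return math.nan
        return math.sqrt(self.n) * self.m3 / self.m2**1.5


def stats_for(chunk):
    rs = RunningStats()
    for x in chunk:
        rs.push(x)
    return rs


# Two "chunks" reduced independently, then merged without concatenation.
chunks = [[1.0, 2.0, math.nan, 4.0], [8.0, 16.0, 3.0]]
combined = stats_for(chunks[0]) + stats_for(chunks[1])
single_pass = stats_for(chunks[0] + chunks[1])
print(combined.skewness(), single_pass.skewness())
```

The merged accumulator agrees with a single pass over all values up to floating-point error, which is the property a chunk-by-chunk `_groupby_op` would rely on.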
