PERF: cache sorted data in GroupBy?

When we do a groupby transform/reduce that requires operating group-by-group, we construct a sorted (DataFrame|Series) so that we can iterate over it efficiently. That construction is cached within a DataSplitter class, but the splitter itself is not cached, so each such operation builds a new splitter and a new sorted copy. IIUC we could avoid the repeated sort by caching the DataSplitter, at the possible cost of having a copy hang around longer than we might want.
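
To make the trade-off concrete, here is a toy sketch of the caching idea. It is not pandas code: `ToySplitter` and `ToyGroupBy` are hypothetical stand-ins, and plain lists take the place of the (DataFrame|Series). The point is only that a cached splitter builds its sorted copy once and keeps it for as long as the groupby object lives.

```python
from functools import cached_property


class ToySplitter:
    """Hypothetical stand-in for DataSplitter (not the pandas class)."""

    def __init__(self, values, codes):
        self.values = values
        self.codes = codes
        self.sort_calls = 0  # counts how often the sorted copy is built

    @cached_property
    def sorted_data(self):
        # Stable sort of positions by group code, then take values in that order.
        self.sort_calls += 1
        order = sorted(range(len(self.codes)), key=self.codes.__getitem__)
        return [self.values[i] for i in order]


class ToyGroupBy:
    """Hypothetical stand-in for a GroupBy object that caches its splitter."""

    def __init__(self, values, codes):
        self.values = values
        self.codes = codes

    @cached_property
    def _splitter(self):
        # Built once per ToyGroupBy; the sorted copy stays alive as long as
        # this object does, which is the memory cost mentioned above.
        return ToySplitter(self.values, self.codes)

    def _reduce(self, func):
        data = self._splitter.sorted_data
        result = {}
        for code, value in zip(sorted(self.codes), data):
            result[code] = func(result[code], value) if code in result else value
        return result

    def group_sums(self):
        return self._reduce(lambda acc, v: acc + v)

    def group_maxes(self):
        return self._reduce(max)


gb = ToyGroupBy([1, 2, 3, 4], [1, 0, 1, 0])
sums = gb.group_sums()
maxes = gb.group_maxes()
```

Both reductions go through the same cached splitter, so `gb._splitter.sort_calls` stays at 1 after two operations. Without the cache on `_splitter`, each call would sort again.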

Also, we have a separate construct-a-sorted-object path in _numba_prep that might be able to re-use some of this code.

Final thought: we could check in DataSplitter.sorted_data whether _sort_idx is monotonic, in which case the (DataFrame|Series) is already sorted and we don't need to make a copy.
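
A minimal sketch of that check, again with plain lists rather than pandas objects. The function names here are hypothetical, and it assumes the sort indexer is a permutation of positions, in which case a monotonic increasing indexer is the identity and taking by it would only produce an unnecessary copy.

```python
def is_monotonic_increasing(indexer):
    """Return True if every element is >= the one before it."""
    return all(a <= b for a, b in zip(indexer, indexer[1:]))


def sorted_data(values, sort_idx):
    """Return values in group order, skipping the copy when already sorted.

    Hypothetical helper; `sort_idx` plays the role of _sort_idx.
    """
    if is_monotonic_increasing(sort_idx):
        # Already in group order: hand back the original object, no copy.
        return values
    return [values[i] for i in sort_idx]


values = [10, 20, 30]
already_sorted = sorted_data(values, [0, 1, 2])
needs_sort = sorted_data(values, [2, 0, 1])
```

In the first call the original list is returned as-is; in the second a new, reordered list is built. One thing to keep in mind is that returning the original object means callers share it, so this shortcut is only safe if the sorted data is never mutated in place.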


PERF: cache sorted data in GroupBy? #51077