PERF: cythonize groupby-rank #15779

Closed
@jreback

Description

@jreback

groupby-rank currently dispatches to each group individually. It would be better to have a single combined group_rank that handles all groups at once. That is a fair bit of code, and ideally it would share some with the actual rank algos.

In [7]: ngroups = 1000

In [8]: N = 100000

In [9]: np.random.seed(1234)

In [10]: df = DataFrame({'key': np.random.randint(0, ngroups, size=N), 'value': np.arange(N)})

In [11]: %timeit df.groupby('key').rank()
1 loop, best of 3: 392 ms per loop

# comparison with group_shift_indexer, a transforming operator
In [13]: %timeit df.groupby('key').shift()
100 loops, best of 3: 3.15 ms per loop
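
To illustrate what a combined group rank could look like, here is a minimal NumPy sketch that ranks all groups in a single sort instead of dispatching per group. It is not the Cython routine proposed above: the function name group_rank_average is hypothetical, it only covers the default 'average' tie method, and it assumes there are no NaN values.

```python
import numpy as np


def group_rank_average(keys, values):
    """Rank `values` within each group of `keys` (ties get the average rank).

    Hypothetical helper for illustration only; assumes no NaNs in `values`.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    n = len(values)
    if n == 0:
        return np.empty(0, dtype=float)

    # One sort for all groups: primary key is the group, secondary is the value.
    order = np.lexsort((values, keys))
    sk = keys[order]
    sv = values[order]

    # True where a new group starts in the sorted arrays.
    new_group = np.empty(n, dtype=bool)
    new_group[0] = True
    new_group[1:] = sk[1:] != sk[:-1]

    # True where a new run of tied values starts (a new group also starts a run).
    new_run = new_group.copy()
    new_run[1:] |= sv[1:] != sv[:-1]

    # Sorted position at which each element's group starts.
    idx = np.arange(n)
    group_start = np.maximum.accumulate(np.where(new_group, idx, 0))

    # First and last sorted position of each tie run.
    run_starts = np.flatnonzero(new_run)
    run_ends = np.append(run_starts[1:], n) - 1
    run_id = np.cumsum(new_run) - 1

    # 1-based within-group rank of the first and last member of each run;
    # their mean is the average rank shared by every member of the run.
    first = run_starts[run_id] - group_start + 1
    last = run_ends[run_id] - group_start + 1
    ranks_sorted = (first + last) / 2.0

    # Scatter the ranks back to the original row order.
    out = np.empty(n, dtype=float)
    out[order] = ranks_sorted
    return out
```

On the frame from the benchmark above, group_rank_average(df['key'].values, df['value'].values) should agree with df.groupby('key')['value'].rank(). The point of the sketch is only that one lexsort plus a few vectorized passes replaces the per-group calls.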

Metadata

Assignees

No one assigned

    Labels

    Groupby; Numeric Operations (Arithmetic, Comparison, and Logical operations); Performance (Memory or execution speed performance)
