
PERF: concat of same categoricals can be sped up #10587

Closed
@jreback

Description


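Categoricals are currently converted to object (and back) in _concat_compat. But if they all have the same dtype (the same categories in the same order, and the same ordering flag), this round trip can be short-circuited by concatenating the integer codes directly and rebuilding the Categorical, as the session below shows.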
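A minimal sketch of that short-circuit, written against the public pandas API rather than the internal _concat_compat code path. The helper name `concat_same_categoricals` is hypothetical. The check compares categories with `Index.equals` (same values, same order) rather than comparing dtypes. For unordered categoricals, pandas treats two dtypes with the same set of categories as equal even when the order differs, and in that case the codes are not interchangeable.

```python
import numpy as np
import pandas as pd


def concat_same_categoricals(cats):
    """Concatenate a list of pd.Categorical objects (hypothetical helper).

    Fast path: every input has identical categories (same values in the same
    order) and the same `ordered` flag, so the integer codes mean the same
    thing everywhere and can be concatenated directly.
    Slow path: otherwise, round-trip through object values and re-infer the
    categories (the `ordered` flag is not preserved on this path).
    """
    first = cats[0]
    same_categories = all(
        c.categories.equals(first.categories) and c.ordered == first.ordered
        for c in cats[1:]
    )
    if same_categories:
        # Codes are small integer arrays (-1 marks a missing value), so this
        # is a cheap integer concatenation with no per-element object handling.
        codes = np.concatenate([c.codes for c in cats])
        return pd.Categorical.from_codes(
            codes, categories=first.categories, ordered=first.ordered
        )
    # Slow path: materialize every value as a Python object, then rebuild.
    values = np.concatenate([np.asarray(c, dtype=object) for c in cats])
    return pd.Categorical(values)
```

The object round trip is kept only as a fallback, since differing categories genuinely require recoding.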

In [17]: s = Series(list('aabbcd')*1000000).astype('category')

In [18]: result = pd.concat([s,s])

In [19]: result2 = Series(pd.Categorical(pd.concat([s.cat.codes,s.cat.codes]),s.cat.categories,fastpath=True))

In [20]: result = pd.concat([s,s],ignore_index=True)

In [21]: result2 = Series(pd.Categorical(pd.concat([s.cat.codes,s.cat.codes],ignore_index=True),s.cat.categories,fastpath=True))

In [22]: result.equals(result2)
Out[22]: True

In [23]: %timeit pd.concat([s,s],ignore_index=True)
1 loops, best of 3: 658 ms per loop

In [24]: %timeit Series(pd.Categorical(pd.concat([s.cat.codes,s.cat.codes],ignore_index=True),s.cat.categories,fastpath=True))
10 loops, best of 3: 52.3 ms per loop
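A self-contained rerun of the comparison above, assuming a recent pandas. The fastpath=True keyword in the session is an internal shortcut, so this sketch builds the result with the public `Categorical.from_codes` instead. The function names are illustrative, and the data size is reduced from the issue's 1,000,000 repetitions. No timings are claimed here because they depend on the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the issue's list('aabbcd') * 1000000 so the sketch runs quickly.
s = pd.Series(list("aabbcd") * 100_000).astype("category")


def concat_default():
    # Whatever path pd.concat takes for two categoricals in this pandas version.
    return pd.concat([s, s], ignore_index=True)


def concat_via_codes():
    # Concatenate the integer codes, then reattach the shared categories.
    codes = np.concatenate([s.cat.codes.to_numpy(), s.cat.codes.to_numpy()])
    cat = pd.Categorical.from_codes(
        codes, categories=s.cat.categories, ordered=s.cat.ordered
    )
    return pd.Series(cat)


# Both approaches should produce identical Series (values, dtype, index).
assert concat_default().equals(concat_via_codes())

for fn in (concat_default, concat_via_codes):
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1000:.1f} ms")
```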

Metadata

Labels: Categorical (Categorical Data Type), Performance (Memory or execution speed performance)
