ENH: enumerate groups #11642

Closed
@dsm054

Sometimes it's handy to have access to a distinct integer for each group. For example, using the (internal) grouper:

>>> df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab"*3), "c": range(6)})
>>> df["group_id"] = df.groupby(["a","b"]).grouper.group_info[0]
>>> df
   a  b  c  group_id
0  x  a  0         0
1  y  b  1         2
2  y  a  2         1
3  z  b  3         3
4  x  a  4         0
5  y  b  5         2

This can be achieved in a number of ways, but none of them is particularly elegant, especially when grouping on multiple keys and/or Series. Accordingly, after a brief discussion on Gitter, I propose a new method, transform("enumerate"), which returns a Series of integers from 0 to ngroups-1, matching the order in which the groups will be iterated. In other words, we'd simply be applying the following map:

>>> m = {k: i for i, (k,g) in enumerate(df.groupby(["a","b"]))}
>>> m
{('x', 'a'): 0, ('y', 'b'): 2, ('y', 'a'): 1, ('z', 'b'): 3}

(Note that this only shows the desired behaviour; it isn't how it would be implemented!)
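
To make the proposed semantics concrete, here is a minimal sketch that applies that map row by row using only public API. The names group_number, row_keys and group_id are illustrative, not part of pandas, and this is not meant as an implementation of the proposed method. It assumes the default groupby behaviour of iterating groups in sorted key order.

```python
import pandas as pd

df = pd.DataFrame({"a": list("xyyzxy"), "b": list("ab" * 3), "c": range(6)})
keys = ["a", "b"]

# Map each group key to its position in groupby iteration order
# (sorted key order, assuming the default sort=True).
group_number = {k: i for i, (k, _) in enumerate(df.groupby(keys))}

# Build each row's key tuple and look it up in the map.
row_keys = list(zip(*(df[k] for k in keys)))
group_id = pd.Series(
    [group_number[k] for k in row_keys], index=df.index, name="group_id"
)

print(group_id.tolist())
```

This should give the same group_id column as the grouper-based example above, at the cost of a Python-level loop over the rows, which is part of why a built-in method would be preferable.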
