groupby().first() docs should explain distinction between nth and first #27578

Open
@kyleabeauchamp

Problem description

The existing documentation for groupby().first() (https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.core.groupby.GroupBy.first.html?highlight=first#pandas.core.groupby.GroupBy.first) does not describe how missing data is handled. In particular, it does not mention that the result is computed column by column, so the values in a single output row can come from different input rows.

The docs read: "Compute first of group values...Computed first of values within each group." I think the correct description is "For each column, compute the first non-null entry, possibly aggregating values from across multiple rows." We might also want a simple example to explain the behavior.

Code Sample

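import pandas as pd

# Group A == 1 has a missing B value in its first row.
x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))
print(x.groupby("A", as_index=False).first())  # first non-null value per column
print(x.groupby("A", as_index=False).nth(0))   # first row of each group
print(x.groupby("A", as_index=False).head(1))  # first row of each group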
[...]
   A    B  C
0  1  5.0  1
1  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
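
The difference in the output above can be modeled in plain Python. This is only a rough sketch of the two semantics, not how pandas implements them; the helper names `is_missing`, `first_non_null_per_column`, and `first_row` are hypothetical and exist only for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def first_non_null_per_column(rows, columns):
    """Model of first(): each column is scanned independently."""
    result = {}
    for col in columns:
        # Take the first non-missing value in this column, or None if all are missing.
        result[col] = next(
            (row[col] for row in rows if not is_missing(row[col])), None
        )
    return result


def first_row(rows):
    """Model of nth(0) and head(1): the first row is returned unchanged."""
    return dict(rows[0])


# Rows of the group A == 1 from the sample above.
group = [
    {"B": None, "C": 1},
    {"B": 5, "C": 2},
]

print(first_non_null_per_column(group, ["B", "C"]))  # {'B': 5, 'C': 1}
print(first_row(group))  # {'B': None, 'C': 1}
```

In the first result, B comes from the second row and C from the first, which is the cross-row aggregation the docs should call out.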
