groupby().apply() returns a different result depending on whether the first result is None #12824

Closed
@ruoyu0088


The documentation for apply says:

apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to apply. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices.

So I expected apply to discard a group when the callback function returns None. Here are two examples, one that works and one that does not:

import numpy as np
import pandas as pd

# Case 1: the first group (B == 1) has three rows, so the callback returns a DataFrame first.
df = pd.DataFrame({"A": np.arange(10), "B": [1, 1, 1, 2, 3, 3, 3, 4, 4, 4]})
print(df.groupby("B").apply(lambda g: None if g.shape[0] <= 2 else g.iloc[[0, -1]]))

# Case 2: the first group (B == 1) has one row, so the callback returns None first.
df = pd.DataFrame({"A": np.arange(10), "B": [1, 2, 2, 2, 3, 3, 3, 4, 4, 4]})
print(df.groupby("B").apply(lambda g: None if g.shape[0] <= 2 else g.iloc[[0, -1]]))

The output is as follows. The first call returns a DataFrame; the second returns a Series whose values are DataFrames:

     A  B
B        
1 0  0  1
  2  2  1
3 4  4  3
  6  6  3
4 7  7  4
  9  9  4


B
1         A    B
1  NaN  NaN
3  NaN  NaN
2                   A  B
1  1  2
3  3  2
3                   A  B
4  4  3
6  6  3
4                   A  B
7  7  4
9  9  4
dtype: object

The problem is that _wrap_applied_output() in groupby.py checks the first element to determine the concat method:

    if isinstance(values[0], DataFrame):
        return self._concat_objects(keys, values,
                                    not_indexed_same=not_indexed_same)
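To make the effect of that check concrete, here is a minimal sketch of the dispatch. It is not pandas' actual code: `choose_path_by_first` and `choose_path_by_first_non_none` are hypothetical names, and the second function is only one possible alternative, shown for comparison.

```python
import pandas as pd


def choose_path_by_first(values):
    # Mimics the quoted check: only values[0] is inspected.
    if isinstance(values[0], pd.DataFrame):
        return "concat"
    return "wrap as Series"


def choose_path_by_first_non_none(values):
    # Hypothetical alternative: skip None results before inspecting the type.
    first = next((v for v in values if v is not None), None)
    if isinstance(first, pd.DataFrame):
        return "concat"
    return "wrap as Series"


frame = pd.DataFrame({"A": [0, 2]})

# Same results, different order: the first-element check disagrees with itself.
print(choose_path_by_first([frame, None]))
print(choose_path_by_first([None, frame]))

# Skipping None makes the decision independent of the order.
print(choose_path_by_first_non_none([frame, None]))
print(choose_path_by_first_non_none([None, frame]))
```

With the first-element check, the position of the None result alone decides which path is taken, which matches the two outputs shown earlier.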

Does groupby().apply() support discarding groups?
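In the meantime, one way to get the intended result without depending on how apply treats None is to skip the small groups explicitly and concatenate the rest. This is a sketch of a workaround, not a statement of what apply supports; `first_and_last_of_large_groups` and its `min_rows` parameter are hypothetical names introduced for this example.

```python
import numpy as np
import pandas as pd


def first_and_last_of_large_groups(df, key, min_rows=3):
    """Keep the first and last row of every group with at least min_rows rows."""
    # Iterate over the groups and drop the small ones up front,
    # so no None value ever reaches the combining step.
    pieces = {
        name: group.iloc[[0, -1]]
        for name, group in df.groupby(key)
        if len(group) >= min_rows
    }
    if not pieces:
        # No group is large enough: return an empty frame with the same columns.
        return df.iloc[0:0]
    # The dict keys become the outer index level (left unnamed here).
    return pd.concat(pieces)


df1 = pd.DataFrame({"A": np.arange(10), "B": [1, 1, 1, 2, 3, 3, 3, 4, 4, 4]})
df2 = pd.DataFrame({"A": np.arange(10), "B": [1, 2, 2, 2, 3, 3, 3, 4, 4, 4]})

# Both calls return a DataFrame, whichever group happens to come first.
print(first_and_last_of_large_groups(df1, "B"))
print(first_and_last_of_large_groups(df2, "B"))
```

Because the filtering happens before anything is combined, the result type does not depend on the order of the groups.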

    Labels

    Bug, Groupby, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
