REGR: 1.3 behavior change with groupby, apply and side effect #41999

Closed
@aberres

Code Sample, a copy-pastable example

import pandas as pd


def apply_func(df):
    # Side effect: deletes a column from the frame that apply passes in
    del df["MEMBER_ID"]

    return df.head(1)


df = pd.DataFrame.from_dict(
    {
        "MONTH": {0: "April", 1: "April", 2: "April"},
        "MEMBER_ID": {0: "Member A", 1: "Member A", 2: "Member B"},
        "ACTIVITY_CATEGORY": {
            0: "Activity 1",
            1: "Days off at homebase",
            2: "Activity 1",
        },
        "FTE": {0: 1.0, 1: 1.0, 2: 0.75},
        "FTE_OFF_DAYS": {0: 5, 1: 5, 2: 10},
    }
)

df.groupby(["MONTH", "FTE"], observed=True)[
    ["MEMBER_ID", "FTE_OFF_DAYS", "ACTIVITY_CATEGORY"]
].apply(apply_func)

Problem description

Deleting a column from the DataFrame passed to apply_func, in combination with the call to head, causes an exception in 1.3rc1.

When apply_func is called the second time, the MEMBER_ID column no longer exists.

Adding df = df.copy() at the start of apply_func fixes the problem.
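For reference, here is a minimal sketch of the copy-based workaround, using the same data as the sample above (written with plain lists instead of index-keyed dicts). The `result` name is introduced here for illustration only. This is one way to avoid the side effect, not a statement about how pandas intends `apply` to be used.

```python
import pandas as pd


def apply_func(df):
    # Work on a copy so the frame handed in by groupby is never mutated.
    df = df.copy()
    del df["MEMBER_ID"]
    return df.head(1)


df = pd.DataFrame(
    {
        "MONTH": ["April", "April", "April"],
        "MEMBER_ID": ["Member A", "Member A", "Member B"],
        "ACTIVITY_CATEGORY": ["Activity 1", "Days off at homebase", "Activity 1"],
        "FTE": [1.0, 1.0, 0.75],
        "FTE_OFF_DAYS": [5, 5, 10],
    }
)

# "result" is a hypothetical name; the original snippet discards the return value.
result = df.groupby(["MONTH", "FTE"], observed=True)[
    ["MEMBER_ID", "FTE_OFF_DAYS", "ACTIVITY_CATEGORY"]
].apply(apply_func)

print(result)
```

Returning `df.drop(columns="MEMBER_ID").head(1)` without the `del` should behave the same way, since `drop` returns a new frame rather than modifying its input.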

Expected Output

While one could argue that the code is a bit bogus (and should possibly be rewritten), I wonder whether this change was made on purpose.

Labels: Apply, Bug, Groupby, Needs Discussion, Regression
