PERF: Period factorization very slow in 0.19.0 #14338

Closed
@bmoscon

Description

import numpy as np
import pandas as pd
from datetime import datetime as dt

df = pd.DataFrame(data={'data': np.random.randint(0, 100, size=5500000),
                        'date': [dt(2016, 1, 1)] * 5500000})
for period, g in df.groupby(pd.DatetimeIndex(df.date).to_period('D')):
    print(g)

Expected Output

The loop prints the DataFrame for each group (here a single group, since every row has the same date).

Output of pd.show_versions()

0.19.0

The output is not the issue. The issue is that in any version before 0.19.0 this was incredibly fast, taking about 1 second or less. With 0.19.0, I gave up after waiting many minutes.
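To compare versions without waiting many minutes, a reduced-size timing sketch along these lines may help. It is not part of the original report: the helper name `time_period_groupby` and the row counts are hypothetical, chosen only so the loop finishes quickly, and the timings will depend on the machine and the installed pandas version.

```python
import time
from datetime import datetime as dt

import numpy as np
import pandas as pd


def time_period_groupby(n_rows):
    """Return (number of groups, elapsed seconds) for a groupby on a daily PeriodIndex.

    Hypothetical helper: same shape of data as the report, with fewer rows.
    """
    df = pd.DataFrame({
        'data': np.random.randint(0, 100, size=n_rows),
        'date': [dt(2016, 1, 1)] * n_rows,
    })
    periods = pd.DatetimeIndex(df['date']).to_period('D')

    # Time only the groupby iteration, not the construction of the frame.
    start = time.perf_counter()
    n_groups = sum(1 for _period, _group in df.groupby(periods))
    elapsed = time.perf_counter() - start
    return n_groups, elapsed


# Assumed, reduced row counts; the original report used 5,500,000 rows.
for n_rows in (1000, 10000, 100000):
    n_groups, elapsed = time_period_groupby(n_rows)
    print(n_rows, n_groups, round(elapsed, 4))
```

Running the same script under two pandas versions and comparing how the elapsed time grows with the row count should show whether the slowdown is in the grouping step itself.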

Metadata

Labels

Performance: Memory or execution speed performance
Period: Period data type
Regression: Functionality that used to work in a prior pandas version
