Float64Index is very slow in some condition. #13166

Closed
@ruoyu0088

The following code is very slow:

import pandas as pd
import numpy as np

dt = 4.8000000418824129e-08
data = np.random.rand(1000000, 4)
df = pd.DataFrame(data, columns=list("ABCD"))
df.index *= dt
print(df.loc[0:0.001].shape)

After debugging, I found that Float64Engine.get_loc() is slow. Here is a demo:

import pandas as pd
import numpy as np

a = np.arange(1000000)
ind1 = pd.Float64Index(a * 4.8e-08)
ind2 = pd.Float64Index(a * 4.8000000418824129e-08)

%time ind1._engine.get_loc(0)
%time ind2._engine.get_loc(0)

Output:

Wall time: 295 ms
Wall time: 9.9 s
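
For anyone reproducing this outside IPython, where `%time` is not available, the demo above can be sketched as a plain script. This is a rough sketch, not the code from the report: `time_get_loc` is a hypothetical helper name, it builds the index with `pd.Index(..., dtype="float64")` because `pd.Float64Index` was removed in pandas 2.0, and it calls the public `Index.get_loc` rather than the private `_engine`. Timings depend on the pandas version and machine, so the numbers above may not reproduce.

```python
import time

import numpy as np
import pandas as pd


def time_get_loc(step, n=1_000_000, key=0.0):
    """Build a float64 index of n multiples of `step` and time one get_loc call.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Assumption: pd.Index with dtype float64 stands in for pd.Float64Index,
    # which no longer exists in pandas >= 2.0.
    index = pd.Index(np.arange(n) * step, dtype="float64")

    start = time.perf_counter()
    loc = index.get_loc(key)  # public lookup; the report used index._engine.get_loc
    elapsed = time.perf_counter() - start
    return loc, elapsed


if __name__ == "__main__":
    # The two step sizes compared in the report.
    for step in (4.8e-08, 4.8000000418824129e-08):
        loc, elapsed = time_get_loc(step)
        print(f"step={step!r}: loc={loc}, {elapsed:.3f} s")
```

Each call builds a fresh index, so the measured time may include one-time setup of the index's internal lookup structures as well as the lookup itself, which is also the case for the first `%time` call in the demo.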

Labels

Dtype Conversions: Unexpected or buggy dtype conversions
Indexing: Related to indexing on series/frames, not to indexes themselves
Performance: Memory or execution speed performance
