
DatetimeIndex selection with .loc is orders of magnitude slower than [] on ordered frame #17754

Closed
@tdpetrou

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> dates = pd.date_range('2011-1-1', periods=500000, freq='min')
>>> index = np.random.choice(dates, 500000, replace=True)
>>> df = pd.DataFrame(index=index, data={'a':1})
>>> df_sort = df.sort_index()

>>> %timeit df['2011-6-11']
1.19 ms ± 77.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> # Sorted is three times faster
>>> %timeit df_sort['2011-6-11']
333 µs ± 17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> # Now with .loc
>>> %timeit df.loc['2011-6-11']
2.59 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> # 2500x slower
>>> %timeit df_sort.loc['2011-6-11']
853 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> # Slicing works fine
>>> %timeit df.loc['2011-6-11':'2011-10-1']
52 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit df_sort.loc['2011-6-11':'2011-10-1']
658 µs ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Problem description

On a large frame with a sorted DatetimeIndex, selecting a single day with the .loc indexer is ~2500 times slower than the same selection with the indexing operator [] (853 ms vs. 333 µs). It is also ~300 times slower than the same .loc lookup on the unsorted frame (853 ms vs. 2.59 ms).

Slicing with .loc appears to work as expected: it is much faster on the sorted frame than on the unsorted one (658 µs vs. 52 ms).
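For anyone without IPython, the comparison can be approximated with the standard `timeit` module. This is a minimal sketch, not the script used for the numbers above. The frame size `n`, the lookup `key`, and the `best_time` helper are illustrative choices. The `[]` row lookup from the report is left out because it may not be accepted by newer pandas releases. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the report so the script finishes quickly (assumption).
n = 50_000
rng = np.random.default_rng(0)
dates = pd.date_range("2011-01-01", periods=n, freq="min")

# Sample timestamps with replacement, giving an unsorted index with duplicates.
index = dates[rng.integers(0, n, size=n)]
df = pd.DataFrame({"a": 1}, index=index)
df_sort = df.sort_index()

key = "2011-01-11"  # a day that falls inside the generated range


def best_time(frame, repeat=3, number=5):
    """Return the best per-call time in seconds for frame.loc[key]."""
    timings = timeit.repeat(lambda: frame.loc[key], repeat=repeat, number=number)
    return min(timings) / number


t_unsorted = best_time(df)
t_sorted = best_time(df_sort)

print(f"pandas {pd.__version__}")
print(f"unsorted .loc: {t_unsorted * 1e3:.3f} ms per call")
print(f"sorted   .loc: {t_sorted * 1e3:.3f} ms per call")

# Both lookups should select the same number of rows for the day.
rows_unsorted = len(df.loc[key])
rows_sorted = len(df_sort.loc[key])
print(f"rows selected: {rows_unsorted} (unsorted), {rows_sorted} (sorted)")
```

If the sorted timing comes out far larger than the unsorted one, the installed pandas version likely shows the behaviour reported here.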

Expected Output

The sorted frame should be faster than the unsorted frame when using .loc, as it already is with [].
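The reasoning behind this expectation can be sketched in plain Python. This illustrates why sortedness helps in general and is not a description of how pandas implements .loc. On sorted timestamps, a day's rows form one contiguous block that two binary searches can locate, while unsorted data needs a full scan. The names `select_scan` and `select_bisect` and the data sizes are hypothetical.

```python
import bisect
import random
from datetime import datetime, timedelta

random.seed(0)
start = datetime(2011, 1, 1)

# Stand-in for the index: minute-resolution timestamps sampled with replacement.
stamps = [start + timedelta(minutes=random.randrange(100_000)) for _ in range(100_000)]
sorted_stamps = sorted(stamps)


def select_scan(values, day):
    """Linear scan: check every timestamp against the day's bounds."""
    lo, hi = day, day + timedelta(days=1)
    return [i for i, ts in enumerate(values) if lo <= ts < hi]


def select_bisect(sorted_values, day):
    """Binary search: find the contiguous block of positions for the day."""
    lo, hi = day, day + timedelta(days=1)
    left = bisect.bisect_left(sorted_values, lo)
    right = bisect.bisect_left(sorted_values, hi)
    return range(left, right)


day = datetime(2011, 1, 11)
scan_hits = select_scan(sorted_stamps, day)
bisect_hits = select_bisect(sorted_stamps, day)

# On sorted input both strategies select exactly the same positions.
print(list(bisect_hits) == scan_hits)
print(len(bisect_hits), "rows found for", day.date())
```

The scan touches every element, while the binary search needs only a logarithmic number of comparisons. That is why a lookup on a sorted index is expected to be the cheaper case.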

Labels

Error Reporting: Incorrect or improved errors from pandas
Indexing: Related to indexing on series/frames, not to indexes themselves
Performance: Memory or execution speed performance
