Description
Code Sample, a copy-pastable example if possible
>>> import numpy as np
>>> import pandas as pd
>>> dates = pd.date_range('2011-1-1', periods=500000, freq='min')
>>> index = np.random.choice(dates, 500000, replace=True)
>>> df = pd.DataFrame(index=index, data={'a':1})
>>> df_sort = df.sort_index()
>>> %timeit df['2011-6-11']
1.19 ms ± 77.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> # Sorted is about 3.5x faster
>>> %timeit df_sort['2011-6-11']
333 µs ± 17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> # Now with .loc
>>> %timeit df.loc['2011-6-11']
2.59 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> # ~2500x slower than df_sort['2011-6-11'] above
>>> %timeit df_sort.loc['2011-6-11']
853 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> # Slicing works fine
>>> %timeit df.loc['2011-6-11':'2011-10-1']
52 ms ± 2.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df_sort.loc['2011-6-11':'2011-10-1']
658 µs ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Problem description
When using the .loc indexer on a large frame with a sorted DatetimeIndex, selecting a single day by partial string is ~2500 times slower than the plain indexing operator (853 ms vs. 333 µs). It is also ~300 times slower than the same .loc lookup on the unsorted frame (853 ms vs. 2.59 ms).
Slicing with .loc behaves as expected: the sorted frame is much faster than the unsorted one (658 µs vs. 52 ms).
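Since slicing on the sorted frame is fast in the timings above, one possible stopgap is to write the single-day lookup as a slice with the same start and end label. The sketch below only checks that the two forms select the same rows. It uses a smaller frame and a seeded generator (both my own choices, not part of the report), and it does not confirm that the slice form avoids the slow path on any particular pandas version.

```python
import numpy as np
import pandas as pd

# Assumption: a smaller frame than in the report, so the check runs quickly.
rng = np.random.default_rng(0)
dates = pd.date_range("2011-01-01", periods=50_000, freq="min")
index = rng.choice(dates, size=50_000, replace=True)
df_sort = pd.DataFrame({"a": 1}, index=index).sort_index()

# Partial-string lookup for one day (the form that is slow in this report).
by_label = df_sort.loc["2011-01-11"]

# The same day written as a slice; both endpoints are inclusive and the
# end label covers the whole day under partial-string slicing.
by_slice = df_sort.loc["2011-01-11":"2011-01-11"]

# Compare the two selections.
same_rows = by_label.equals(by_slice)
print(same_rows, len(by_label))
```

If the two selections match on the real data as well, timing the slice form with %timeit would show whether it is a usable workaround until the .loc lookup is fixed.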
Expected Output
A .loc lookup on the sorted frame should be faster than on the unsorted frame, as it already is with the indexing operator and with slicing.