PERF: performance regression in slicing a Series #33323

All of the slice indexing benchmarks are showing regressions, e.g. https://pandas.pydata.org/speed/pandas/#indexing.NumericSeriesIndexing.time_loc_slice?p-index_dtype=%3Cclass%20'pandas.core.indexes.numeric.Int64Index'%3E&p-index_structure='unique_monotonic_inc'&commits=80d37adc-4f89c261

Replicating one of them with a small snippet confirms this:

```
In [1]: pd.__version__   
Out[1]: '1.1.0.dev0+1122.gc7c640ec7'

In [2]: N = 10 ** 6  

In [3]: data = pd.Series(np.random.rand(N), index=pd.Int64Index(range(N))) 

In [4]: %timeit data.iloc[:800000] 
103 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit data[:800000] 
97.2 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

vs

```
In [1]: pd.__version__
Out[1]: '1.0.3'

In [2]: N = 10 ** 6  

In [3]: data = pd.Series(np.random.rand(N), index=pd.Int64Index(range(N))) 

In [4]: %timeit data.iloc[:800000]  
55.7 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit data[:800000]   
69.6 µs ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Profiling with snakeviz shows that on master, almost all of the time taken by the indexing operation is spent setting `mgr_locs` (creating the BlockPlacement object), which is certainly not expected:

![image](https://user-images.githubusercontent.com/1020496/78568022-cefa8580-7821-11ea-9fc8-6f0e55e8d4d0.png)

Putting a breakpoint in the Block init just before the BlockPlacement is created shows that the value being passed on master is a `range` object, while I assume it should be a `slice` object.
Going up the stack to see where this range object is created points to:

https://github.com/pandas-dev/pandas/blob/a9c105a7a6dfa210f2706e2d8df6a6222964ff26/pandas/core/internals/managers.py#L1571

which according to `git blame` was last touched by this PR: https://github.com/pandas-dev/pandas/pull/32421. And indeed, that PR introduced a `range` object there instead of a `slice` (other places in the PR *did* use a slice, so this was probably a typo).
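
To illustrate why passing a `range` instead of a `slice` could plausibly explain a slowdown of this size, here is a minimal, library-free sketch. It does not use pandas internals: `placement_from_slice` and `placement_from_range` are hypothetical helpers, and the assumption (not verified here) is that a non-slice placement gets materialized into an integer array, one element per position.

```python
from array import array
from timeit import timeit

N = 10 ** 6
STOP = 800_000


def placement_from_slice(stop):
    # A slice only stores start/stop/step, so building it is O(1).
    return slice(0, stop, 1)


def placement_from_range(stop):
    # Hypothetical stand-in for the slow path: every position is
    # materialized into an integer array, which is O(stop).
    return array("q", range(stop))


as_slice = placement_from_slice(STOP)
as_array = placement_from_range(STOP)

# Both describe the same positions 0 .. STOP - 1 ...
assert len(as_array) == len(range(*as_slice.indices(N)))

# ... but the cost of building them differs widely.
REPEATS = 10
t_slice = timeit(lambda: placement_from_slice(STOP), number=REPEATS)
t_range = timeit(lambda: placement_from_range(STOP), number=REPEATS)
print(f"slice: {t_slice / REPEATS * 1e6:.2f} µs per call")
print(f"range: {t_range / REPEATS * 1e6:.2f} µs per call")
```

The absolute numbers depend on the machine, but the materialized variant scales with the length of the selection, while the slice does not.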


