Description
This one is a bit complex to explain, but I'll do my best.
Currently IntervalIndex.get_indexer fails if the other index doesn't contain only Interval objects (there's also another bug, but let's keep it simple here).
The underlying issue is that IntervalIndex.get_indexer depends on IntervalIndex.get_loc, which is ambiguous in how it treats number inputs:
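>>> import pandas as pd
>>> ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
>>> ii.get_loc(pd.Interval(1, 2))
1 # ok
>>> ii.get_loc(1) # do we mean exactly 1, or an interval that contains the number 1?
0 # ambiguous: 1 is not an element of ii, it merely lies inside the interval (0, 1]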
The issue is that get_loc returns the location for both exact matches and inexact matches (i.e. when the number input falls inside an interval). For the purposes of get_indexer, however, this behaviour fails, as get_indexer needs get_loc to find exact matches only.
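To make the failure mode concrete, here is a minimal pure-Python sketch. It is not pandas code: Interval, lenient_get_loc and naive_get_indexer are hypothetical stand-ins that only mimic the behaviour described above, assuming right-closed intervals.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for pd.Interval, closed on the right: (left, right]."""
    left: float
    right: float

    def contains(self, x):
        return self.left < x <= self.right


def lenient_get_loc(intervals, key):
    """Exact match for an Interval key, containment match for a number key."""
    for pos, iv in enumerate(intervals):
        if isinstance(key, Interval):
            if iv == key:
                return pos
        elif iv.contains(key):
            return pos
    raise KeyError(key)


def naive_get_indexer(intervals, targets):
    """Indexer built on lenient_get_loc; -1 marks a target that was not found."""
    result = []
    for target in targets:
        try:
            result.append(lenient_get_loc(intervals, target))
        except KeyError:
            result.append(-1)
    return result


intervals = [Interval(0, 1), Interval(1, 2), Interval(2, 3)]
mixed_targets = [Interval(1, 2), 1, 5]  # an Interval, a number inside (0, 1], a miss

indexer = naive_get_indexer(intervals, mixed_targets)
print(indexer)  # the number 1 gets position 0 although it is not an element
```

An indexer that only accepts exact matches would report -1 for the number 1, since 1 is not itself one of the intervals.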
See #19021 (comment) for further discussion.
Solution
A solution could be adding a 'strict' option to the method parameter of IntervalIndex.get_loc.
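As a rough illustration of what such an option could mean, here is a pure-Python sketch. It is an assumption about the intended semantics, not the code from the PR and not an existing pandas API: intervals are modelled as plain (left, right) tuples, closed on the right, and get_loc here is a hypothetical free function.

```python
def get_loc(intervals, key, method=None):
    """Locate key in a list of (left, right) tuples, each closed on the right.

    method=None     : a number matches the interval that contains it.
    method='strict' : only a tuple equal to a stored interval matches.
    """
    if method not in (None, "strict"):
        raise ValueError(f"unsupported method: {method!r}")
    for pos, (left, right) in enumerate(intervals):
        if isinstance(key, tuple):
            if (left, right) == key:
                return pos
        elif method is None and left < key <= right:
            return pos
    raise KeyError(key)


intervals = [(0, 1), (1, 2), (2, 3)]

exact_default = get_loc(intervals, (1, 2))                  # exact match
exact_strict = get_loc(intervals, (1, 2), method="strict")  # still an exact match
number_default = get_loc(intervals, 1)                      # containment match

try:
    get_loc(intervals, 1, method="strict")
    strict_number_found = True
except KeyError:
    strict_number_found = False  # a bare number is never an exact match
```

With a switch like this, get_indexer could ask for strict lookups and map every KeyError to -1, while ordinary label lookups keep the lenient behaviour.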
This wasn't so difficult after all, and I've already opened a PR for this; see #19353.