API: IntervalIndex.get_indexer not strict about passed target values dtype #47772

@jorisvandenbossche

Consider the following example of IntervalIndex with datetime64 subdtype:

In [40]: iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))

In [41]: iidx
Out[41]: IntervalIndex([(2018-01-01, 2018-01-02], (2018-01-02, 2018-01-03], (2018-01-03, 2018-01-04]], dtype='interval[datetime64[ns], right]')

In [42]: iidx.get_indexer([pd.Timestamp("2018-01-02")])
Out[42]: array([0])

In [43]: iidx.get_indexer(["2018-01-02"])
Out[43]: array([0])

In [44]: iidx.get_indexer([pd.Timestamp("2018-01-02").value])
Out[44]: array([0])

(The above is with pandas 1.3.5. On 1.4 / main, the first two still work, but the last one no longer does.)

Being able to index with strings (in addition to Timestamp / datetime64 values) is probably expected, since DatetimeIndex also works that way.
But I don't think we should accept integer values. (This could be deprecated first, since it also affects the behaviour of .loc indexing.)
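To make the "deprecate first" option concrete, here is a minimal sketch of what such a transition could look like from the outside. `get_indexer_warn_on_ints` is a hypothetical wrapper, not pandas API, and it assumes integer targets are meant as nanoseconds since the epoch (which is what `Timestamp.value` returns).

```python
import numbers
import warnings

import pandas as pd


def get_indexer_warn_on_ints(iidx, targets):
    """Hypothetical wrapper (not pandas API): warn when integers are used to
    index a datetime64 IntervalIndex, then look the values up anyway."""
    targets = list(targets)
    if any(isinstance(t, numbers.Integral) for t in targets):
        warnings.warn(
            "Integer targets for a datetime64 IntervalIndex are deprecated; "
            "pass Timestamp, datetime64 or string values instead.",
            FutureWarning,
            stacklevel=2,
        )
    # Assumption: integers are nanoseconds since the epoch. Strings and
    # Timestamps are parsed by pd.Timestamp as usual.
    stamps = pd.DatetimeIndex([pd.Timestamp(t) for t in targets])
    # Cast to the interval subtype so the lookup sees matching dtypes.
    return iidx.get_indexer(stamps.astype(iidx.dtype.subtype))


iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))
print(get_indexer_warn_on_ints(iidx, ["2018-01-02"]))
```

Strings and Timestamps go through silently. An integer such as `pd.Timestamp("2018-01-02").value` still resolves to the same position but raises a `FutureWarning` first.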

This last case was changed (unintentionally I think, given there were no tests) in #47771, and I am changing this back in #47771 to fix a regression in cut (which implicitly also restores the get_indexer behaviour).

If we want to remove this again (or deprecate first), we have to change the logic inside cut a bit to ensure we pass correctly dtyped values to IntervalIndex.get_indexer (see explanation in top post of #47771 for context)
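As a rough illustration of what "passing correctly dtyped values" could mean on the caller's side, the sketch below restores the datetime64 dtype of raw integer values before the lookup. `lookup_from_i8` is a hypothetical helper, not the actual logic inside cut, and it assumes the integers are nanoseconds since the epoch.

```python
import numpy as np
import pandas as pd


def lookup_from_i8(iidx, i8_values):
    """Hypothetical helper: reinterpret int64 nanosecond values as datetime64
    and only then call IntervalIndex.get_indexer."""
    # Assumption: i8_values hold nanoseconds since the epoch.
    as_datetime = np.asarray(i8_values, dtype="int64").view("datetime64[ns]")
    # Match the subtype of the intervals so no integer ever reaches get_indexer.
    typed = pd.DatetimeIndex(as_datetime).astype(iidx.dtype.subtype)
    return iidx.get_indexer(typed)


iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))
i8 = [
    pd.Timestamp("2018-01-02").value,        # right edge of the first interval
    pd.Timestamp("2018-01-03 12:00").value,  # inside the third interval
    pd.Timestamp("2018-01-01").value,        # open left edge, so no match
]
print(lookup_from_i8(iidx, i8))
```

With this kind of conversion done by the caller, get_indexer itself would be free to reject integers for datetime64 intervals.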

Labels: Deprecate, Indexing, Interval, cut