BUG: Index.get_indexer with mixed-reso datetime64s

```python
import numpy as np
import pandas as pd

ms = np.datetime64(1, "ms")
us = np.datetime64(1000, "us")

left = pd.Index([ms], dtype=object)
right = pd.Index([us], dtype=object)

assert left[0] == right[0]
assert (left == right).all()

>>> left[0] in right  # <- wrong
False
>>> right[0] in left  # <- wrong
False

>>> left.get_loc(right[0])  # <- raises, incorrectly
>>> right.get_loc(left[0])  # <- raises, incorrectly

>>> left.get_indexer(right)  # works correctly AFAICT because it doesn't use the hashtable

# But in a non-monotonic case...
sec = np.datetime64("9999-01-01", "s")
day = np.datetime64("2016-01-01", "D")
left2 = pd.Index([ms, sec, day], dtype=object)

>>> left2[:1].get_indexer(right)
array([0])
>>> left2.get_indexer(right)  # <- wrong
array([-1])
```

IIUC the issue is in the hashing of datetime64 objects, which does not satisfy the invariant that `x == y` implies `hash(x) == hash(y)` (xref https://github.com/numpy/numpy/issues/3836).
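To illustrate why a broken hash invariant produces exactly these symptoms, here is a minimal pure-Python sketch. `RawHashStamp` is a hypothetical stand-in for a datetime64 scalar (it is not numpy's or pandas' actual implementation): equality compares the unit-normalized value, but the hash is deliberately computed from the raw integer only.

```python
# Hypothetical toy model, not numpy/pandas code: an integer count plus a unit.
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


class RawHashStamp:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def _as_ns(self):
        # Normalize to nanoseconds so different units can be compared.
        return self.value * NS_PER_UNIT[self.unit]

    def __eq__(self, other):
        if not isinstance(other, RawHashStamp):
            return NotImplemented
        return self._as_ns() == other._as_ns()

    def __hash__(self):
        # Deliberately broken: hashes the raw count and ignores the unit,
        # so equal objects in different units get different hashes.
        return hash(self.value)


ms_toy = RawHashStamp(1, "ms")
us_toy = RawHashStamp(1000, "us")

table = {ms_toy: 0}

values_equal = ms_toy == us_toy                    # equality works
hash_lookup = us_toy in table                      # hash-based lookup misses
linear_scan = any(us_toy == key for key in table)  # comparison-based scan hits
```

In this toy model the hash-based lookup misses while the comparison-based scan succeeds, which matches the pattern above where only the hashtable-backed paths return wrong results.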

When implementing non-nanosecond support for Timestamp/Timedelta, we implemented `__hash__` to retain this invariant (at the cost of performance).

Unless numpy changes its behavior, I think to fix this we need to patch how we treat datetime64 objects in our khash code, likely mirroring `Timestamp.__hash__`.  cc @realead thoughts?
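As a rough sketch of the kind of fix meant here, the hash can be computed from a unit-normalized value so that equal timestamps hash equally regardless of resolution. This is pure Python with a hypothetical `NormalizedStamp` class; it is not the actual khash patch, and I have not checked it against what `Timestamp.__hash__` really does.

```python
# Hypothetical toy model of unit-normalized hashing, not pandas' khash code.
UNIT_TO_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


class NormalizedStamp:
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def _as_ns(self):
        # Python ints are unbounded, so this never overflows here; a C-level
        # version would need care for values outside the int64 nanosecond range.
        return self.value * UNIT_TO_NS[self.unit]

    def __eq__(self, other):
        if not isinstance(other, NormalizedStamp):
            return NotImplemented
        return self._as_ns() == other._as_ns()

    def __hash__(self):
        # Hash the normalized value so x == y implies hash(x) == hash(y).
        return hash(self._as_ns())


ms_norm = NormalizedStamp(1, "ms")
us_norm = NormalizedStamp(1000, "us")

lookup = {ms_norm: 0}
position = lookup.get(us_norm, -1)  # analogous to get_loc / get_indexer
```

The extra normalization on every hash call is where the performance cost would come from in this sketch.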


BUG: Index.get_indexer with mixed-reso datetime64s #50690
