API: handling of missing values in Index.__contains__ #59765

The table below gives an overview of the result of:

```
missing_value in idx
```

i.e. how `Index.__contains__` handles the various missing-value sentinels as input, for each of the different data types.

| dtype          | None   | nan   | \<NA\>   | NaT   |
|:---------------|:-------|:------|:-------|:------|
| object-none    | True   | False | False  | False |
| object-nan     | False  | True  | False  | False |
| object-NA      | False  | False | True   | False |
| datetime       | True   | True  | True   | True  |
| period         | True   | True  | True   | True  |
| timedelta      | True   | True  | True   | True  |
| float64        | False  | True  | False  | False |
| categorical    | True   | True  | True   | True  |
| interval       | True   | True  | True   | False |
| nullable_int   | False  | False | True   | False |
| nullable_float | False  | False | True   | False |
| string-python  | False  | False | False  | False |
| string-pyarrow | False  | False | False  | False |
| str-python     | False  | False | False  | False |

The last three rows, which do not contain a single True, are particularly problematic; this looks like a bug in `StringDtype`.
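
Until this is resolved, one way to check for missing values that does not depend on which sentinel is passed is to ask the index directly via `Index.isna()` or `Index.hasnans` instead of using `in`. A minimal sketch, using only `string[python]` so that pyarrow is not required:

```python
import pandas as pd

idx = pd.Index(["a", None], dtype="string[python]")

# `in` goes through Index.__contains__, which (per the table above) does not
# find any of the sentinels for this dtype; isna()/hasnans look at the values
mask = idx.isna()        # boolean mask of missing entries
print(mask.any())        # True: at least one entry is missing
print(idx.hasnans)       # True
```

This only answers "does the index contain a missing value at all", which is not the same question as `__contains__` with a specific sentinel.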

More generally, though, the behaviour is quite inconsistent:

- For object dtype, we require an exact match.
- For datetimelike and categorical dtypes, we match any missing-value sentinel.
- For interval, we match any missing-value sentinel except NaT (even for an interval dtype with a datetimelike subtype).
- For float64, we only match NaN.
- For nullable dtypes (int/float), we only match NA.
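
For comparison, the policy currently used by the datetimelike and categorical indexes (any missing-value sentinel matches any missing value) could be applied to every dtype along the following lines. This is only a sketch of one possible option for discussion: `contains_any_missing` is a hypothetical helper, and it assumes a scalar key should be treated as missing whenever `pd.isna` says so.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar


def contains_any_missing(idx: pd.Index, key) -> bool:
    """Hypothetical __contains__ variant: any missing sentinel
    (None, np.nan, pd.NA, pd.NaT) matches any missing value in `idx`."""
    if is_scalar(key) and pd.isna(key):
        # The key is a missing-value sentinel: only ask whether idx has any
        return bool(idx.hasnans)
    # Non-missing keys keep the existing lookup behaviour
    return key in idx


sentinels = [None, np.nan, pd.NA, pd.NaT]
idx = pd.Index([2, None], dtype="Int64")
print([contains_any_missing(idx, v) for v in sentinels])  # [True, True, True, True]
```

The trade-off is that the exact-match semantics of object dtype would be lost: with this rule, `np.nan` would be reported as present in an object index that only holds `None`.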


The code to generate the table above:

<details>

```python
import numpy as np
import pandas as pd

# from conftest.py
indices_dict = {
    "object-none": pd.Index(["a", None], dtype=object),
    "object-nan": pd.Index(["a", np.nan], dtype=object),
    "object-NA": pd.Index(["a", pd.NA], dtype=object),
    "datetime": pd.DatetimeIndex(["2024-01-01", "NaT"]),
    "period": pd.PeriodIndex(["2024-01-01", None], freq="D"),
    "timedelta": pd.TimedeltaIndex(["1 days", "NaT"]),
    "float64": pd.Index([2.0, np.nan], dtype="float64"),
    "categorical": pd.CategoricalIndex(["a", None]),
    "interval": pd.IntervalIndex.from_tuples([(1, 2), np.nan]),
    "nullable_int": pd.Index([2, None], dtype="Int64"),
    "nullable_float": pd.Index([2.0, None], dtype="Float32"),
    "string-python": pd.Index(["a", None], dtype="string[python]"),
    "string-pyarrow": pd.Index(["a", None], dtype="string[pyarrow]"),
    "str-python": pd.Index(
        ["a", None], dtype=pd.StringDtype("pyarrow", na_value=np.nan)
    ),
}

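results = []

# Check every missing-value sentinel against every index type
for dtype, data in indices_dict.items():
    for val in [None, np.nan, pd.NA, pd.NaT]:
        res = val in data
        results.append((dtype, str(val), res))

df = pd.DataFrame(results, columns=["dtype", "val", "result"])
# Pivot into a dtype x sentinel grid, keeping the original row/column order
df_overview = df.pivot(columns="val", index="dtype", values="result").reindex(
    columns=df["val"].unique(), index=df["dtype"].unique()
)

# to_markdown() requires the optional `tabulate` package
print(df_overview.astype(str).to_markdown())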
```

</details>

cc @jbrockmendel I would have expected there to be existing issues about this, but I couldn't find any.
