
API: ExtensionArray.isin treatment of missing values #42545

Closed
@mzeitlin11

The current behavior singles out pd.NA:

In [3]: ser = pd.Series([pd.NA], dtype="Int64")

In [4]: ser.isin([pd.NA])
Out[4]:
0    True
dtype: boolean

In [5]: ser.isin([np.nan])
Out[5]:
0    False
dtype: boolean

StringArray.isin also follows this behavior, so it is not MaskedArray specific.

This was flagged as problematic because _from_sequence treats other missing values exactly like pd.NA (#42473 (comment)). By that reasoning, the second call (In [5]) should also return True.
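The coercion described above can be observed through the public constructor. This is a minimal sketch, assuming that `pd.array(..., dtype="Int64")` goes through the same construction path as `_from_sequence`; the internal routing may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Build a nullable integer array from three different missing-value markers.
arr = pd.array([pd.NA, np.nan, None], dtype="Int64")

# Each element ends up as a masked (missing) entry, so the marker that was
# originally passed in can no longer be distinguished after construction.
all_missing = bool(arr.isna().all())

print(arr)
print(all_missing)
```

If construction does not distinguish between these markers, it is at least arguable that `isin` should not distinguish between them either.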

A third option is for both outputs to be pd.NA. #38379 discussed propagating pd.NA instead of returning True/False, depending on whether the values argument contains missing values. That debate is summarized well in #38379 (comment).
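To make the three candidate semantics easier to compare, here is a toy model in plain Python. It does not use pandas: `NA`, `is_missing`, and the three `isin_*` functions are hypothetical names invented for this sketch, and the propagating variant is deliberately simplified to cover only missing elements in the data.

```python
import math

NA = object()  # hypothetical stand-in for pd.NA in this toy model


def is_missing(x):
    """Return True for the NA sentinel, None, or a float NaN."""
    return x is NA or x is None or (isinstance(x, float) and math.isnan(x))


def _present(values):
    """Non-missing entries of values, used for ordinary membership tests."""
    return [v for v in values if not is_missing(v)]


def isin_strict(data, values):
    """Option 1: a missing element matches only if values contains NA itself."""
    has_na = any(v is NA for v in values)
    present = _present(values)
    return [has_na if is_missing(x) else x in present for x in data]


def isin_any_missing(data, values):
    """Option 2: a missing element matches any missing marker in values."""
    has_missing = any(is_missing(v) for v in values)
    present = _present(values)
    return [has_missing if is_missing(x) else x in present for x in data]


def isin_propagate(data, values):
    """Option 3 (simplified): a missing element yields NA, not True/False."""
    present = _present(values)
    return [NA if is_missing(x) else x in present for x in data]


nan = float("nan")
print(isin_strict([NA], [NA]), isin_strict([NA], [nan]))
print(isin_any_missing([NA], [NA]), isin_any_missing([NA], [nan]))
print(isin_propagate([NA], [NA])[0] is NA)
```

In this model, `isin_strict` reproduces the True/False pair from the report, `isin_any_missing` returns True for both calls, and `isin_propagate` returns the sentinel for both.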


Labels

- API - Consistency: Internal Consistency of API/Behavior
- Duplicate Report: Duplicate issue or pull request
- ExtensionArray: Extending pandas with custom dtypes or arrays.
- Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
- isin: isin method
