BUG: .str.contains na validation #59561

Closed
@jbrockmendel

Description

import pandas as pd
import pyarrow as pa

ser = pd.Series(['a', 'b', None], dtype=pd.StringDtype(storage="pyarrow"))
ser2 = ser.astype(pd.ArrowDtype(pa.string()))

ser.str.contains("foo", na="bar")  # <- casts "bar" to True
ser2.str.contains("foo", na="bar")  # <- raises

There's a small difference between the _str_contains methods of ArrowExtensionArray and ArrowStringArray: the latter uses bool(na) when filling null entries, while the former uses na directly.

I prefer the no-casting behavior, but mainly think we should be consistent.
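
To make the two behaviors concrete, here is a minimal pure-Python sketch of the difference. It is not the pandas implementation: `contains_cast` and `contains_strict` are hypothetical helpers that only model how null entries get filled, and the explicit `TypeError` in the strict variant is an assumption about what consistent validation could look like.

```python
from typing import List, Optional


def contains_cast(values: List[Optional[str]], pat: str, na=None) -> list:
    """Model of the casting behavior: null entries are filled with bool(na)."""
    # Hypothetical helper; na=None leaves null entries as None.
    fill = None if na is None else bool(na)
    return [fill if v is None else (pat in v) for v in values]


def contains_strict(values: List[Optional[str]], pat: str, na=None) -> list:
    """Model of the no-casting behavior: na is used as-is, non-bool is rejected."""
    # Assumption: a non-bool, non-None na is treated as an error up front.
    if na is not None and not isinstance(na, bool):
        raise TypeError(f"na must be None or a bool, got {type(na).__name__}")
    return [na if v is None else (pat in v) for v in values]


data = ["a", "b", None]

# Casting variant: "bar" is truthy, so the null entry becomes True.
print(contains_cast(data, "foo", na="bar"))

# Strict variant: the same call is rejected instead of silently cast.
try:
    contains_strict(data, "foo", na="bar")
except TypeError as exc:
    print("raised:", exc)
```

Under this model the casting variant silently turns any truthy na into True, while the strict variant surfaces the mistake at call time.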

Update: it looks like pandas/tests/strings/test_find_replace.py::test_contains_nan specifically tests na="foo".

Labels: API - Consistency (Internal Consistency of API/Behavior), Bug, Strings (String extension data type and string data)