import pandas as pd
import pyarrow as pa
ser = pd.Series(['a', 'b', None], dtype=pd.StringDtype(storage="pyarrow"))
ser2 = ser.astype(pd.ArrowDtype(pa.string()))
ser.str.contains("foo", na="bar") # <- casts "bar" to True
ser2.str.contains("foo", na="bar") # <- raises
There's a small difference in the _str_contains methods in ArrowExtensionArray vs ArrowStringArray. The latter uses bool(na) when filling null entries, the former uses na directly.
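To make the difference concrete, here is a minimal pure-Python sketch of the two fill strategies. It is not the pandas implementation: `fill_with_cast` and `fill_direct` are hypothetical helpers, and the `TypeError` in `fill_direct` is only an assumed stand-in for whatever error the Arrow-backed path actually raises.

```python
from typing import Any, List, Optional


def fill_with_cast(result: List[Optional[bool]], na: Any) -> List[bool]:
    """Mimic the bool(na) strategy: coerce the fill value to bool first."""
    fill = bool(na)  # any truthy object, e.g. "bar", becomes True
    return [fill if value is None else value for value in result]


def fill_direct(result: List[Optional[bool]], na: Any) -> List[bool]:
    """Mimic the no-casting strategy: use na as given.

    Assumption: a non-bool fill value is rejected. The TypeError here is
    illustrative, not the exception pandas or pyarrow actually raises.
    """
    if not isinstance(na, bool):
        raise TypeError(f"cannot fill a boolean result with {na!r}")
    return [na if value is None else value for value in result]


# Stand-in for the result of contains("foo") on ['a', 'b', None]
contains_result = [False, False, None]

print(fill_with_cast(contains_result, "bar"))  # null slot filled with bool("bar")

try:
    fill_direct(contains_result, "bar")
except TypeError as exc:
    print(f"raised: {exc}")
```

With the casting strategy a typo such as na="False" silently fills nulls with True, which is one argument for the no-casting behavior.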
I prefer the no-casting behavior, but mainly think we should be consistent.
Update: it looks like pandas/tests/strings/test_find_replace.py::test_contains_nan specifically tests na="foo".