Skip to content

BUG (string dtype): comparison of string column to mixed object column fails #60228

Open
@jorisvandenbossche

Description

@jorisvandenbossche

At the moment you can freely compare with mixed object dtype column:

>>> ser_string = pd.Series(["a", "b"])
>>> ser_mixed = pd.Series([1, "b"])
>>> ser_string == ser_mixed
0    False
1     True
dtype: bool

But with the string dtype enabled (using pyarrow), this now raises an error:

>>> pd.options.future.infer_string = True
>>> ser_string = pd.Series(["a", "b"])
>>> ser_mixed = pd.Series([1, "b"])
>>> ser_string == ser_mixed
...
File ~/scipy/repos/pandas/pandas/core/arrays/arrow/array.py:510, in ArrowExtensionArray._box_pa_array(cls, value, pa_type, copy)
...
--> 510     pa_array = pa.array(value, from_pandas=True)
...
ArrowInvalid: Could not convert 'b' with type str: tried to convert to int64

This happens because the ArrowEA tries to convert the other operand to Arrow as well, which fails for mixed types.

In general, I think our rule is that == comparison never fails, but then just gives False for when values are not comparable.

Metadata

Metadata

Assignees

Labels

BugStringsString extension data type and string data

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions