BUG (string dtype): replace() value in string column with non-string should cast to object dtype instead of raising an error #60282

Closed
@jorisvandenbossche

Description

For all other dtypes (I think; I only checked the one below), if the replacement value passed to replace() doesn't fit into the dtype of the calling Series, we "upcast" to object dtype and do the replacement anyway.

Simple example with an integer series:

>>> ser = pd.Series([1, 2])
>>> ser.replace(1, "str")
0    str
1      2
dtype: object
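
The upcasting described above can also be checked programmatically. The following is a minimal sketch: the integer case mirrors the example above, while the float case is an extra illustration that is not part of the original report and is assumed to follow the same rule.

```python
import pandas as pd

# Integer series: the replacement value "str" does not fit into int64,
# so the result is upcast to object dtype (same case as the example above).
int_result = pd.Series([1, 2]).replace(1, "str")

# Float series: an additional illustration (assumption: same upcasting rule).
float_result = pd.Series([1.5, 2.5]).replace(1.5, "str")

print(int_result.dtype, int_result.tolist())
print(float_result.dtype, float_result.tolist())
```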

However, with the future string dtype, replacing a value with a non-string does not currently cast to object dtype, but raises instead:

>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b"])
>>> ser.replace("a", 1)
...
File ~/scipy/repos/pandas/pandas/core/internals/blocks.py:713, in Block.replace(self, to_replace, value, inplace, mask)
    709 elif self._can_hold_element(value):
    710     # TODO(CoW): Maybe split here as well into columns where mask has True
    711     # and rest?
    712     blk = self._maybe_copy(inplace)
--> 713     putmask_inplace(blk.values, mask, value)
    714     return [blk]
    716 elif self.ndim == 1 or self.shape[0] == 1:
...

File ~/scipy/repos/pandas/pandas/core/arrays/string_.py:746, in __setitem__(self, key, value)
...
TypeError: Invalid value '1' for dtype 'str'. Value should be a string or missing value, got 'int' instead.

Making replace() strict (preserve dtype) in general is a much bigger topic, so I think for now we should just keep the existing behaviour of upcasting to object dtype when needed, and apply it to the string dtype as well.
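
Until replace() upcasts for the string dtype, the desired semantics can be approximated in user code. The sketch below is only an illustration, not how pandas implements the behaviour internally: `replace_with_upcast` is a hypothetical helper name, and catching `ValueError` in addition to the `TypeError` shown in the traceback is a defensive assumption.

```python
import pandas as pd


def replace_with_upcast(ser, to_replace, value):
    """Hypothetical helper: replace, falling back to object dtype if needed."""
    try:
        # Works directly whenever the dtype can hold `value`,
        # or when pandas already upcasts on its own.
        return ser.replace(to_replace, value)
    except (TypeError, ValueError):
        # The string dtype rejected the non-string value:
        # cast to object explicitly and retry.
        return ser.astype(object).replace(to_replace, value)


# Uses the nullable "string" dtype so the example does not depend on
# the pd.options.future.infer_string setting.
result = replace_with_upcast(pd.Series(["a", "b"], dtype="string"), "a", 1)
print(result.dtype, result.tolist())
```

The explicit `astype(object)` makes the dtype change visible at the call site, which may be preferable if replace() ever becomes strict about preserving dtype.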

Labels

Bug; Strings (String extension data type and string data); replace (replace method)
