
BUG?: using None as replacement value in replace() typically upcasts to object dtype #60284

Open
@jorisvandenbossche


I noticed that when replacing a value with None, we typically cast to object dtype, regardless of whether the dtype of the calling Series can actually hold None (at least when treating None as a generic "missing value" indicator).

For example, a float Series can hold None in the sense of holding missing values, which is how None is treated in setitem:

>>> ser = pd.Series([1, 2, 3], dtype="float")
>>> ser[1] = None
>>> ser
0    1.0
1    NaN
2    3.0
dtype: float64

However, when using replace() to replace the value 2.0 with None, the result depends on exactly how the to_replace/value combination is specified, but it typically upcasts to object:

# with list
>>> ser.replace([1, 2], [10, None])
0    10.0
1    None
2     3.0
dtype: object

# with Series -> here it gives NaN but that is because the Series constructor already coerces the None
>>> ser.replace(pd.Series({1: 10, 2: None}))
0    10.0
1     NaN
2     3.0
dtype: float64

# with scalar replacements
>>> ser.replace(1, 10).replace(2, None)
0    10.0
1    None
2     3.0
dtype: object

In all of the above cases, passing np.nan instead of None as the replacement value of course just results in a float Series with NaN.
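For comparison, here is a minimal sketch of the np.nan variants of the list and scalar calls above. The variable names are only illustrative; the point is that both spellings keep the float64 dtype.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="float")

# Same calls as in the examples above, but with np.nan as the replacement value
with_nan_list = ser.replace([1, 2], [10, np.nan])
with_nan_scalar = ser.replace(1, 10).replace(2, np.nan)

# Both results stay float64, with NaN in position 1
print(with_nan_list.dtype, with_nan_scalar.dtype)
```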

The reason for this is two-fold. First, in Block._replace_coerce there is a check specifically for value is None and in that case we always cast to object dtype:

if value is None:
    # gh-45601, gh-45836, gh-46634
    if mask.any():
        has_ref = self.refs.has_reference()
        nb = self.astype(np.dtype(object))

The above is used when replacing with a list of values. Second, the scalar case also casts to object dtype: there we check self._can_hold_element(value) to decide whether the replacement can be done with a simple setitem (and if not, we cast to object dtype first before trying again). But it seems that can_hold_element(np.array([], dtype=float), None) gives False.
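A small sketch to probe that last check directly. It assumes can_hold_element is importable from pandas.core.dtypes.cast, which is private and may move between versions. The result for None is printed rather than assumed, since it may differ by version.

```python
import numpy as np

# Private pandas helper: this import path is an assumption and not public API
from pandas.core.dtypes.cast import can_hold_element

arr = np.array([], dtype=float)

# np.nan is a float, so a float64 array can hold it
holds_nan = can_hold_element(arr, np.nan)

# None is the case in question: it is treated as missing in setitem,
# but the issue reports that this check rejects it
holds_none = can_hold_element(arr, None)

print("np.nan:", holds_nan)
print("None:  ", holds_none)
```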


Everything is tested with current main (3.0.0.dev), but I see the same behaviour on older releases (2.0 and 1.5)


Somewhat related issue:

Labels: API - Consistency, Bug, Missing-data, replace
