Description
I noticed that in certain cases, when replacing a value with None, we always cast to object dtype, regardless of whether the dtype of the calling Series can actually hold None (at least, when considering None just as a generic "missing value" indicator).
For example, a float Series can hold None in the sense of holding missing values, which is how None is treated in setitem:
>>> ser = pd.Series([1, 2, 3], dtype="float")
>>> ser[1] = None
>>> ser
0 1.0
1 NaN
2 3.0
dtype: float64
However, when using replace() to replace the value 2.0 with None, the result depends on exactly how the to_replace/value combination is specified, but typically it upcasts to object:
# with list
>>> ser.replace([1, 2], [10, None])
0 10.0
1 None
2 3.0
dtype: object
# with Series -> here it gives NaN but that is because the Series constructor already coerces the None
>>> ser.replace(pd.Series({1: 10, 2: None}))
0 10.0
1 NaN
2 3.0
dtype: float64
# with scalar replacements
>>> ser.replace(1, 10).replace(2, None)
0 10.0
1 None
2 3.0
dtype: object
In all the above cases, using np.nan instead of None as the replacement value of course just results in a float Series with NaN.
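For comparison, here is a minimal sketch of the np.nan variants of the list and scalar calls above. The variable names with_list and with_scalar are only illustrative; the expectation, per the description above, is that both results keep the float64 dtype.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="float")

# List form: np.nan as the replacement value keeps the float dtype.
with_list = ser.replace([1, 2], [10, np.nan])

# Scalar form, chained the same way as in the examples above.
with_scalar = ser.replace(1, 10).replace(2, np.nan)

print(with_list.dtype, with_scalar.dtype)
```

Comparing these dtypes against the None variants makes the inconsistency easy to check on a given pandas version.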
The reason for this is two-fold. First, in Block._replace_coerce there is a check specifically for value is None, and in that case we always cast to object dtype:
pandas/pandas/core/internals/blocks.py
Lines 906 to 910 in 5f23ace
The above is used when replacing with a list of values. The scalar case also casts to object dtype, for a second reason: there we check self._can_hold_element(value) to decide whether the replacement can be done with a simple setitem (and if not, we cast to object dtype first before trying again). But it seems that can_hold_element(np.array([], dtype=float), None) gives False.
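The check can be probed directly with a small sketch. This assumes can_hold_element is importable from pandas.core.dtypes.cast, which is a private module whose location and behaviour may differ between pandas versions. The names holds_none and holds_nan are only illustrative.

```python
import numpy as np

# Assumption: private pandas helper; the import path may change between versions.
from pandas.core.dtypes.cast import can_hold_element

arr = np.array([], dtype=float)

# None is the case discussed above (reported to give False).
holds_none = can_hold_element(arr, None)

# np.nan is a float, so a float array is expected to be able to hold it.
holds_nan = can_hold_element(arr, np.nan)

print(f"None: {holds_none}, np.nan: {holds_nan}")
```

If the two results differ, that would match the scalar replace() path casting to object for None but not for np.nan.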
Everything is tested with current main (3.0.0.dev), but I see the same behaviour on older releases (2.0 and 1.5).
Somewhat related issue: