Description
Seems I found a pretty deep rabbit hole while trying to solve #23823 (while trying to solve #23192 / #23604): maybe_upcast_putmask and maybe_promote are both completely untested (or at least, their names do not appear anywhere in pandas/tests/), and maybe_promote also does not have a docstring. Side note: ran into a segfault while trying to remove some old numpy compat code from that method in #23796.
Aside from missing a docstring and tests, the behaviour is also wrong, at least regarding integer types:
>>> import numpy as np
>>> from pandas.core.dtypes.cast import maybe_promote
>>> maybe_promote(np.dtype('int8'), np.array([10, np.iinfo('int8').max + 1, 12]))
(<class 'numpy.float64'>, nan)
To me, this should clearly upcast to int16 instead of float (using arrays for fill_value is correct usage, as done e.g. in maybe_upcast_putmask as maybe_promote(result.dtype, other), and has a dedicated code branch in maybe_promote).
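To make the expected int-to-int result concrete, here is a minimal sketch that uses only NumPy. The helper name smallest_common_int_dtype and the candidate ordering are assumptions made for illustration; this is not how maybe_promote is implemented, and it only returns a dtype, leaving the fill_value question open.

```python
import numpy as np

# Candidate integer dtypes, ordered from narrowest to widest (illustrative choice).
_INT_CANDIDATES = ("int8", "uint8", "int16", "uint16",
                   "int32", "uint32", "int64", "uint64")


def smallest_common_int_dtype(dtype, fill_values):
    """Hypothetical helper: smallest integer dtype that can hold both the
    full range of `dtype` and every value in `fill_values`."""
    dtype = np.dtype(dtype)
    fill_values = np.asarray(fill_values)
    if dtype.kind not in "iu" or fill_values.dtype.kind not in "iu":
        raise TypeError("this sketch only handles integer dtypes and values")
    if fill_values.size == 0:
        # Nothing to accommodate, so the original dtype is sufficient.
        return dtype

    # The result must cover the original dtype's range and the fill values.
    lo = min(int(fill_values.min()), int(np.iinfo(dtype).min))
    hi = max(int(fill_values.max()), int(np.iinfo(dtype).max))

    for name in _INT_CANDIDATES:
        info = np.iinfo(name)
        if info.min <= lo and hi <= info.max:
            return np.dtype(name)
    # No fixed-width integer dtype fits (e.g. int64 range plus 2**63).
    return np.dtype(object)
```

For the example above, smallest_common_int_dtype(np.dtype('int8'), np.array([10, np.iinfo('int8').max + 1, 12])) returns dtype('int16'), while fill values that already fit leave int8 unchanged.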
In int-to-int promotion, the question is what to return as an actual fill_value though. Of course, this method is being used in pretty central code paths, but the number of uses is not that high (on master; half of the instances are imports/redefinitions).
pandas/core\algorithms.py:12: maybe_promote, construct_1d_object_array_from_listlike)
pandas/core\algorithms.py:1572: _maybe_promote to determine this type for any fill_value
pandas/core\algorithms.py:1617: dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\algorithms.py:1700: dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\dtypes\cast.py:228: new_dtype, _ = maybe_promote(result.dtype, other)
pandas/core\dtypes\cast.py:252:def maybe_promote(dtype, fill_value=np.nan):
pandas/core\dtypes\cast.py:538: new_dtype, fill_value = maybe_promote(dtype, fill_value)
pandas/core\generic.py:34:from pandas.core.dtypes.cast import maybe_promote, maybe_upcast_putmask
pandas/core\generic.py:8289: dtype, fill_value = maybe_promote(other.dtype)
pandas/core\indexes\base.py:3371: pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3505: pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3528: def _maybe_promote(self, other):
pandas/core\indexes\datetimes.py:924: def _maybe_promote(self, other):
pandas/core\indexes\timedeltas.py:409: def _maybe_promote(self, other):
pandas/core\internals\blocks.py:45: maybe_promote,
pandas/core\internals\blocks.py:899: dtype, _ = maybe_promote(arr_value.dtype)
pandas/core\internals\blocks.py:1054: dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\blocks.py:3174: dtype, fill_value = maybe_promote(values.dtype)
pandas/core\internals\blocks.py:3293: dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\concat.py:19:from pandas.core.dtypes.cast import maybe_promote
pandas/core\internals\concat.py:137: return _get_dtype(maybe_promote(self.block.dtype,
pandas/core\internals\managers.py:22: maybe_promote,
pandas/core\internals\managers.py:1277: _, fill_value = maybe_promote(blk.dtype)
pandas/core\reshape\reshape.py:12:from pandas.core.dtypes.cast import maybe_promote
pandas/core\reshape\reshape.py:192: dtype, fill_value = maybe_promote(values.dtype, self.fill_value)
Therefore it might make sense to adapt the private API, e.g. adding a kwarg must_hold_na and/or return_default_na. I've inspected all the occurrences of the code above, and this would not be a problem to implement.
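One possible reading of those two kwargs, as a rough NumPy-only sketch. The function name na_promotion_sketch, the exact semantics of both flags, and the choice of NA values are all assumptions for illustration; none of this is existing pandas API.

```python
import numpy as np


def na_promotion_sketch(dtype, must_hold_na=True, return_default_na=True):
    """Hypothetical illustration of the proposed kwargs.

    must_hold_na: assumed to mean the returned dtype must be able to
        store a missing value, so integer and bool dtypes get widened.
    return_default_na: assumed to mean the second return value is the
        default missing-value marker for the returned dtype.
    """
    dtype = np.dtype(dtype)

    if must_hold_na:
        if dtype.kind in "iu":
            # Integers cannot hold NaN, so fall back to float64.
            dtype = np.dtype("float64")
        elif dtype.kind == "b":
            # Booleans cannot hold NaN either, so fall back to object.
            dtype = np.dtype(object)

    if not return_default_na:
        return dtype, None

    if dtype.kind in "fc" or dtype.kind == "O":
        na = np.nan
    elif dtype.kind == "M":
        na = np.datetime64("NaT")
    elif dtype.kind == "m":
        na = np.timedelta64("NaT")
    else:
        # Integer/bool dtypes kept as-is have no NA marker in this sketch.
        na = None
    return dtype, na
```

With must_hold_na=False, a caller that only needs int-to-int promotion could keep an integer dtype, and the callers above that discard the second return value could pass return_default_na=False.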
Once I get around to it, will probably split this into two PRs, one just for adding tests/docstring, and one to change...