BUG/Internals: maybe_promote #23833

Open
@h-vetinari

Description

Seems I found a pretty deep rabbit hole while trying to solve #23823 (while trying to solve #23192 / #23604):

maybe_upcast_putmask and maybe_promote are both completely untested (or at least, their names do not appear anywhere in pandas/tests/), and maybe_promote also has no docstring. Side note: I ran into a segfault while trying to remove some old numpy compat code from that method in #23796.

Aside from the missing docstring and tests, the behaviour is also wrong, at least for integer types:

>>> import numpy as np
>>> from pandas.core.dtypes.cast import maybe_promote
>>> maybe_promote(np.dtype('int8'), np.array([10, np.iinfo('int8').max + 1, 12]))
(<class 'numpy.float64'>, nan)

To me, this should clearly upcast to int16 instead of float64. Passing an array as fill_value is valid usage: maybe_upcast_putmask does exactly that via maybe_promote(result.dtype, other), and maybe_promote has a dedicated code branch for arrays.
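To make the expected result concrete, here is a minimal sketch of value-based int-to-int promotion using only numpy. The helper name smallest_int_dtype is hypothetical and is not part of pandas; it only illustrates which dtype I would expect back, and it does not address what fill_value to return.

```python
import numpy as np

# Candidate integer dtypes, ordered from narrowest to widest.
_INT_CANDIDATES = ["int8", "uint8", "int16", "uint16",
                   "int32", "uint32", "int64", "uint64"]


def smallest_int_dtype(dtype, fill_value):
    """Hypothetical helper (not pandas API): return the narrowest integer
    dtype that is at least as wide as `dtype` and can hold every value
    in `fill_value`."""
    dtype = np.dtype(dtype)
    values = np.atleast_1d(np.asarray(fill_value))
    if dtype.kind not in "iu" or values.dtype.kind not in "iu":
        raise TypeError("only integer dtypes and integer fill values are handled")

    # Convert to Python ints so the comparisons below cannot overflow.
    lo, hi = int(values.min()), int(values.max())

    for name in _INT_CANDIDATES:
        promoted = np.promote_types(dtype, np.dtype(name))
        if promoted.kind not in "iu":
            # e.g. int64 + uint64 promotes to float64; not an integer result
            continue
        info = np.iinfo(promoted)
        if info.min <= lo and hi <= info.max:
            return promoted
    raise OverflowError("no integer dtype can hold both the dtype and the values")


arr = np.array([10, np.iinfo("int8").max + 1, 12])
print(smallest_int_dtype(np.dtype("int8"), arr))  # int16
```

With the input from the example above, this yields int16, because 128 does not fit into int8 but does fit into the next wider signed type.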

For int-to-int promotion, though, the open question is what to return as the actual fill_value. This method is used in pretty central code paths, but the number of uses is not that high (on master; half of the matches below are imports/redefinitions):

pandas/core\algorithms.py:12:    maybe_promote, construct_1d_object_array_from_listlike)
pandas/core\algorithms.py:1572:        _maybe_promote to determine this type for any fill_value
pandas/core\algorithms.py:1617:            dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\algorithms.py:1700:            dtype, fill_value = maybe_promote(arr.dtype, fill_value)
pandas/core\dtypes\cast.py:228:        new_dtype, _ = maybe_promote(result.dtype, other)
pandas/core\dtypes\cast.py:252:def maybe_promote(dtype, fill_value=np.nan):
pandas/core\dtypes\cast.py:538:        new_dtype, fill_value = maybe_promote(dtype, fill_value)
pandas/core\generic.py:34:from pandas.core.dtypes.cast import maybe_promote, maybe_upcast_putmask
pandas/core\generic.py:8289:                            dtype, fill_value = maybe_promote(other.dtype)
pandas/core\indexes\base.py:3371:        pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3505:        pself, ptarget = self._maybe_promote(target)
pandas/core\indexes\base.py:3528:    def _maybe_promote(self, other):
pandas/core\indexes\datetimes.py:924:    def _maybe_promote(self, other):
pandas/core\indexes\timedeltas.py:409:    def _maybe_promote(self, other):
pandas/core\internals\blocks.py:45:    maybe_promote,
pandas/core\internals\blocks.py:899:            dtype, _ = maybe_promote(arr_value.dtype)
pandas/core\internals\blocks.py:1054:                    dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\blocks.py:3174:        dtype, fill_value = maybe_promote(values.dtype)
pandas/core\internals\blocks.py:3293:    dtype, _ = maybe_promote(n.dtype)
pandas/core\internals\concat.py:19:from pandas.core.dtypes.cast import maybe_promote
pandas/core\internals\concat.py:137:            return _get_dtype(maybe_promote(self.block.dtype,
pandas/core\internals\managers.py:22:    maybe_promote,
pandas/core\internals\managers.py:1277:                    _, fill_value = maybe_promote(blk.dtype)
pandas/core\reshape\reshape.py:12:from pandas.core.dtypes.cast import maybe_promote
pandas/core\reshape\reshape.py:192:            dtype, fill_value = maybe_promote(values.dtype, self.fill_value)

Therefore it might make sense to adapt the private API, e.g. by adding a kwarg must_hold_na and/or return_default_na. I've inspected all the occurrences listed above, and this would not be a problem to implement.
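As a rough illustration of what such a kwarg could mean, here is a sketch under my own assumptions. The function promote_sketch is hypothetical, the semantics of must_hold_na are only one possible reading, and it returns just the dtype because the fill_value question is still open. It also widens by the dtype of fill_value rather than by its values, which is simpler but can over-allocate.

```python
import numpy as np


def promote_sketch(dtype, fill_value=np.nan, must_hold_na=True):
    """Hypothetical stand-in (not pandas' maybe_promote): return only the
    promoted dtype, under one assumed meaning of `must_hold_na`."""
    dtype = np.dtype(dtype)
    fill_dtype = np.asarray(fill_value).dtype

    if dtype.kind in "iu" and fill_dtype.kind in "iu" and not must_hold_na:
        # Assumption: the caller guarantees no missing values are needed,
        # so the result may stay an integer dtype.
        # Note: numpy promotes int64 + uint64 to float64 here.
        return np.promote_types(dtype, fill_dtype)

    if dtype.kind in "iu":
        # Integer dtypes cannot represent NaN, so fall back to float64.
        return np.promote_types(dtype, np.float64)

    # Other dtypes are left unchanged in this sketch.
    return dtype


other = np.array([10, 128, 12], dtype="int16")
print(promote_sketch("int8", other, must_hold_na=False))  # int16
print(promote_sketch("int8", other))                      # float64
```

Call sites that only use the returned dtype (e.g. `dtype, _ = maybe_promote(n.dtype)`) would keep the default, while call sites that know their data has no missing values could opt out of the float upcast.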

Once I get around to it, will probably split this into two PRs, one just for adding tests/docstring, and one to change...
