BUG: dtype not being preserved for replace on a CategoricalDtype #46672

Closed
@galipremsagar

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> pd.__version__
'1.4.2'
>>> pdf = pd.DataFrame(
...             {
...                 "a": ["one", "two", None, "three"],
...                 "b": ["one", None, "two", "three"],
...             },
...             dtype="category",
...         )
>>> pdf.dtypes
a    category
b    category
dtype: object
>>> pdf
       a      b
0    one    one
1    two    NaN
2    NaN    two
3  three  three
>>> new_df = pdf.replace(to_replace=[".", "def"], value=["_", None])
>>> new_df
       a      b
0    one    one
1    two    NaN
2    NaN    two
3  three  three
>>> new_df.dtypes
a    object
b    object
dtype: object

Issue Description

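When the columns have category dtype, calling replace with the inputs above does not preserve the dtype. None of the values to replace occur in the data, so the contents are unchanged, yet both columns come back as object instead of category.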
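
A quick way to spot this kind of silent dtype change is to compare dtype names before and after the call. The snippet below is only a sketch: `changed_dtypes` is a hypothetical helper, not a pandas function, and it compares dtype names only, so it will not notice two categorical columns that differ in their categories.

```python
import pandas as pd


def changed_dtypes(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Map each shared column whose dtype name differs to a (before, after) pair.

    Hypothetical helper for illustration; it compares dtype names only.
    """
    return {
        col: (str(before[col].dtype), str(after[col].dtype))
        for col in before.columns
        if col in after.columns
        and str(before[col].dtype) != str(after[col].dtype)
    }


pdf = pd.DataFrame(
    {
        "a": ["one", "two", None, "three"],
        "b": ["one", None, "two", "three"],
    },
    dtype="category",
)

# No value in the frame matches "." or "def", so nothing is actually replaced.
new_df = pdf.replace(to_replace=[".", "def"], value=["_", None])

# Empty on versions that keep the dtype; on affected versions both columns
# are reported as going from category to object.
print(changed_dtypes(pdf, new_df))
```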

Expected Behavior

>>> import pandas as pd
>>> pd.__version__
'1.3.5'
>>> pdf = pd.DataFrame(
...             {
...                 "a": ["one", "two", None, "three"],
...                 "b": ["one", None, "two", "three"],
...             },
...             dtype="category",
...         )
>>> new_df = pdf.replace(to_replace=[".", "def"], value=["_", None])
>>> new_df
       a      b
0    one    one
1    two    NaN
2    NaN    two
3  three  three
>>> new_df.dtypes
a    category
b    category
dtype: object
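
On an affected version, one possible stopgap is to cast the affected columns back to category after the call. This is a sketch under assumptions, not a fix from the pandas maintainers: `replace_keep_category` is a hypothetical wrapper, it assumes unique column names and that `inplace=True` is not passed, and it re-infers the categories from the resulting values, so an ordered flag or unused categories on the original dtype would be lost.

```python
import pandas as pd


def replace_keep_category(df: pd.DataFrame, **replace_kwargs) -> pd.DataFrame:
    """Call DataFrame.replace, then re-cast columns that lost their category dtype.

    Hypothetical helper, not part of pandas. Assumes unique column names and
    that inplace=True is not passed (replace would then return None).
    """
    result = df.replace(**replace_kwargs)
    for col in df.columns:
        was_categorical = isinstance(df[col].dtype, pd.CategoricalDtype)
        still_categorical = isinstance(result[col].dtype, pd.CategoricalDtype)
        if was_categorical and not still_categorical:
            # Categories are re-inferred from the values, so any replacement
            # values are kept rather than turned into NaN.
            result[col] = result[col].astype("category")
    return result


pdf = pd.DataFrame(
    {
        "a": ["one", "two", None, "three"],
        "b": ["one", None, "two", "three"],
    },
    dtype="category",
)

new_df = replace_keep_category(pdf, to_replace=[".", "def"], value=["_", None])
print(new_df.dtypes)
```

Casting with the original dtype object instead would keep the original categories, but any replacement value outside those categories would become NaN, which is why the sketch re-infers them.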

Installed Versions

>>> pd.show_versions()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/nvme/0/pgali/envs/cudfdev/lib/python3.8/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /nvme/0/pgali/envs/cudfdev/lib/python3.8/distutils/core.py

Labels

  • Bug
  • Categorical (Categorical Data Type)
  • Dtype Conversions (Unexpected or buggy dtype conversions)
  • Regression (Functionality that used to work in a prior pandas version)
  • replace (replace method)