
REGR: Replacing a category with itself replaces it with np.nan #33288

Closed
@jtilly

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
pd.Series(["a", "b"]).astype("category").replace("a", "a")
# 0    NaN
# 1      b
# dtype: category
# Categories (1, object): [b]

Operating on the categorical array directly, i.e. pd.Categorical(["a", "b"]).replace("a", "a"), yields the same result.

Problem description

Replacing a category with itself turns the matching values into np.nan and drops that category (note Categories (1, object): [b] in the output above). This regression was introduced in 1.0.0.
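The buggy output drops the "a" category, which explains the NaN: a categorical stores integer codes that index into its categories, and a value whose category disappears is given the sentinel code -1, displayed as NaN. The sketch below reproduces that end state with `remove_categories`. It only illustrates the symptom; it is an assumption that this mirrors what happens inside `replace`, not a trace of the actual code path.

```python
import pandas as pd

# A categorical stores small integer codes that index into its categories.
cat = pd.Categorical(["a", "b"])
print(cat.codes.tolist())          # [0, 1]
print(list(cat.categories))        # ['a', 'b']

# Removing the "a" category leaves the first value without a valid code.
# pandas marks it with the sentinel code -1 and displays it as NaN.
dropped = cat.remove_categories(["a"])
print(dropped.codes.tolist())      # [-1, 0]
print(list(dropped.categories))    # ['b']
print(pd.isna(dropped[0]))         # True
```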

Expected Output

I would have expected the behavior from 0.25.3:

pd.Series(["a", "b"]).astype("category").replace("a", "a")
# 0    a
# 1    b
# dtype: category
# Categories (2, object): [a, b]
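On an affected version, one possible workaround is to avoid Series.replace for categoricals and work at the category level instead. The helper below, `safe_categorical_replace`, is hypothetical (not a pandas API): it treats a self-replacement as a no-op, merges into an existing category via assignment plus `cat.remove_categories`, and otherwise uses `cat.rename_categories`. This is a sketch under those assumptions, not the upstream fix.

```python
import pandas as pd


def safe_categorical_replace(s: pd.Series, old, new) -> pd.Series:
    """Hypothetical workaround: replace `old` with `new` in a categorical
    Series using only category-level operations, bypassing Series.replace."""
    cats = s.cat.categories
    if old == new or old not in cats:
        # Self-replacement (the case in this issue) or an absent label: no-op.
        return s.copy()
    if new in cats:
        # Merging into an existing category: reassign the matching rows,
        # then drop the now-unused `old` category.
        out = s.copy()
        out[out == old] = new
        return out.cat.remove_categories([old])
    # `new` is a fresh label, so a rename keeps the category's position.
    return s.cat.rename_categories({old: new})


s = pd.Series(["a", "b"]).astype("category")
print(safe_categorical_replace(s, "a", "a"))
```

Because the self-replacement case returns a copy before any replacement logic runs, both the values and the two categories are preserved, matching the 0.25.3 output above.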

Note that if we work with lists, we get

pd.Series(["a", "b"]).astype("category").replace(["a"], ["a"])
# 0    a
# 1    b
# dtype: object

which is also not what I would expect, because we now lose the categorical dtype. This behavior has been described elsewhere (e.g., in a comment on #31734), and it is consistent with 0.25.3.
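If a list-based replace falls back to object dtype, the categorical dtype can be re-applied afterwards. The helper `restore_categorical` below is hypothetical: it keeps the original categories in their original order and appends any labels the replacement introduced, so those values do not silently become NaN. The object-dtype Series in the usage line is a stand-in for the result reported above, so the sketch does not depend on which pandas version is installed.

```python
import pandas as pd


def restore_categorical(original: pd.Series, result: pd.Series) -> pd.Series:
    """Hypothetical helper: re-apply a categorical dtype to `result` when a
    list-based replace has fallen back to object dtype."""
    if isinstance(result.dtype, pd.CategoricalDtype):
        return result  # dtype survived; nothing to do
    cats = original.cat.categories
    # Keep the original categories (and their order); append any new labels
    # that the replacement introduced so they do not become NaN.
    extra = pd.Index(result.dropna().unique()).difference(cats)
    dtype = pd.CategoricalDtype(cats.append(extra), ordered=original.cat.ordered)
    return result.astype(dtype)


original = pd.Series(["a", "b"]).astype("category")
# Stand-in for the object-dtype result of .replace(["a"], ["a"]) reported above.
as_object = pd.Series(["a", "b"], dtype=object)
print(restore_categorical(original, as_object).dtype)  # category
```

One design consequence: categories that no longer occur (e.g., "a" after replacing it with a new label) are kept as unused categories rather than dropped.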

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.2.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.4.0-176-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 46.1.3.post20200325
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None


Labels: Categorical (Categorical Data Type), Regression (Functionality that used to work in a prior pandas version)
