BUG in remove_unused_categories with NaNs in values #11599

Closed
@jorisvandenbossche

From SO: http://stackoverflow.com/questions/33693601/removing-unused-categories-in-series-results-in-duplicated-categories

remove_unused_categories gives a wrong result when there is a NaN in the values:

>>> s = pd.Series(["A", "B", pd.np.nan]).astype("category")
>>> s.cat.remove_unused_categories()
0      A
1      B
2    NaN
dtype: category
Categories (3, object): [B, A, B]

I think the -1 in the codes (the code used for NaN) is being treated as a position, so the last item in the categories gets duplicated.
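
To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It is an assumption about what goes wrong, not pandas' actual implementation: the `categories` and `codes` lists and both helper functions are hypothetical stand-ins for the categorical's internals. It shows how using every distinct code as an index, including the -1 sentinel, would produce exactly the `[B, A, B]` seen above, and how skipping -1 would avoid it.

```python
# Hypothetical stand-ins for a categorical's internals (not pandas' own code).
categories = ["A", "B"]
codes = [0, 1, -1]  # -1 is the sentinel code for NaN


def remove_unused_naive(categories, codes):
    """Suspected buggy behaviour: every distinct code is used as an index."""
    used = sorted(set(codes))  # [-1, 0, 1]
    # categories[-1] wraps around to the last item, so "B" shows up twice.
    return [categories[i] for i in used]


def remove_unused_skip_nan(categories, codes):
    """Same idea, but the NaN sentinel is dropped before indexing."""
    used = sorted({c for c in codes if c != -1})  # [0, 1]
    return [categories[i] for i in used]


naive = remove_unused_naive(categories, codes)
fixed = remove_unused_skip_nan(categories, codes)
print(naive)  # ['B', 'A', 'B']
print(fixed)  # ['A', 'B']
```

If this is indeed the cause, filtering out the -1 code before selecting the used categories should be enough to fix it.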
