Pandas 1.0 no longer handles numpy.str_s as categories #31499

Closed
@flying-sheep

Code Sample

import numpy as np
import pandas as pd
pd.Categorical(['1', '0', '1'], [np.str_('0'), np.str_('1')])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/angerer/Dev/Python/venvs/env-pandas-1/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 385, in __init__
    codes = _get_codes_for_values(values, dtype.categories)
  File "/home/angerer/Dev/Python/venvs/env-pandas-1/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 2576, in _get_codes_for_values
    t.map_locations(cats)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1403, in pandas._libs.hashtable.StringHashTable.map_locations
TypeError: Expected unicode, got numpy.str_

Problem description

I know that a list of numpy.str_ objects seems unusual, but it arises easily when you apply non-NumPy algorithms to NumPy arrays (natsort.natsorted in our case) or iterate over an array, for example in a comprehension:

>>> np.array(['1', '0'])[0].__class__
<class 'numpy.str_'>
>>> [type(s) for s in np.array(['1', '0'])]
[<class 'numpy.str_'>, <class 'numpy.str_'>]
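
A possible workaround until this is fixed, as a minimal sketch: convert the categories to built-in str before passing them to pd.Categorical. The helper name to_plain_str is hypothetical and not part of pandas or NumPy; it assumes every category is a string.

```python
import numpy as np
import pandas as pd


def to_plain_str(values):
    """Hypothetical helper: convert numpy.str_ items to built-in str."""
    return [str(v) for v in values]


# Iterating over a NumPy string array yields numpy.str_ scalars,
# which subclass str but are not exactly str.
raw_categories = list(np.array(['0', '1']))
assert all(isinstance(c, str) and type(c) is not str for c in raw_categories)

# After conversion, every category is a plain built-in str.
categories = to_plain_str(raw_categories)
cat = pd.Categorical(['1', '0', '1'], categories=categories)
print(cat)
```

Calling .tolist() on the array before any non-NumPy processing should have the same effect, since it returns built-in Python objects.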

Expected Output

A normal pd.Categorical

Pandas version

pandas 1.0

Labels: Categorical (Categorical Data Type), Regression (Functionality that used to work in a prior pandas version)