Skip to content

Converting from categorical to int ignores NaNs #28406

Closed
@hnykda

Description

@hnykda

Code Sample, a copy-pastable example if possible

In [6]: s = pd.Series([1, 0, None], dtype='category')                                                                                                                                                                                            

In [7]: s                                                                                                                                                                                                                                      
Out[7]: 
0      1
1      0
2    NaN
dtype: category
Categories (2, int64): [0, 1]

In [8]: s.astype(int)                                                                                                                                                                                                                          
Out[8]: 
0                      1
1                      0
2   -9223372036854775808  # <- this is unexpected
dtype: int64

Problem description

When converting categorical series back into Int column, it converts NaN to incorect integer negative value.

Expected Output

I would expect that NaN in category converts to NaN in IntX(nullable integer) or float.

When trying to use d.astype('Int8'), I get an error dtype not understood

Output of pd.show_versions()

In [147]: pd.show_versions()                                                                                                                                                                                                                   

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.2.13-arch1-1-ARCH
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.2
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : 5.1.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : 0.4.0
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : 2.7.0
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : 3.5.2
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

Metadata

Metadata

Assignees

No one assigned

    Labels

    CategoricalCategorical Data TypeMissing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions