BUG: convert_dtypes changes BooleanDtype to Int64 #32287

Closed
@jiannmeng

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df = pd.DataFrame(data=[["abc", 123, True]])
>>> print(df)
     0    1     2
0  abc  123  True
>>> print(df.dtypes)
0    object
1     int64
2      bool
dtype: object
>>> df = df.convert_dtypes()
>>> print(df)
     0    1     2
0  abc  123  True
>>> print(df.dtypes)
0     string
1      Int64
2    boolean
dtype: object
>>> df = df.convert_dtypes()
>>> print(df)
        0    1  2
0  b'abc'  123  1
>>> print(df.dtypes)
0    object
1     Int64
2     Int64
dtype: object

Problem description

Calling convert_dtypes() on a column that already has dtype string converts it to dtype object, and the individual values change from str to bytes.

Calling convert_dtypes() on a column that already has dtype boolean converts it to dtype Int64, and the individual values change from bool to int.
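
A possible workaround, as a minimal sketch rather than a proper fix: call convert_dtypes() only on columns that do not already use an extension dtype, so that string and boolean columns are left alone on a second pass. The helper name convert_numpy_columns_only is hypothetical (it is not part of pandas), and the sketch assumes unique column labels.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def convert_numpy_columns_only(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: run convert_dtypes() only on columns that do
    not already use an extension dtype. Assumes unique column labels."""
    out = df.copy()
    for label in out.columns:
        # Columns that are already string / Int64 / boolean are skipped,
        # so they are never passed through convert_dtypes() again.
        if not is_extension_array_dtype(out[label].dtype):
            out[label] = out[label].convert_dtypes()
    return out


df = pd.DataFrame(data=[["abc", 123, True]])
once = convert_numpy_columns_only(df)
twice = convert_numpy_columns_only(once)  # second pass should change nothing
print(twice.dtypes)
```

Because the second pass skips every column that was converted in the first, the dtypes of once and twice should match.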

Expected Output

convert_dtypes() should keep StringDtype columns as StringDtype and BooleanDtype columns as BooleanDtype.
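
Put differently, convert_dtypes() should be idempotent with respect to dtypes. A rough sketch of a check for that property follows; the function name dtypes_stable_under_convert_dtypes is made up for illustration and is not part of pandas.

```python
import pandas as pd


def dtypes_stable_under_convert_dtypes(df: pd.DataFrame) -> bool:
    """Hypothetical check: True if a second convert_dtypes() call leaves
    every column dtype unchanged."""
    once = df.convert_dtypes()
    twice = once.convert_dtypes()
    # Compare the dtype objects column by column.
    return list(once.dtypes) == list(twice.dtypes)


df = pd.DataFrame(data=[["abc", 123, True]])
stable = dtypes_stable_under_convert_dtypes(df)
print(stable)
```

With the behaviour reported above this check would return False; on a pandas version where the bug is fixed it is expected to return True.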

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Malaysia.1252

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None

Labels

Bug
NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
