Skip to content

Potential regression in str dtype handling in 0.23? #21270

Closed
@clembou

Description

@clembou

Code Sample, a copy-pastable example if possible

using pandas 0.22:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

using pandas 0.23:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: object_data = ['silly walks', np.nan, 'dead parrot']

In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0    silly walks
1            NaN
2    dead parrot
dtype: object

In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0    silly walks
1            nan
2    dead parrot
dtype: object

Problem description

as of v0.23, when provided with the str dtype, a np.nan in a list of values gets converter to the string nan.

the docs don't specify str as a valid dtype in the first place , so perhaps the problem here is that pandas should be stricter and check that the provided dtype is supported? Or the docs could mention that a custom conversion function can be passed? either way this caused test failures on clembou/behave-pandas#7, so I thought I'd raise an issue as I might not be the only one affected.

Feel free to close if you think this is simply an unsupported use of the dtype parameter.

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8

pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions