Description
Code Sample, a copy-pastable example if possible
using pandas 0.22:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: object_data = ['silly walks', np.nan, 'dead parrot']
In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0 silly walks
1 NaN
2 dead parrot
dtype: object
In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0 silly walks
1 NaN
2 dead parrot
dtype: object
using pandas 0.23:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: object_data = ['silly walks', np.nan, 'dead parrot']
In [4]: pd.Series(object_data, dtype=object)
Out[4]:
0 silly walks
1 NaN
2 dead parrot
dtype: object
In [5]: pd.Series(object_data, dtype=str)
Out[5]:
0 silly walks
1 nan
2 dead parrot
dtype: object
Problem description
as of v0.23, when provided with the str
dtype, a np.nan
in a list of values gets converter to the string nan
.
the docs don't specify str
as a valid dtype in the first place , so perhaps the problem here is that pandas should be stricter and check that the provided dtype is supported? Or the docs could mention that a custom conversion function can be passed? either way this caused test failures on clembou/behave-pandas#7, so I thought I'd raise an issue as I might not be the only one affected.
Feel free to close if you think this is simply an unsupported use of the dtype
parameter.
Output of pd.show_versions()
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.3
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None