
read_excel with dtype=str converts empty cells to the string 'nan' #20377

Closed
@arnau126

Description

Code Sample

In [10]: df = pd.DataFrame({'a': ['x', 'y', '', 'z']})

In [11]: df
Out[11]: 
   a
0  x
1  y
2   
3  z

In [12]: df.to_excel('temp.xlsx')

In [13]: df = pd.read_excel('temp.xlsx', dtype=str)

In [14]: df
Out[14]: 
     a
0    x
1    y
2  nan
3    z

In [15]: df.loc[2, 'a']
Out[15]: 'nan'

In [16]: type(df.loc[2, 'a'])
Out[16]: str

Problem description

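The empty string in the original DataFrame is written to the file as an empty cell. Reading the file back with dtype=str turns that empty cell into the literal string 'nan' instead of a missing value (numpy.nan).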
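One plausible reading of the cause, assuming read_excel in this version first parses the empty cell as a float NaN and only then applies dtype=str as a plain string cast: in Python, str(float('nan')) is 'nan'. The sketch below reproduces that mechanism in plain Python, without pandas or an Excel file; the cell list and the helper to_str_preserving_missing are hypothetical, for illustration only.

```python
import math

# Hypothetical stand-in for the values of column 'a' before dtype conversion:
# the empty cell has already been parsed as a float NaN.
cells = ["x", "y", float("nan"), "z"]

# Naive conversion, analogous to casting the whole column to str:
# NaN becomes its text representation, the string 'nan'.
naive = [str(v) for v in cells]


def to_str_preserving_missing(values):
    """Hypothetical helper: stringify values but keep NaN/None as missing."""
    out = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            out.append(float("nan"))  # leave the missing marker untouched
        else:
            out.append(str(v))
    return out


fixed = to_str_preserving_missing(cells)
print(naive)  # ['x', 'y', 'nan', 'z']
print(fixed)  # ['x', 'y', nan, 'z']
```

The second list matches the expected output in this report: real strings stay strings and the empty cell stays a float NaN.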

Expected Output

In [14]: df
Out[14]: 
     a
0    x
1    y
2  NaN
3    z

In [15]: df.loc[2, 'a']
Out[15]: nan

In [16]: type(df.loc[2, 'a'])
Out[16]: float
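Until dtype=str keeps missing values, one possible workaround is to read the sheet without dtype=str and stringify only the non-missing values. The sketch below assumes the frame looks like what pd.read_excel('temp.xlsx') returns without dtype (the empty cell already parsed as NaN). It builds that frame directly so it does not depend on an Excel engine, and it relies on Series.map with na_action='ignore', which skips NaN instead of passing it to str.

```python
import numpy as np
import pandas as pd

# Assumption: this mirrors pd.read_excel('temp.xlsx') *without* dtype=str,
# where the empty cell has already been read as NaN.
df = pd.DataFrame({"a": ["x", "y", np.nan, "z"]})

# Convert each column to str, letting map() skip missing values so NaN survives.
df_str = df.apply(lambda col: col.map(str, na_action="ignore"))

print(df_str)
print(type(df_str.loc[2, "a"]))
```

With a real file the same apply call would run on the frame returned by read_excel. One caveat: a numeric column that contains blank cells is typically read as float, so its values would be stringified as '1.0' rather than '1'.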

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: ca_ES.UTF-8
LOCALE: ca_ES.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.2
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels: IO Data, IO Excel, Missing-data