
str.split on np.nan gives np.nan in one column but None in another column #18450

Closed
@JeroenDelcour

Description

# Reproduced with pandas 0.21.0; the commented lines show the printed output.
import pandas as pd
import numpy as np

s = pd.Series(['19HT|C2', np.nan, '20ZT|C1'])
print(s)
# 0    19HT|C2
# 1        NaN
# 2    20ZT|C1
# dtype: object

s_split = s.str.split('|', expand=True)
print(s_split)
#       0     1
# 0  19HT    C2
# 1   NaN  None
# 2  20ZT    C1

print(s_split.dtypes)
# 0    object
# 1    object
# dtype: object

print(type(s_split.loc[1, 0]))
# <class 'float'>
print(type(s_split.loc[1, 1]))
# <class 'NoneType'>
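Both cells in row 1 are still recognized as missing by pandas' own NA check, so `isna` treats them alike; the mismatch only surfaces with type or identity checks such as the `type(...)` calls above. A minimal check, assuming the same Series as in the report (the identity check's result depends on the installed pandas version, so no output is claimed for it):

```python
import numpy as np
import pandas as pd

s = pd.Series(['19HT|C2', np.nan, '20ZT|C1'])
s_split = s.str.split('|', expand=True)

# pd.isna / DataFrame.isna recognize both np.nan and None as missing,
# so the NA mask for row 1 is True in every column either way.
mask = s_split.isna()
print(mask)

# An identity check, by contrast, can tell the two markers apart.
print([v is None for v in s_split.loc[1]])
```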

Problem description

When np.nan is split with expand=True, it becomes np.nan (type float) in the first column but None (type NoneType) in the second column. I consider this unexpected behavior: why does splitting a single value of one type produce two values of different types?
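One plausible mechanism, sketched below as a simplified pure-Python model rather than pandas' actual code: each non-missing string is split into a list, a missing entry is kept as a one-element row holding the original NaN, and every shorter row is then padded to the widest row with None. Under that assumption the first column inherits the original NaN while each padded column gets None, which matches the reported output. The names `is_missing`, `expand_split` and `propagate_missing` are hypothetical, and `float('nan')` stands in for np.nan (which is itself a Python float).

```python
import math


def is_missing(v):
    # Treat None and float NaN as missing (a subset of what pd.isna accepts).
    return v is None or (isinstance(v, float) and math.isnan(v))


def expand_split(values, sep, propagate_missing=False):
    """Hypothetical model of str.split(sep, expand=True) on str/NaN values.

    Missing inputs become one-element rows; each row is then padded to the
    widest row with None, unless propagate_missing repeats the original
    missing value across all columns.
    """
    rows = [[v] if is_missing(v) else v.split(sep) for v in values]
    width = max(len(r) for r in rows)
    result = []
    for original, row in zip(values, rows):
        if propagate_missing and is_missing(original):
            result.append([original] * width)  # same marker in every column
        else:
            result.append(row + [None] * (width - len(row)))  # pad with None
    return result


data = ['19HT|C2', float('nan'), '20ZT|C1']
reported = expand_split(data, '|')
expected = expand_split(data, '|', propagate_missing=True)
print([type(x).__name__ for x in reported[1]])  # ['float', 'NoneType']
print([type(x).__name__ for x in expected[1]])  # ['float', 'float']
```

With `propagate_missing=True` the model produces the output requested under Expected Output; without it, it reproduces the mix of NaN and None.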

Expected Output

      0     1
0  19HT    C2
1   NaN   NaN
2  20ZT    C1

Either np.nan or None in both columns, but not a mix of the two. I'd say np.nan makes the most sense, since it is the row's original value.
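Whatever the split itself returns, the missing markers can be normalized afterwards. A minimal workaround sketch, assuming only the public `DataFrame.notna` and `DataFrame.where` methods: every cell pandas considers missing (None or NaN) is replaced with np.nan, which gives the expected output regardless of whether the installed pandas version produces the mix.

```python
import numpy as np
import pandas as pd

s = pd.Series(['19HT|C2', np.nan, '20ZT|C1'])
s_split = s.str.split('|', expand=True)

# Keep non-missing cells as they are; replace every missing cell
# (None or NaN) with np.nan so all columns share one missing marker.
s_split = s_split.where(s_split.notna(), np.nan)
print(s_split)
print([type(v).__name__ for v in s_split.loc[1]])
```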

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
pyarrow: 0.7.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug, Missing-data, Strings