Description
Code Sample, a copy-pastable example if possible
import pandas
df = pandas.DataFrame({'test': ['']})
df.test.str.split(expand=True)
Problem description
Splitting a blank (empty or whitespace-only) string raises an IndexError when expand=True:
IndexError Traceback (most recent call last)
<ipython-input-22-a7876ad70d70> in <module>()
1 import pandas
2 df = pandas.DataFrame({'test': ['']})
----> 3 df.test.str.split(expand=True)
4
~/.pyenv/versions/3.6.3/envs/picking/lib/python3.6/site-packages/pandas/core/strings.py in split(self, pat, n, expand)
1479 def split(self, pat=None, n=-1, expand=False):
1480 result = str_split(self._data, pat, n=n)
-> 1481 return self._wrap_result(result, expand=expand)
1482
1483 @copy(str_rsplit)
~/.pyenv/versions/3.6.3/envs/picking/lib/python3.6/site-packages/pandas/core/strings.py in _wrap_result(self, result, use_codes, name, expand)
1427 # propogate nan values to match longest sequence (GH 18450)
1428 max_len = max(len(x) for x in result)
-> 1429 result = [x * max_len if x[0] is np.nan else x for x in result]
1430
1431 if not isinstance(expand, bool):
~/.pyenv/versions/3.6.3/envs/picking/lib/python3.6/site-packages/pandas/core/strings.py in <listcomp>(.0)
1427 # propogate nan values to match longest sequence (GH 18450)
1428 max_len = max(len(x) for x in result)
-> 1429 result = [x * max_len if x[0] is np.nan else x for x in result]
1430
1431 if not isinstance(expand, bool):
IndexError: list index out of range
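The failing step in `_wrap_result` can be reduced to plain Python. This is a minimal sketch that assumes each row reaches this step as a list. The `x[0] is np.nan` check suggests that a missing value arrives as a one-element `[nan]` list, and the traceback itself does not show how the rows are built. With no separator, `''.split()` and `'   '.split()` both return `[]`, so `x[0]` raises. `pad_rows_buggy` and `pad_rows_guarded` are hypothetical names. The guarded variant is an illustrative fix, not the actual pandas patch. `math.nan` stands in for `np.nan`.

```python
import math

# Stand-in for np.nan; the "is" identity check works because one object is reused.
NAN = math.nan


def pad_rows_buggy(result):
    # Mirrors lines 1428-1429 of pandas/core/strings.py from the traceback.
    max_len = max(len(x) for x in result)
    # x[0] raises IndexError as soon as x is an empty list.
    return [x * max_len if x[0] is NAN else x for x in result]


def pad_rows_guarded(result):
    # Hypothetical fix: test for an empty list before indexing into it.
    # An empty list stays empty, since [] * max_len == [].
    max_len = max(len(x) for x in result)
    return [x * max_len if not x or x[0] is NAN else x for x in result]


# Rows as str.split() would produce them; the last row models a missing value.
rows = ["".split(), "   ".split(), "a b".split(), [NAN]]
print(rows)  # [[], [], ['a', 'b'], [nan]]

try:
    pad_rows_buggy(rows)
except IndexError as exc:
    print("buggy:", exc)  # buggy: list index out of range

print(pad_rows_guarded(rows))  # [[], [], ['a', 'b'], [nan, nan]]
```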
Expected Output
Either NaNs or empty strings as placeholder values in the output DataFrame, e.g.:
Out[24]:
0
0 ''
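One possible workaround on affected versions is to split without `expand` and build the DataFrame from the resulting lists. The DataFrame constructor pads short rows with missing values, so a blank string becomes an all-missing row. This matches the "NaNs" variant of the expected output. This is a sketch, not an official recipe. It assumes the column holds only strings: a NaN entry would come back from `str.split()` as a float rather than a list, and the constructor call would then fail.

```python
import pandas as pd

df = pd.DataFrame({"test": ["", "a b", "c"]})

# Split without expand=True, so the padding step from the traceback is not reached.
parts = df.test.str.split()

# Build the frame from the ragged lists; shorter rows are padded with missing values.
# Assumes every entry of `parts` is a list (i.e. no NaN in the original column).
out = pd.DataFrame(parts.tolist(), index=df.index)
print(out)
```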
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.0
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None