Hi all,
I've updated from version 0.14.x to 0.15.1 (Python 3.4), and the following code fails in the latest version if the blank line in data (the third line) contains a tab or other whitespace. I think such a line should be treated like any other blank line when skiprows is used:
import io
import pandas as pd

# The third line of data is the "blank" line; here it holds a single tab (\t).
data = """Lat=0 Lon=0
Whatever - Not very important
\t
YYYYMMDD HHMM M(m/s) D(deg) T(C) De(k/m3) PRE(hPa) RiNumber RH(%)
19840101 0000 21.7 237 9.1 1.23 996.3 0.09 87.4
19840101 0100 22.4 239 9.4 1.23 995.7 0.10 87.2
19840101 0200 22.5 240 9.5 1.23 995.2 0.11 87.5"""

pd.read_csv(io.StringIO(data), delim_whitespace=True, skiprows=4)
With a space or tab on the third line (the blank line in the header block), the output is:
| | YYYYMMDD | HHMM | M(m/s) | D(deg) | T(C) | De(k/m3) | PRE(hPa) | RiNumber | RH(%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19840101 | 0 | 21.7 | 237 | 9.1 | 1.23 | 996.3 | 0.09 | 87.4 |
| 1 | 19840101 | 100 | 22.4 | 239 | 9.4 | 1.23 | 995.7 | 0.10 | 87.2 |
| 2 | 19840101 | 200 | 22.5 | 240 | 9.5 | 1.23 | 995.2 | 0.11 | 87.5 |
Without the space or tab, the output is the expected one:
| | 19840101 | 0000 | 21.7 | 237 | 9.1 | 1.23 | 996.3 | 0.09 | 87.4 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19840101 | 100 | 22.4 | 239 | 9.4 | 1.23 | 995.7 | 0.10 | 87.2 |
| 1 | 19840101 | 200 | 22.5 | 240 | 9.5 | 1.23 | 995.2 | 0.11 | 87.5 |
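A possible workaround, sketched here under assumptions, is to drop the leading lines before pandas sees them, so the result no longer depends on whether the blank line contains whitespace. The helper name `skip_physical_lines` is hypothetical and not part of pandas, and the sample data is shortened to three columns for brevity.

```python
import io


def skip_physical_lines(text, n):
    """Return a file-like object over `text` without its first n lines.

    Hypothetical helper (not a pandas API). Every physical line counts,
    including empty and whitespace-only ones.
    """
    source = io.StringIO(text)
    for _ in range(n):
        source.readline()  # discard one physical line, whatever it holds
    return io.StringIO(source.read())


# Same layout as the report, shortened; the third line holds only a tab.
data = (
    "Lat=0 Lon=0\n"
    "Whatever - Not very important\n"
    "\t\n"
    "YYYYMMDD HHMM M(m/s)\n"
    "19840101 0000 21.7\n"
    "19840101 0100 22.4\n"
)

buf = skip_physical_lines(data, 4)
first_line = buf.readline()
buf.seek(0)  # rewind so a parser would see the whole remainder
```

The returned buffer could then be passed to pd.read_csv in place of io.StringIO(data), with the skiprows argument left out.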
pd.show_versions() provides the following output:
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 37 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.15.1
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 3.0.0-dev
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.1
pytz: 2014.9
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.4.4
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None