
bug? with delim_whitespace and 'skiprows' in read_csv #8960

Closed
@kikocorreoso

Hi all,

I've updated pandas from 0.14.x to 0.15.1 (Python 3.4), and the following code no longer gives the expected result when the blank line (the third line) in data contains a tab or a space. I think that line should be counted and skipped like any other line when skiprows is used:

import io

import pandas as pd

data = """Lat=0  Lon=0
Whatever - Not very important

YYYYMMDD HHMM  M(m/s) D(deg)  T(C)  De(k/m3) PRE(hPa)      RiNumber  RH(%)
19840101 0000   21.7    237    9.1    1.23     996.3           0.09   87.4
19840101 0100   22.4    239    9.4    1.23     995.7           0.10   87.2
19840101 0200   22.5    240    9.5    1.23     995.2           0.11   87.5"""

pd.read_csv(io.StringIO(data), delim_whitespace=True, skiprows=4)

When the third line (the blank line that is part of the header block) contains a space or a tab, the output is:

   YYYYMMDD  HHMM  M(m/s)  D(deg)  T(C)  De(k/m3)  PRE(hPa)  RiNumber  RH(%)
0  19840101     0    21.7     237   9.1      1.23     996.3      0.09   87.4
1  19840101   100    22.4     239   9.4      1.23     995.7      0.10   87.2
2  19840101   200    22.5     240   9.5      1.23     995.2      0.11   87.5

When the third line is completely empty (no space or tab), the output is the expected one:

   19840101  0000  21.7  237  9.1  1.23  996.3  0.09  87.4
0  19840101   100  22.4  239  9.4  1.23  995.7  0.10  87.2
1  19840101   200  22.5  240  9.5  1.23  995.2  0.11  87.5
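
One possible workaround is to drop the leading lines in plain Python before pandas parses the text, so that a whitespace-only line counts as a line just like an empty one. The sketch below is an illustration and not part of the original report. The helper names `build_text` and `read_skipping_physical_lines` are hypothetical, and `sep=r"\s+"` stands in for `delim_whitespace=True`, on the assumption that the two are equivalent for this input.

```python
import io

import pandas as pd

# The two free-text lines that precede the (possibly whitespace-only) third line.
PREAMBLE = ["Lat=0  Lon=0", "Whatever - Not very important"]

# Column header followed by the three data rows from the report.
TABLE = [
    "YYYYMMDD HHMM  M(m/s) D(deg)  T(C)  De(k/m3) PRE(hPa)      RiNumber  RH(%)",
    "19840101 0000   21.7    237    9.1    1.23     996.3           0.09   87.4",
    "19840101 0100   22.4    239    9.4    1.23     995.7           0.10   87.2",
    "19840101 0200   22.5    240    9.5    1.23     995.2           0.11   87.5",
]


def build_text(third_line):
    """Hypothetical helper: rebuild the input with an explicit third line."""
    return "\n".join(PREAMBLE + [third_line] + TABLE)


def read_skipping_physical_lines(text, n_skip):
    """Hypothetical helper: drop the first n_skip physical lines, then parse.

    The slicing happens before pandas sees the text, so the result does not
    depend on how the parser treats whitespace-only lines.
    """
    remaining = text.splitlines()[n_skip:]
    return pd.read_csv(io.StringIO("\n".join(remaining)), sep=r"\s+")


# Same call as in the report (4 lines skipped), once with a truly empty
# third line and once with a third line holding a space and a tab.
empty_third = read_skipping_physical_lines(build_text(""), 4)
blank_third = read_skipping_physical_lines(build_text(" \t"), 4)

print(empty_third.equals(blank_third))
```

Both variants produce the same frame here, because the whitespace-only line is removed before parsing. Passing `n_skip=3` instead would keep the `YYYYMMDD ...` row as the column header.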

pd.show_versions() provides the following output:

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 37 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.1
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 3.0.0-dev
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.1
pytz: 2014.9
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: 2.4.4
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None

Labels: IO CSV (read_csv, to_csv)