Description
I'm importing a fixed-width file that contains 2 types of records, each with its own field definitions.
>>> import pandas
>>> from StringIO import StringIO  # Python 2.7, as listed in the versions below
>>> print "good:\n", pandas.read_fwf(StringIO('T1001\nT1020'),
widths=[2,1,1,1], names=['TYPE', 'A', 'B', 'C'])
good:
  TYPE  A  B  C
0   T1  0  0  1
1   T1  0  2  0
>>> print "good:\n", pandas.read_fwf(StringIO('T2XY\nT2XZ'),
widths=[2,1,1], names=['TYPE', 'D', 'E'])
good:
  TYPE  D  E
0   T2  X  Y
1   T2  X  Z
>>> print "silently dropped data from first 2 rows:\n", pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1], names=['TYPE', 'D', 'E'])
silently dropped data from first 2 rows:
  TYPE  D  E
0   T1  0  0
1   T1  0  2
2   T2  X  Y
3   T2  X  Z
>>> print "unexpected NaN fields:\n", pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1,1], names=['TYPE', 'A', 'B', 'C'])
unexpected NaN fields:
  TYPE  A  B    C
0   T1  0  0  1.0
1   T1  0  2  0.0
2   T2  X  Y  NaN
3   T2  X  Z  NaN
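A possible workaround for the mixed file, sketched under the assumption that the first two characters of every line identify its record type (as in the samples above): group the lines by that prefix and slice each group with its own widths. The names `SPECS`, `slice_fixed_width` and `split_by_record_type` are hypothetical and not part of pandas. The sketch uses only the standard library, so every field stays a string.

```python
# Hypothetical per-type layouts, copied from the examples above:
# record type -> (field widths, field names).
SPECS = {
    'T1': ([2, 1, 1, 1], ['TYPE', 'A', 'B', 'C']),
    'T2': ([2, 1, 1], ['TYPE', 'D', 'E']),
}


def slice_fixed_width(line, widths):
    """Cut one line into consecutive fields of the given widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


def split_by_record_type(text, specs=SPECS, prefix_len=2):
    """Return {record type: [row dict, ...]} for a mixed fixed-width text.

    Assumes the first `prefix_len` characters of each line name its type.
    A line with an unknown type raises KeyError instead of being dropped.
    """
    records = {}
    for line in text.splitlines():
        if not line:
            continue  # ignore blank lines
        record_type = line[:prefix_len]
        widths, names = specs[record_type]
        row = dict(zip(names, slice_fixed_width(line, widths)))
        records.setdefault(record_type, []).append(row)
    return records


records = split_by_record_type('T1001\nT1020\nT2XY\nT2XZ')
```

Each list of row dicts can then be turned into its own frame, for example with `pandas.DataFrame(records['T1'])`. This avoids the problem rather than solving it: lines are routed by type before any parsing, so no line is ever read with the wrong spec.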
Problem description
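I expected a line whose length does not match the passed-in spec to raise a 'bad line' error for that line, with an option to ignore such lines. Instead, read_fwf silently truncates lines that are too long and fills lines that are too short with NaN.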
Expected Output
I expected these lines to raise an error, with the error_bad_lines option available to ignore the lines and show warnings instead (which could be turned off with warn_bad_lines).
>>> pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1], names=['TYPE', 'D', 'E'])
ParserError: Error tokenizing data. Expected 4 characters in line 1, saw 5
>>> pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1,1], names=['TYPE', 'A', 'B', 'C'])
ParserError: Error tokenizing data. Expected 5 characters in line 3, saw 4
>>> pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1], names=['TYPE', 'D', 'E'], error_bad_lines=False)
Skipping Line 1: expected 4 characters, saw 5
Skipping Line 2: expected 4 characters, saw 5
  TYPE  D  E
0   T2  X  Y
1   T2  X  Z
>>> pandas.read_fwf(StringIO('T1001\nT1020\nT2XY\nT2XZ'),
widths=[2,1,1,1], names=['TYPE', 'A', 'B', 'C'], error_bad_lines=False)
Skipping Line 3: expected 5 characters, saw 4
Skipping Line 4: expected 5 characters, saw 4
  TYPE  A  B  C
0   T1  0  0  1
1   T1  0  2  0
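In the meantime, the behaviour requested above can be approximated by checking line lengths before calling read_fwf. The following is only a sketch of how such a check might work, not pandas code: `filter_fixed_width_lines` and its `on_bad` argument are hypothetical names, and a line is treated as bad when its length differs from `sum(widths)`.

```python
import warnings


def filter_fixed_width_lines(text, widths, on_bad='error'):
    """Keep only the lines whose length equals sum(widths).

    on_bad='error' raises ValueError at the first bad line,
    on_bad='warn'  skips bad lines and emits a warning for each,
    on_bad='skip'  skips bad lines silently.
    Line numbers in messages are 1-based, as in the expected output above.
    """
    if on_bad not in ('error', 'warn', 'skip'):
        raise ValueError("on_bad must be 'error', 'warn' or 'skip'")
    expected = sum(widths)
    kept = []
    for number, line in enumerate(text.splitlines(), 1):
        if len(line) == expected:
            kept.append(line)
            continue
        message = 'line %d: expected %d characters, saw %d' % (
            number, expected, len(line))
        if on_bad == 'error':
            raise ValueError(message)
        if on_bad == 'warn':
            warnings.warn('Skipping ' + message)
    return '\n'.join(kept)


data = 'T1001\nT1020\nT2XY\nT2XZ'
t2_only = filter_fixed_width_lines(data, [2, 1, 1], on_bad='skip')
t1_only = filter_fixed_width_lines(data, [2, 1, 1, 1], on_bad='skip')
```

The filtered text can then be parsed as in the first examples, e.g. `pandas.read_fwf(StringIO(t2_only), widths=[2,1,1], names=['TYPE', 'D', 'E'])`. The length check is strict, so a line with trailing spaces counts as bad; strip lines first if that is not wanted.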
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-87-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8
LOCALE: None.None
pandas: 0.20.3
pytest: 2.9.1
pip: 9.0.1
setuptools: 27.3.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 1.5
pytz: 2014.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.1.0b1
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.7.2
s3fs: None
pandas_gbq: None
pandas_datareader: None