BUG: read_csv() crashes with engine='c' #14125

Closed
@jzwinck

Description

Here's the code (input data is at the end of this message):

import pandas as pd

pd.read_csv('foo.csv', header=None, usecols=[0])

It fails with:

File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
File "pandas/parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)
File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

Small perturbations of the input file (adding or removing characters) make it work, as does engine='python'. Note that although at least one row of the file contains "extra" columns, I have asked pandas to read only column 0, which it should be able to do since that column has a consistent, short length.
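Besides passing engine='python' to read_csv, column 0 can be extracted without the C tokenizer at all. Below is a minimal sketch using the standard-library csv module, assuming the file is plain comma-delimited text as in the data below. The helper name `first_column` is hypothetical and not part of pandas.

```python
import csv


def first_column(path):
    """Return the first field of every non-empty row of a comma-delimited file.

    Hypothetical helper for illustration: rows may have any number of
    trailing fields, since only index 0 is ever read.
    """
    with open(path, newline="") as handle:
        # csv.reader yields one list of fields per row; blank lines yield [].
        return [row[0] for row in csv.reader(handle) if row]
```

Calling `first_column('foo.csv')` returns a plain list of strings, which can be passed to `pd.Series` if a pandas object is needed. This sidesteps the crash rather than fixing it, so it is only a stopgap.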

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-85-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1

DATA

1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,1111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1,1,111111111,XXXXX,X,XXXXX XXX,X.X.,X.X.,X.X., , ,11111111,11111111,11111111,X.X.,XXX111XXXXX1,X.X.,X.X.,11,XX_XXXX,1,XX_XXX,1111 XX,XX_XXXX,XXXXXXX XXXX,XX_X1_XX_XXXX,111111,XX_XXXX,XXXXXXXXXXXX XXXXXX XXXXXXXX,XX_XXX_XXX,11111.111111,XX_XXXXXX_XXX,1,XX_XXXX,11111.111111,XX_XXXX_XXXX,1,XX_XX,1.111111,XX_XX_XXXX, ,XX_1XXX,1.111111,XX_1XXX_XXXX, ,XX_XXXX,1,XX_1X_XXXX,X.X.,XX_XXXX_XXXXX_XXXXXXX,X.X.,XX_XXX_XXXXXXX,X.X.,XX_XXX_XXXX1,1.111111,XX_XXX_XXXXXX,111.111,XX_XXXXXXXXX1,X.X.,XX_XXXX,1,XX_XXXXX,XXX,XX_XXXX_XXX,X.X.,XX_XXXXXXXXX_XXXX,X.X.,XX_XXX_XXX_XXX,X.X.,XX_X1XXXXXX_XXX,X.X.,XX_XX_XXXXXXXXXX,X.X.,XX_X1XXXXXX,X.X.,XXX,1111 XX,XX_XXX_X1_X_XXXXXX,XXX,XX_XXX_X1_XX_XXXXXX_XXXXXXX,XXX111XXXXX1,XX_XXX_X1_XX_XXX_XXX_1XX,1111,XX_XXX_XXX1_XXXXXX,XX,XX_XXXXXX_XXX,1111 XX,XX_XXXXXX_X1_XX_XXXXXX,1X11,XX_XXXXXX_X1_XX_XXXXXX_XXXXXXX,XXX111XXXX11,XX_XXXXXX_X1_XX_XXX_XXX_1XX,1111,XX_XXXXXX_XXX1_XXXXXX,XX,XX_XXXXX,XXXX XXX'X: XXXXXXX XX1XXXX XXXXXXX XXXX.,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,1111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
1111    XX XXXXXX,111111,1111,111,
