
Unexpected segmentation fault in pd.read_csv C-engine #13703

Closed
@ivannz

Description


Dear developers,

I am using pandas in an application that needs to process large CSV files (around 1 GB each), each with approximately 800k records and 400+ columns of mixed type. For that reason I decided to use the iterator functionality of pd.read_csv() via the chunksize argument. While experimenting with chunksize, my application crashed, apparently somewhere inside the TextReader__string_convert call.

Attached is an archive with a sample CSV data file that seems to cause the crash. It also includes crash dump reports, a copy of the example, and a list of the installed Python package versions.
read_csv_crash.tar.gz

Code Sample

To run this example, first extract dataset.csv from the attached archive.

import pandas as pd
for n_lines in range(82, 87):
    filelike = open("dataset.csv", "r")
    iterator_ = pd.read_csv(filelike, header=None, engine="c",
                            dtype=object, chunksize=n_lines)
    for chunk_ in iterator_:
        print n_lines, chunk_.iloc[0, 0], chunk_.iloc[-1, 0]
    filelike.close()

Please note that the crash does not seem to occur when the file is smaller than 260 KiB. Changing the low_memory setting did not alleviate the problem either.
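To probe the size threshold mentioned above, one option is to cut dataset.csv down to several sizes and rerun the example on each copy. The following is only a sketch: the helper name write_truncated_copy, the output file names, and the probe sizes are hypothetical, and it assumes no quoted field in the file contains an embedded newline, so that cutting at line boundaries keeps every record intact.

```python
import os


def write_truncated_copy(src_path, dst_path, max_bytes):
    """Copy whole lines from src_path to dst_path without exceeding max_bytes.

    Returns the number of bytes written. Cutting only at line boundaries
    keeps every record in the copy complete.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if written + len(line) > max_bytes:
                break
            dst.write(line)
            written += len(line)
    return written


# Probe sizes around the reported 260 KiB threshold (arbitrary choices).
if os.path.exists("dataset.csv"):
    for kib in (128, 256, 260, 264, 512):
        out_name = "dataset_%dk.csv" % kib  # hypothetical output name
        size = write_truncated_copy("dataset.csv", out_name, kib * 1024)
        print("%s: %d bytes" % (out_name, size))
```

Each truncated copy can then be substituted for dataset.csv in the code sample to see at which size the crash first appears.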

Expected Output

Instead of completing, the code sample prints the following and then crashes:

82 9999-9 9999-9
82 9999-9 9999-9
82 9999-9 9999-9
83 9999-9 9999-9
83 9999-9 9999-9
83 9999-9 9999-9
84 9999-9 9999-9
84 9999-9 9999-9
Segmentation fault: 11
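Because the segmentation fault kills the interpreter, the loop never reaches the remaining chunk sizes. A possible way to survey all of them is to run each attempt in a child process and inspect its exit status. This is a minimal sketch, not part of the original report: run_isolated and SNIPPET are hypothetical names, and it relies on the POSIX behaviour that subprocess reports a child killed by signal N as exit status -N.

```python
import os
import subprocess
import sys


def run_isolated(code, args=()):
    """Run a Python snippet in a child interpreter and return its exit status.

    On POSIX, a negative status -N means the child was killed by signal N
    (for example, -11 for SIGSEGV).
    """
    cmd = [sys.executable, "-c", code] + [str(a) for a in args]
    return subprocess.call(cmd)


# Same read_csv call as in the code sample; the chunk size comes from argv.
SNIPPET = """
import sys
import pandas as pd
n_lines = int(sys.argv[1])
with open("dataset.csv", "r") as filelike:
    for chunk_ in pd.read_csv(filelike, header=None, engine="c",
                              dtype=object, chunksize=n_lines):
        pass
"""

if os.path.exists("dataset.csv"):
    for n_lines in range(82, 87):
        status = run_isolated(SNIPPET, [n_lines])
        print("chunksize=%d exit status=%d" % (n_lines, status))
```

A status of 0 means the file was read completely with that chunk size; any other value marks a chunk size worth attaching to the report.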

output of pd.show_versions()

The output of this call is attached to this issue.
pd_show_versions.txt

Python greetings string

Python 2.7.10 (v2.7.10:15c95b7d81dc, May 23 2015, 09:33:12) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

OSX version

OS Version:            Mac OS X 10.11.5 (15F34)
Model: Macmini6,2, BootROM MM61.0106.B0A, 4 processors, Intel Core i7, 2,6 GHz, 16 GB, SMC 2.8f
