
Segmentation fault or UnicodeDecodeError when reading csv-file depending on chunksize. #5291


Description

@Hedendahl

I have encountered an issue with the csv parser, pandas.io.parsers.read_csv. I get a segmentation fault or a UnicodeDecodeError when reading a csv-file in chunks, and the problem seems to depend on the size of the chunks.
Consider the following code:

# Python 2 reproduction script (uses xrange and the print statement).
import codecs
import csv
import pandas as pd


def create_csv_file(columns, rows):
    # Write `rows` lines, each containing `columns` identical float values.
    csv_file_name = 'csv_test_file.csv'
    with codecs.open(csv_file_name, mode='w', encoding='utf_8') as csv_file:
        csv_writer = csv.writer(csv_file, delimiter=',')

        for row in xrange(rows):
            csv_writer.writerow(
                [float(row)] * columns)

    return csv_file_name


def main():
    """
    """
    columns = 20
    rows = 10000
    chunksize = 999
    csv_file_name = create_csv_file(columns, rows)
    reader = pd.io.parsers.read_csv(csv_file_name,
                                    header=None,
                                    chunksize=chunksize,
                                    encoding='utf_8')

    for x, dataframe in enumerate(reader, 1):
        # Report progress: rows requested so far (the last chunk may be shorter).
        print x * chunksize


if __name__ == "__main__":
    main()

The attached code produces a segmentation fault when the chunksize is 999 rows. If the chunksize is decreased to 998 rows, I instead get a UnicodeDecodeError. If the chunksize is increased to 1000 rows, the file is read without any problems. My first guess was that the problem appears when the last chunk contains too few rows, but I was surprised to find that reading the csv-file with the following settings,

    columns = 20
    rows = 1000
    chunksize = 99

worked properly, even though the last chunk is equally short in that case (1000 = 10 * 99 + 10 rows, just as 10000 = 10 * 999 + 10 rows).
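
Since a segmentation fault kills the interpreter outright, it cannot be caught with try/except. As a rough sketch (assuming the script above is saved as repro.py and modified to take the chunksize from sys.argv[1]; both the file name and the argument are my additions, not part of the original script), one can sweep chunk sizes in separate child processes so that a crash in one run does not abort the sweep:

import subprocess
import sys

# Sketch only: repro.py and its sys.argv[1] parameter are assumed here.
# Each chunksize runs in its own child process, so a segfault only kills
# that child, and the sweep continues with the next chunksize.
for chunksize in (997, 998, 999, 1000, 1001):
    status = subprocess.call([sys.executable, 'repro.py', str(chunksize)])
    # A negative status means the child died from a signal (e.g. -11 is
    # SIGSEGV on Linux); any other non-zero status means a Python exception.
    if status == 0:
        print chunksize, 'ok'
    else:
        print chunksize, 'failed with status', status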
