Description
Here is my repro script:
import pandas as pd
import sys
for df in pd.read_csv(sys.argv[1], chunksize=1000):
    print(df[['sum']].sum())
I am attaching small.csv.gz, the smallest data set I know of that reproduces this segfault. Running python repro.py small.csv.gz reproduces the segfault in 0.17.1 on OSX Yosemite. I can't reproduce it with 0.13.1 or 0.17.1 on Ubuntu 14.04. Without chunksize, the same file reads normally.
I tried my best to narrow it down. If you trim the file to under 2000 lines, the segfault does not occur; once it goes over 2000 lines, I start to see it. Adding lines 1000 at a time, the segfault is intermittent (I see it again at 6002 lines). It seems that when the number of items in the file is a multiple of chunksize, the segfault does not occur.
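For anyone who wants to repeat the narrowing-down, here is a rough sketch of a harness that trims the attached file to a given line count, runs the repro script in a child process, and records whether the child died with SIGSEGV. The helper names (write_truncated, trailing_rows, segfaults, probe) and the trimmed_N.csv.gz file names are made up for illustration. It also assumes the file has a single header line and a POSIX system, where a child killed by a signal reports a negative return code.

```python
import gzip
import signal
import subprocess
import sys


def write_truncated(src_path, dst_path, n_lines):
    """Copy the first n_lines lines (header included) of a gzipped file."""
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        for i, line in enumerate(src):
            if i >= n_lines:
                break
            dst.write(line)


def trailing_rows(n_lines, chunksize=1000, header_lines=1):
    """Rows left over for a final, partial chunk (0 means an exact multiple).

    Assumes header_lines header lines; adjust if the file has none.
    """
    return max(n_lines - header_lines, 0) % chunksize


def segfaults(cmd):
    """Run cmd in a child process; True if SIGSEGV killed it (POSIX only)."""
    return subprocess.call(cmd) == -signal.SIGSEGV


def probe(src_path, script, line_counts, chunksize=1000):
    """Return a list of (n_lines, trailing_rows, crashed) tuples."""
    results = []
    for n in line_counts:
        trimmed = "trimmed_%d.csv.gz" % n  # hypothetical scratch file name
        write_truncated(src_path, trimmed, n)
        crashed = segfaults([sys.executable, script, trimmed])
        results.append((n, trailing_rows(n, chunksize), crashed))
    return results
```

Something like probe("small.csv.gz", "repro.py", [1999, 2001, 3001, 6002]) would then show, for each line count, how many rows fall into the last partial chunk next to whether the run crashed. Since the segfault is intermittent, a single clean run at a given size does not prove that size is safe.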
I installed via pip install pandas. I also repro'd this on latest master (43edd83) on OSX Yosemite.
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.4
pip: 1.5.6
setuptools: 8.2.1
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.15.1
statsmodels: None
IPython: 2.3.1
sphinx: 1.1.2
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000110bbf0bf
VM Regions Near 0x110bbf0bf:
MALLOC_LARGE 0000000110b3f000-0000000110bbf000 [ 512K] rw-/rwx SM=PRV
-->
MALLOC_LARGE 0000000110cae000-0000000110e2e000 [ 1536K] rw-/rwx SM=PRV
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 parser.so 0x000000011069af3f __pyx_f_6pandas_6parser_10TextReader__convert_with_dtype + 2191
1 parser.so 0x00000001106977ed __pyx_f_6pandas_6parser_10TextReader__convert_tokens + 3293
2 parser.so 0x00000001106c47fe __pyx_pf_6pandas_6parser_10TextReader_16_convert_column_data + 3006
3 parser.so 0x000000011069572b __pyx_f_6pandas_6parser_10TextReader__read_rows + 1371
4 parser.so 0x0000000110693f65 __pyx_f_6pandas_6parser_10TextReader__read_low_memory + 869
5 parser.so 0x00000001106c2b9e __pyx_pw_6pandas_6parser_10TextReader_9read + 174
6 org.python.python 0x000000010e1f77e6 PyEval_EvalFrameEx + 14392
7 org.python.python 0x000000010e1f3d7a PyEval_EvalCodeEx + 1409
8 org.python.python 0x000000010e1fa59d fast_function + 117
9 org.python.python 0x000000010e1f7400 PyEval_EvalFrameEx + 13394
10 org.python.python 0x000000010e1f3d7a PyEval_EvalCodeEx + 1409
11 org.python.python 0x000000010e1fa59d fast_function + 117
12 org.python.python 0x000000010e1f7400 PyEval_EvalFrameEx + 13394
13 org.python.python 0x000000010e18f67a gen_send_ex + 193
14 org.python.python 0x000000010e1f4525 PyEval_EvalFrameEx + 1399
15 org.python.python 0x000000010e1f3d7a PyEval_EvalCodeEx + 1409
16 org.python.python 0x000000010e1f37f3 PyEval_EvalCode + 54
17 org.python.python 0x000000010e2138a2 run_mod + 53
18 org.python.python 0x000000010e213945 PyRun_FileExFlags + 133
19 org.python.python 0x000000010e2134e2 PyRun_SimpleFileExFlags + 769
20 org.python.python 0x000000010e224c5b Py_Main + 3051
21 libdyld.dylib 0x00007fff8c26c5c9 start + 1