Segmentation fault when constructing DataFrame with specified datetime dtype of one column #5191

Closed
@agravier

Description

When building a DataFrame with specified column names and dtypes, one might expect one of two possible behaviours:

  • The column names and dtypes specs are perfectly cromulent, and Pandas goes on to build the object.
  • The column names or dtypes don't match the data shape, or the dtypes are badly specified, and Pandas gives an error message.
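As an illustration of the second behaviour, here is a minimal sketch of the kind of up-front check I would expect. `check_column_spec` is a hypothetical helper written for this report, not a pandas function, and the error wording is made up:

```python
def check_column_spec(rows, columns, dtype_spec):
    """Raise ValueError if the spec cannot describe the rows (hypothetical helper)."""
    # The names in the dtype spec should line up with the column names.
    spec_names = [name for name, _ in dtype_spec]
    if spec_names != list(columns):
        raise ValueError(
            "dtype names %r do not match columns %r" % (spec_names, list(columns))
        )
    # Every row should have exactly one field per column.
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                "row %d has %d fields, expected %d" % (i, len(row), len(columns))
            )
```

A mismatch then surfaces as an ordinary Python exception instead of a crash inside native code.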

Instead, I have encountered a segmentation fault.

I am not sure whether my column names, my dtype spec and my data are all correctly written (see the example below). In any case, Pandas should not crash.

Reproducing

To reproduce, please run:

import pandas as pd
import datetime as dt
import itertools as it

df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
                       columns=["A", "B", "C"],
                       dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])

Modes of failure

The above script always crashes on my machine (see the Configuration information section below for details). It fails in one of two ways:

First mode of failure: hanging

Python 2.7.5 (default, Sep  6 2013, 09:55:21) 
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>> 
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python': corrupted double-linked list: 0x0000000001bfd8e0 ***

After that line is printed, the terminal stops responding.
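Because the failure can take the interactive session down with it, one way to probe it safely is to run the snippet in a child interpreter and look at its exit status. This is only a sketch, assuming Python 3.5 or later (for `subprocess.run`); `run_isolated` is a name made up for this example:

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter and return its exit status.

    Returns None if the child did not finish within `timeout` seconds.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a hang ends up here.
        return None
    return proc.returncode
```

Passing the reproducing script as a string keeps the parent session alive whatever happens in the child. On POSIX, a negative return code -N means the child was terminated by signal N, so an abort or a segfault shows up as a negative number, and a hang shows up as None.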

Second mode of failure: segfault

Python 2.7.5 (default, Sep  6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python2': double free or corruption (!prev): 0x00000000027161d0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72ecf)[0x7f2bd7ab9ecf]
/usr/lib/libc.so.6(+0x7869e)[0x7f2bd7abf69e]
/usr/lib/libc.so.6(+0x79377)[0x7f2bd7ac0377]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(_field_transfer_data_free+0x2e)[0x7f2bd634d47e]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0x9a1c9)[0x7f2bd63a61c9]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xa4a3a)[0x7f2bd63b0a3a]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xab0a1)[0x7f2bd63b70a1]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb838b)[0x7f2bd63c438b]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb8643)[0x7f2bd63c4643]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4c2f)[0x7f2bd80ec2ef]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(+0x6dbdd)[0x7f2bd807cbdd]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x5841d)[0x7f2bd806741d]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x9de57)[0x7f2bd80ace57]
/usr/lib/libpython2.7.so.1.0(+0x9cbcf)[0x7f2bd80abbcf]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1321)[0x7f2bd80e89e1]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f2bd80ed392]
/usr/lib/libpython2.7.so.1.0(+0xf708f)[0x7f2bd810608f]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveOneFlags+0x140)[0x7f2bd8107fb0]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveLoopFlags+0x4e)[0x7f2bd810819e]
/usr/lib/libpython2.7.so.1.0(PyRun_AnyFileExFlags+0x3e)[0x7f2bd81087fe]
/usr/lib/libpython2.7.so.1.0(Py_Main+0xc7f)[0x7f2bd8118c2f]
/usr/lib/libc.so.6(__libc_start_main+0xf5)[0x7f2bd7a68bc5]
python2[0x400741]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00600000-00601000 r--p 00000000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00601000-00602000 rw-p 00001000 08:11 1886483                            /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
012d1000-029b7000 rw-p 00000000 00:00 0                                  [heap]
7f2bced0d000-7f2bced11000 r-xp 00000000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bced11000-7f2bcef10000 ---p 00004000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef10000-7f2bcef11000 r--p 00003000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef11000-7f2bcef13000 rw-p 00004000 08:01 923895                     /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef13000-7f2bcef26000 r-xp 00000000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcef26000-7f2bcf125000 ---p 00013000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf125000-7f2bcf126000 r--p 00012000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf126000-7f2bcf127000 rw-p 00013000 08:11 57747                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf127000-7f2bcf171000 r-xp 00000000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf171000-7f2bcf370000 ---p 0004a000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf370000-7f2bcf371000 r--p 00049000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf371000-7f2bcf376000 rw-p 0004a000 08:11 57858                      /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf376000-7f2bcf377000 rw-p 00000000 00:00 0
7f2bcf377000-7f2bcf3d9000 r-xp 00000000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf3d9000-7f2bcf5d8000 ---p 00062000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5d8000-7f2bcf5dc000 r--p 00061000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5dc000-7f2bcf5e3000 rw-p 00065000 08:01 798526                     /usr/lib/libssl.so.1.0.0
7f2bcf5e3000-7f2bcf5eb000 r-xp 00000000 08:01 923889                     /usr/lib/python2.7/lib-dynload/_ssl.so
Aborted (core dumped)

Configuration information

Python:

Python 2.7.5

uname -a:

Linux agravier-archvm 3.10.10-1-ARCH #1 SMP PREEMPT Fri Aug 30 11:30:06 CEST 2013 x86_64 GNU/Linux

pip freeze --local:

QSTK==0.2.6
matplotlib==1.3.0
nose==1.3.0
numpy==1.7.1
pandas==0.12.0
pyparsing==2.0.1
python-dateutil==2.1
pytz==2013.7
scikit-learn==0.14.1
scipy==0.12.1
six==1.4.1
yolk==0.4.3

Concluding remarks

Note that the number of rows in the line that creates the data, list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)), influences whether Python crashes. With fewer than 9 rows there is no crash, and the output is:

Python 2.7.5 (default, Sep  6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 8)),
...                        columns=["A", "B", "C"],
...                        dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
>>> df_test
                     A                           B                            C
0  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
1  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
2  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
3  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
4  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
5  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
6  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)
7  2001-01-01 00:00:00  (1972-11-04 17:00:00, , 0)  (1970-01-01 20:00:00, , 20)

This output doesn't make much sense to me: it doesn't seem to respect the dtype spec that I gave. It is quite possible, though, that I don't understand the dtype spec well and that this is actually perfectly sensible output.
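For comparison, the pandas documentation describes the `dtype` argument of `DataFrame` as accepting only a single dtype, so a per-column spec is probably better applied after construction. The following is a sketch of that approach, assuming a recent pandas release in which `DataFrame.astype` accepts a column-to-dtype mapping (this may not hold for the 0.12.0 version used in this report). Column A is left to pandas' own datetime inference rather than forced to `datetime64[h]`, since I am not certain that unit is supported everywhere:

```python
import datetime as dt
import itertools as it

import pandas as pd

rows = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9))

# Build the frame without a dtype argument and let pandas infer each column.
df_test = pd.DataFrame(rows, columns=["A", "B", "C"])

# Then cast individual columns with a {column: dtype} mapping.
df_test = df_test.astype({"C": "int32"})

print(df_test.dtypes)
```

With this approach each column keeps a scalar value per cell, rather than the tuple-like cells shown in the output above.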

Labels: Bug, Error Reporting (Incorrect or improved errors from pandas)