Numpy dtype weirdness in TimedeltaIndex._simple_new #9462

Closed
@ischwabacher

Description

I was poking around in tdi.py with pdb and I noticed this code at line 255 of current master:

    @classmethod
    def _simple_new(cls, values, name=None, freq=None, **kwargs):
-->     if not getattr(values,'dtype',None):   # This doesn't do what it looks like it does
            values = np.array(values,copy=False)
        if values.dtype == np.object_:
            values = tslib.array_to_timedelta64(values)
        if values.dtype != _TD_DTYPE:
            values = com._ensure_int64(values).view(_TD_DTYPE)

        result = object.__new__(cls)
        result._data = values
        result.name = name
        result.freq = freq
        result._reset_identity()
        return result

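This looks to me like it's testing whether values already has a dtype (i.e. is already an ndarray), and if it doesn't, taking a view into its values. But that's not what it's doing at all: `not getattr(values,'dtype',None)` tests the truthiness of the dtype, not whether it is present. If values is a Series, it has a dtype, so the view-making code should be skipped! Except that it isn't, because most dtypes are falsy: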

    In [1]: import numpy as np

    In [2]: map(np.dtype, [None, 'i8', 'i4,i4', ('m8[ns]', (2, 2)), ('S', 4),
       ...:                [('name', 'u1', (2, 2))],
       ...:                {'names': ['name'], 'formats': ['f8']},
       ...:                ('i4,i4', (2, 2))])
    Out[2]: 
    [dtype('float64'),                                # note that dtype(None) is float!
     dtype('int64'),                                  # fixed dtype
     dtype([('f0', '<i4'), ('f1', '<i4')]),           # record dtype
     dtype(('<m8[ns]', (2, 2))),                      # array dtype
     dtype('S4'),                                     # flexible dtype
     dtype([('name', 'u1', (2, 2))]),                 # record dtype with array fields
     dtype([('name', '<f8')]),                        # another record dtype
     dtype(([('f0', '<i4'), ('f1', '<i4')], (2, 2)))] # array dtype with record fields

    In [3]: map(bool, _)
    Out[3]: [False, False, True, False, False, True, True, False]   # madness

My tentative guess is that if the outermost layer of the dtype is a record type, then it is truthy, and otherwise it's falsy. This behavior seems too pathological to depend on.

Unfortunately, that's exactly what's happening when constructing a TimedeltaIndex from a Series of Timedeltas:

    In [1]: import pandas as pd

    In [2]: import pdb

    In [3]: pd.Series([pd.Timedelta(1, 'h')])
    Out[3]: 
    0   01:00:00
    dtype: timedelta64[ns]

    In [4]: pdb.run('pd.Index(_)')
    > <string>(1)<module>()
    (Pdb) b pd.TimedeltaIndex._simple_new
    Breakpoint 1 at /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py:253
    (Pdb) c
    > /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(255)_simple_new()
    -> if not getattr(values,'dtype',None):
    (Pdb) p values
    0   01:00:00
    dtype: timedelta64[ns]   # it's still a Series
    (Pdb) n
    > /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(256)_simple_new()
    -> values = np.array(values,copy=False)
    (Pdb) n
    > /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(257)_simple_new()
    -> if values.dtype == np.object_:
    (Pdb) p values
    array([3600000000000], dtype='timedelta64[ns]')   # now it's an ndarray

Everything seems to work, but what's actually going on is so different from what I would expect to be going on that it seems like a time bomb.
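
For illustration, here is a minimal sketch of how the two possible intents of that check could be spelled out without relying on dtype truthiness. The helper names `has_dtype` and `as_ndarray` are hypothetical (they are not pandas functions), and `np.asarray` stands in for `np.array(values, copy=False)`. This is a sketch under those assumptions, not a proposed patch.

```python
import numpy as np


def has_dtype(values):
    """Hypothetical helper: True if `values` exposes a dtype attribute at all."""
    # Compare against None explicitly, so a falsy dtype still counts as present.
    return getattr(values, 'dtype', None) is not None


def as_ndarray(values):
    """Hypothetical helper: return `values` as an ndarray, converting only if needed."""
    # Test the type directly instead of testing the truthiness of `values.dtype`.
    if not isinstance(values, np.ndarray):
        values = np.asarray(values)
    return values


# An ndarray is passed through untouched; a plain list gets converted.
arr = np.array([3600000000000], dtype='m8[ns]')
same = as_ndarray(arr)
converted = as_ndarray([1, 2, 3])
```

With either spelling, whether the conversion branch runs no longer depends on what kind of dtype the input happens to carry.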

Labels: Dtype Conversions, Internals, Timedelta
