Description
I was poking around in tdi.py with pdb and I noticed this code at line 255 of current master:
```python
@classmethod
def _simple_new(cls, values, name=None, freq=None, **kwargs):
    if not getattr(values,'dtype',None):  # <-- line 255: this doesn't do what it looks like it does
        values = np.array(values,copy=False)
    if values.dtype == np.object_:
        values = tslib.array_to_timedelta64(values)
    if values.dtype != _TD_DTYPE:
        values = com._ensure_int64(values).view(_TD_DTYPE)
    result = object.__new__(cls)
    result._data = values
    result.name = name
    result.freq = freq
    result._reset_identity()
    return result
```
This looks to me like it's testing whether `values` is already an `ndarray`, and if it isn't, taking a view into its values. But that's not what it's doing at all. If `values` is a `Series`, it has a `dtype`, so the view-making code should be skipped! Except that it isn't:
```
In [1]: import numpy as np

In [2]: map(np.dtype, [None, 'i8', 'i4,i4', ('m8[ns]', (2, 2)), ('S', 4),
   ...:               [('name', 'u1', (2, 2))],
   ...:               {'names': ['name'], 'formats': ['f8']},
   ...:               ('i4,i4', (2, 2))])
Out[2]:
[dtype('float64'),                                    # note that dtype(None) is float!
 dtype('int64'),                                      # fixed dtype
 dtype([('f0', '<i4'), ('f1', '<i4')]),               # record dtype
 dtype(('<m8[ns]', (2, 2))),                          # array dtype
 dtype('S4'),                                         # flexible dtype
 dtype([('name', 'u1', (2, 2))]),                     # record dtype with array fields
 dtype([('name', '<f8')]),                            # another record dtype
 dtype(([('f0', '<i4'), ('f1', '<i4')], (2, 2)))]     # array dtype with record fields

In [3]: map(bool, _)
Out[3]: [False, False, True, False, False, True, True, False]  # madness
```
My tentative guess is that if the outermost layer of the dtype is a record type, then it is truthy, and otherwise it's falsy. This behavior seems too pathological to depend on.
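A quick way to check that guess, assuming (as the `Out[3]` line suggests) that NumPy's `dtype` has no `__bool__` of its own, so truth testing falls back to `len(dtype)`, which counts the top-level named fields. This is a minimal sketch that counts the fields directly instead of calling `bool`; `SPECS`, `top_level_field_count` and `top_level_names` are names made up for the example.

```python
import numpy as np

# The same dtype specifications as in the IPython session above.
SPECS = [None, 'i8', 'i4,i4', ('m8[ns]', (2, 2)), ('S', 4),
         [('name', 'u1', (2, 2))],
         {'names': ['name'], 'formats': ['f8']},
         ('i4,i4', (2, 2))]


def top_level_field_count(spec):
    """Return len() of the dtype, i.e. its number of top-level named fields."""
    return len(np.dtype(spec))


def top_level_names(spec):
    """Return the dtype's top-level field names, or () if it has none."""
    # A subarray dtype such as ('i4,i4', (2, 2)) keeps its fields on .base,
    # so .names is None at the outermost level.
    return np.dtype(spec).names or ()


for spec in SPECS:
    print(repr(np.dtype(spec)), top_level_field_count(spec), top_level_names(spec))
```

A nonzero count lines up exactly with the `True` entries in `Out[3]`, including the last dtype: it has record fields, but only one level down, so it counts as having no fields.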
Unfortunately, that's exactly what the code relies on when constructing a `TimedeltaIndex` from a `Series` of `Timedelta`s:
```
In [1]: import pandas as pd

In [2]: import pdb

In [3]: pd.Series([pd.Timedelta(1, 'h')])
Out[3]:
0   01:00:00
dtype: timedelta64[ns]

In [4]: pdb.run('pd.Index(_)')
> <string>(1)<module>()
(Pdb) b pd.TimedeltaIndex._simple_new
Breakpoint 1 at /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py:253
(Pdb) c
> /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(255)_simple_new()
-> if not getattr(values,'dtype',None):
(Pdb) p values
0   01:00:00
dtype: timedelta64[ns]            # it's still a Series
(Pdb) n
> /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(256)_simple_new()
-> values = np.array(values,copy=False)
(Pdb) n
> /Users/afni/ActivityMonitorProcessing/sojourns/source/lib/python2.7/site-packages/pandas/tseries/tdi.py(257)_simple_new()
-> if values.dtype == np.object_:
(Pdb) p values
array([3600000000000], dtype='timedelta64[ns]')   # now it's an ndarray
```
Everything seems to work, but what's actually going on is so different from what I would expect to be going on that it seems like a time bomb.
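For comparison, here is a minimal sketch of what the check at line 255 appears to intend, written explicitly with `isinstance` so that it never evaluates `bool(dtype)`. `coerce_to_td64` and `TD_DTYPE` are hypothetical names, not pandas API. The object-dtype branch that calls `tslib.array_to_timedelta64` is left out so the sketch only needs NumPy.

```python
import numpy as np

TD_DTYPE = np.dtype('m8[ns]')  # stands in for pandas' _TD_DTYPE


def coerce_to_td64(values):
    """Hypothetical explicit version of the coercion at the top of _simple_new."""
    # Test for what we actually care about: is this already an ndarray?
    # A Series has a .dtype but is not an ndarray, so it gets converted here.
    if not isinstance(values, np.ndarray):
        values = np.asarray(values)
    if values.dtype != TD_DTYPE:
        # Mirror the original: reinterpret the data as int64 nanoseconds.
        values = values.astype(np.int64).view(TD_DTYPE)
    return values
```

With this version, an existing `timedelta64[ns]` array is returned unchanged. A `Series` would go through `np.asarray` because it is not an `ndarray`, which is the conversion the pdb session shows happening today by accident.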