Possible Bug - math on like-indexed datetime series doesn't work as expected #7500

Closed
@aullrich2013

I have two Series of datetimes that share the same index. Simple math operations on them don't give the results I'd expect. Specifically, subtracting one Series from the other doesn't always subtract across the aligned index labels: some entries come back as NaN/NaT even though both Series hold a valid Timestamp at that label. Converting each Series to a DataFrame with a dummy column gets us closer, but the resulting type handling isn't correct.

site = 2898717
print firstOrderNotEval.loc[site]
print firstEvalOrder.loc[site]
print type(firstOrderNotEval.loc[site])
print type(firstEvalOrder.loc[site])

### output:
#2008-08-21 00:00:00
#2013-09-10 00:00:00
# <class 'pandas.tslib.Timestamp'>
# <class 'pandas.tslib.Timestamp'>

timeToFirstNonEvalPurchase_doesntWork = ((firstOrderNotEval - firstEvalOrder)/np.timedelta64(1,'D'))
timeToFirstNonEvalPurchase = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D'))['a']

print timeToFirstNonEvalPurchase_doesntWork.loc[2898717]
print timeToFirstNonEvalPurchase.loc[2898717]

### output:
# nan
# -1846 nanoseconds # note: should be -1846 (days, as a float)
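
For comparison, here is a minimal sketch of the result I'd expect, written for Python 3 and a recent pandas. The Series names and the second row (`1000001`) are made up for illustration; only the two dates for site 2898717 come from the output above. It doesn't reproduce the bug, it only shows the intended arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two series in this report. Only the
# dates for site 2898717 are taken from the output above; the second
# row is invented so the Series has more than one element.
first_order_not_eval = pd.Series(
    [pd.Timestamp("2008-08-21"), pd.Timestamp("2010-01-15")],
    index=[2898717, 1000001],
)
first_eval_order = pd.Series(
    [pd.Timestamp("2013-09-10"), pd.Timestamp("2009-12-31")],
    index=[2898717, 1000001],
)

# Subtraction aligns on index labels and yields a timedelta64 Series.
delta = first_order_not_eval - first_eval_order

# Dividing by a one-day timedelta converts the differences to float days.
days = delta / np.timedelta64(1, "D")

print(days.loc[2898717])  # expected: -1846.0
```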

Subtracting individual elements gives the correct result, but as a datetime.timedelta type. Subtracting the series directly gives NaT:

site = 2898717   
print (firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print type(firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print (firstOrderNotEval - firstEvalOrder).loc[site]
### output:
# -1846 days, 0:00:00
# <type 'datetime.timedelta'>
# NaT
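
To make the element-wise versus Series comparison concrete, here is a small sketch (Python 3, recent pandas, hypothetical data apart from the two dates for site 2898717). It also shows one general way that NaT can appear in a Series subtraction: a label present in only one operand. I'm not claiming that is what happens with my data, since both Series have a value at this site.

```python
import datetime

import pandas as pd

site = 2898717

# Hypothetical data: `a` has two labels, `b` has only one of them.
a = pd.Series(
    [pd.Timestamp("2008-08-21"), pd.Timestamp("2010-01-15")],
    index=[site, 1000001],
)
b = pd.Series([pd.Timestamp("2013-09-10")], index=[site])

# Element-wise: Timestamp - Timestamp gives a pandas Timedelta,
# which is a subclass of datetime.timedelta.
scalar_diff = a.loc[site] - b.loc[site]

# Series-wise: operands are aligned on the union of their labels.
series_diff = a - b

print(scalar_diff)
print(series_diff.loc[site])     # should match the element-wise result
print(series_diff.loc[1000001])  # label missing from `b`, so NaT
```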

Perhaps this has to do with the Timestamp type itself, given the following example, where the result depends on whether the row is selected before or after dividing:

print (firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')).loc[site]/np.timedelta64(1,'D')
print ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D')).loc[site]

### output:
# a   -1846
# Name: 2898717.0, dtype: float64
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]

Note that the following two expressions give different results depending on how the division by timedelta64 is performed (on the whole frame versus column by column with apply):

tmp = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')))
print (tmp/np.timedelta64(1,'D')).loc[site]
print tmp.apply(lambda x: x/np.timedelta64(1,'D')).loc[site]
### output:
# a   -00:00:00.000002
# Name: 2898717.0, dtype: timedelta64[ns]
### what we'd expect: 
# a   -1846
# Name: 2898717.0, dtype: float64
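
As a sanity check for what "expected" means here, this sketch (Python 3, recent pandas, hypothetical data except the dates for site 2898717) divides the frame both ways and compares the results. I'd expect the two paths to agree and to produce float days.

```python
import numpy as np
import pandas as pd

site = 2898717

# Hypothetical stand-ins for the two series; the second row is invented.
a = pd.Series(
    [pd.Timestamp("2008-08-21"), pd.Timestamp("2010-01-15")],
    index=[site, 1000001],
)
b = pd.Series(
    [pd.Timestamp("2013-09-10"), pd.Timestamp("2009-12-31")],
    index=[site, 1000001],
)

# Same construction as in the report: subtract via single-column frames.
tmp = a.to_frame("a") - b.to_frame("a")

one_day = np.timedelta64(1, "D")
whole_frame = tmp / one_day                        # divide the whole frame
by_column = tmp.apply(lambda col: col / one_day)   # divide column by column

print(whole_frame.loc[site])
print(by_column.loc[site])
```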

The attached pickle files (saved with a .jpg extension) contain the series used in this example:

firstOrderNotEval.to_pickle('./firstOrderNotEval.jpg')
firstEvalOrder.to_pickle('./firstEvalOrder.jpg')
