Closed
Description
Subtracting tz-aware timestamps gives results that are off by an hour when the data is unsorted and spans a DST boundary. The comment in tslib.pyx:3845
# TODO: this assumed sortedness :/
perhaps implies that this is a known problem that was never resolved, so apologies if a new issue is not appropriate.
Self contained example
import pandas as pd
import numpy as np
# Manifesting issue:
ts_listA = ['2008-05-12 09:50:00-04:00',
            '2008-05-12 09:50:18-04:00',
            '2008-05-12 09:50:33-04:00',
            '2008-05-12 09:50:33-04:00',
            '2008-12-12 09:50:35-05:00',
            '2008-05-12 09:50:32-04:00',
            '2008-05-12 09:49:15-04:00',
            '2008-05-12 09:50:49-04:00',
            '2008-05-12 09:50:54-04:00']
ts_listB = ['2008-05-12 09:50:34-04:00',
            '2008-05-12 09:50:35-04:00',
            '2008-05-12 09:50:36-04:00',
            '2008-05-12 09:50:40-04:00',
            '2008-05-12 09:50:42-04:00',
            '2008-05-12 09:50:43-04:00',
            '2008-05-12 09:50:55-04:00',
            '2008-05-12 09:50:55-04:00',
            '2008-05-12 09:50:57-04:00']
df = pd.DataFrame({'dtA': pd.to_datetime(ts_listA).tz_localize('US/Eastern'),
                   'dtB': pd.to_datetime(ts_listB).tz_localize('US/Eastern')})
# Difference in minutes; rows after the December entry come out an hour off
print((df.dtA - df.dtB) / pd.Timedelta('1 minutes'))
# Underlying tslib.tz_convert call - similar problem
asi8 = np.array([1210600200000000000,
                 1210600218000000000,
                 1210600233000000000,
                 1210600233000000000,
                 1229093435000000000,
                 1210600232000000000,
                 1210600155000000000,
                 1210600249000000000,
                 1210600254000000000])
tz = pd.Timestamp('2008-05-12 12:00:00').tz_localize('US/Eastern').tz
result = pd.tslib.tz_convert(asi8, 'UTC', tz)
print(pd.to_datetime(asi8))    # input instants (UTC)
print(pd.to_datetime(result))  # wall-clock times after conversion to US/Eastern
Output
Expected output:
0        -0.566667
1        -0.283333
2        -0.050000
3        -0.116667
4    308219.883333
5        -0.183333
6        -1.666667
7        -0.100000
8        -0.050000
Name: dtA, dtype: float64
Actual output of timedelta computation:
0        -0.566667
1        -0.283333
2        -0.050000
3        -0.116667
4    308219.883333
5       -60.183333
6       -61.666667
7       -60.100000
8       -60.050000
Name: dtA, dtype: float64
The computed timedeltas for the rows after the 2008-12-12 entry (rows 5 to 8) are off by an hour.
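As a cross-check on the expected values, the same differences can be computed without pandas. The sketch below is not part of the original reproduction: it assumes Python 3.7+ for `datetime.fromisoformat`, and the helper name `minutes_between` is made up for illustration. Because every string carries its own UTC offset, the standard library subtracts the underlying instants directly, so no DST lookup is involved.

```python
from datetime import datetime

ts_listA = ['2008-05-12 09:50:00-04:00', '2008-05-12 09:50:18-04:00',
            '2008-05-12 09:50:33-04:00', '2008-05-12 09:50:33-04:00',
            '2008-12-12 09:50:35-05:00', '2008-05-12 09:50:32-04:00',
            '2008-05-12 09:49:15-04:00', '2008-05-12 09:50:49-04:00',
            '2008-05-12 09:50:54-04:00']
ts_listB = ['2008-05-12 09:50:34-04:00', '2008-05-12 09:50:35-04:00',
            '2008-05-12 09:50:36-04:00', '2008-05-12 09:50:40-04:00',
            '2008-05-12 09:50:42-04:00', '2008-05-12 09:50:43-04:00',
            '2008-05-12 09:50:55-04:00', '2008-05-12 09:50:55-04:00',
            '2008-05-12 09:50:57-04:00']


def minutes_between(a, b):
    """Return (a - b) in minutes for two ISO 8601 strings with UTC offsets."""
    delta = datetime.fromisoformat(a) - datetime.fromisoformat(b)
    return delta.total_seconds() / 60.0


# Hypothetical name: holds the reference values for each row.
expected = [minutes_between(a, b) for a, b in zip(ts_listA, ts_listB)]
for i, value in enumerate(expected):
    print(i, round(value, 6))
```

The values this produces agree with the expected output listed above, including -0.183333 for row 5.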
Output of tz_convert:
DatetimeIndex(['2008-05-12 13:50:00', '2008-05-12 13:50:18',
               '2008-05-12 13:50:33', '2008-05-12 13:50:33',
               '2008-12-12 14:50:35', '2008-05-12 13:50:32',
               '2008-05-12 13:49:15', '2008-05-12 13:50:49',
               '2008-05-12 13:50:54'],
              dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2008-05-12 09:50:00', '2008-05-12 09:50:18',
               '2008-05-12 09:50:33', '2008-05-12 09:50:33',
               '2008-12-12 09:50:35', '2008-05-12 08:50:32',
               '2008-05-12 08:49:15', '2008-05-12 08:50:49',
               '2008-05-12 08:50:54'],
              dtype='datetime64[ns]', freq=None)
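The second index is consistent with an offset lookup that only ever moves forward through the timezone's transition table. The sketch below is a hypothetical model of that failure mode, not the actual `tslib.tz_convert` code: the two-entry transition table for US/Eastern in 2008 and the names `offsets_assuming_sorted`, `offsets_per_element` and `to_wall_clock` are assumptions made for illustration. It uses only the standard library.

```python
from bisect import bisect_right
from datetime import datetime, timezone

NS_PER_SECOND = 10**9
HOUR_NS = 3600 * NS_PER_SECOND

# UTC epoch values (ns) from the report; index 4 is the December timestamp.
asi8 = [1210600200000000000, 1210600218000000000, 1210600233000000000,
        1210600233000000000, 1229093435000000000, 1210600232000000000,
        1210600155000000000, 1210600249000000000, 1210600254000000000]


def utc_ns(*args):
    """UTC calendar fields -> epoch nanoseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp()) * NS_PER_SECOND


# Assumed US/Eastern transitions for 2008 only (DST start and end, in UTC).
transitions = [utc_ns(2008, 3, 9, 7), utc_ns(2008, 11, 2, 6)]
# offsets[i] applies before transitions[i]; the last one applies afterwards.
offsets = [-5 * HOUR_NS, -4 * HOUR_NS, -5 * HOUR_NS]


def offsets_per_element(values):
    """Look up each value independently, so input order does not matter."""
    return [offsets[bisect_right(transitions, v)] for v in values]


def offsets_assuming_sorted(values):
    """Advance a cursor through the transitions and never step back."""
    pos = 0
    result = []
    for v in values:
        while pos < len(transitions) and v >= transitions[pos]:
            pos += 1
        result.append(offsets[pos])
    return result


def to_wall_clock(values, offs):
    """Apply the offsets and format the shifted values as naive wall-clock strings."""
    return [datetime.fromtimestamp((v + o) // NS_PER_SECOND, tz=timezone.utc)
            .strftime('%Y-%m-%d %H:%M:%S') for v, o in zip(values, offs)]


buggy = to_wall_clock(asi8, offsets_assuming_sorted(asi8))
correct = to_wall_clock(asi8, offsets_per_element(asi8))
for b, c in zip(buggy, correct):
    print(b, '|', c)
```

Under these assumptions the forward-only variant yields the same wall-clock times as the second index above: once the cursor has passed the November transition for the December value, the remaining May values keep the -05:00 offset. The per-element lookup gives 09:50 and 09:49 for the last four rows.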
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-504.1.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: None
pip: 8.1.1
setuptools: 20.3
Cython: None
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
Jinja2: None