BUG: nonexistent Timestamp pre-summer/winter DST change with dateutil timezone #31043

Closed
@AlexKirko

Code Sample, a copy-pastable example if possible

This is fine:

>>> pd.__version__
'0.26.0.dev0+1790.gdd94e0db9'
>>> epoch =  1552211999999999871
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999871-0800', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999871
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999871-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552211999999999871

This is also fine:

>>> epoch =  1552212000000000000
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 03:00:00-0700', tz='dateutil/US/Pacific')
>>> t.value
1552212000000000000
>>> pd.Timestamp(t)
Timestamp('2019-03-10 03:00:00-0700', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552212000000000000

Meanwhile, this breaks the representation and gives us a nonexistent time, and passing the result back through pd.Timestamp shifts the value by an hour:

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999872-0700', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999872
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552208399999999872

The same happens right on the cusp, one nanosecond before the jump:

>>> epoch =  1552211999999999999
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999999-0700', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999999
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999999-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552208399999999999
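
A quick sanity check on the two broken round trips above, using only the numbers copied from the transcripts (the variable names are mine): in both cases the value drifts by exactly one hour, i.e. by the size of the DST offset.

```python
NS_PER_HOUR = 3600 * 10**9

# (value passed to the constructor, value after pd.Timestamp(t)),
# copied from the two failing transcripts in this report
round_trips = [
    (1552211999999999872, 1552208399999999872),
    (1552211999999999999, 1552208399999999999),
]

# How far the value moved back in each round trip, in nanoseconds
shifts = [before - after for before, after in round_trips]
print(shifts)
```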

Problem description

When we use dateutil timezones and try to create a Timestamp right on the cusp of the change from winter to summer time, we can get nonexistent times: the clock is supposed to jump from 2 A.M. to 3 A.M., yet the result pairs the wall time 01:59:59 with the post-jump offset -0700, a combination that never occurs in US/Pacific.

I've investigated this, and it appears that from 128 nanoseconds before the clock jump onward, dateutil already reports the post-jump DST offset and utcoffset. The offsets are what they are supposed to be after the jump, but the wall time hasn't jumped yet, so the constructor returns a nonexistent time. Calling the constructor again moves the value 1 hour back.
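
To make the offset mismatch concrete, here is a small stdlib-only sketch (no pandas or dateutil involved; `wall_time` is a hypothetical helper, and the fixed offsets -8 and -7 stand in for PST and PDT). It renders the same instant under both offsets: only the -0800 reading is a valid US/Pacific wall time, while the -0700 reading lands inside the skipped hour.

```python
from datetime import datetime, timedelta, timezone

EPOCH_NS = 1552211999999999872  # 128 ns before the 2019 spring-forward jump


def wall_time(epoch_ns: int, utc_offset_hours: int) -> str:
    """Render epoch nanoseconds as wall time under a fixed UTC offset."""
    # datetime only resolves microseconds, so carry the nanoseconds separately
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    tz = timezone(timedelta(hours=utc_offset_hours))
    dt = datetime.fromtimestamp(seconds, tz=tz)
    return f"{dt:%Y-%m-%d %H:%M:%S}.{nanos:09d}{dt:%z}"


pst = wall_time(EPOCH_NS, -8)  # pre-jump offset (winter time)
pdt = wall_time(EPOCH_NS, -7)  # post-jump offset (summer time)
print(pst)
print(pdt)
```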

This can be checked with:

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t.tz.dst(t)
datetime.timedelta(seconds=3600)

My assumption is that a rounding step happens somewhere while the UTC offset is being determined: the epoch gets rounded up to 1552212000000000000, the offset is looked up for that rounded value, and it is then applied to the original pre-jump time.
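
One thing that supports the rounding idea: the threshold matches 64-bit float precision. Near 1.55e18, adjacent float64 values are 256 apart, so any integer within 128 ns of the boundary rounds onto it. The sketch below only shows that arithmetic in plain Python (`via_float64` is a hypothetical helper); it does not show where, or whether, pandas or dateutil performs such a conversion.

```python
BOUNDARY_NS = 1552212000000000000  # first instant after the jump, in epoch ns


def via_float64(epoch_ns: int) -> int:
    """Round-trip an integer through a 64-bit float, as a lossy conversion would."""
    return int(float(epoch_ns))


# Distance from the boundary after the float round trip, for three inputs:
# 129 ns before (the last "fine" case), 128 ns before (the first broken case),
# and 1 ns before (right on the cusp).
results = {delta: via_float64(BOUNDARY_NS - delta) for delta in (129, 128, 1)}
for delta, rounded in results.items():
    print(f"boundary - {delta:>3} ns -> boundary {rounded - BOUNDARY_NS:+d} ns")
```

The input 129 ns before the boundary rounds away from it, while the inputs 128 ns and 1 ns before collapse onto it, which is exactly where the behaviour flips between 1552211999999999871 and 1552211999999999872.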

I'd like to try to fix this one.

Expected Output

>>> epoch =  1552211999999999872
>>> t = pd.Timestamp(epoch, tz='dateutil/US/Pacific')
>>> t
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> t.value
1552211999999999872
>>> pd.Timestamp(t)
Timestamp('2019-03-10 01:59:59.999999872-0800', tz='dateutil/US/Pacific')
>>> pd.Timestamp(t).value
1552211999999999872

Notes

This was thought to be part of #24329 but turned out to be a separate bug as I worked on closing that issue in PR #30995.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : dd94e0d
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : ru_RU.UTF-8
LOCALE : None.None

pandas : 0.26.0.dev0+1790.gdd94e0db9
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 5.1.5
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
s3fs : 0.4.0
scipy : 1.3.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0
