
BUG: DatetimeIndex.shift(freq=...) raises near DST boundary #8616

Closed
@ischwabacher


xref #5694 #8531 (?)
xref #8817

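This is presumably caused by pytz time zones internalizing the UTC offset of the particular time they were localized to: a localized pytz tzinfo carries one fixed offset (e.g. CDT or CST), so timestamps from the same zone on opposite sides of a DST transition end up with tzinfo objects that need not compare equal.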
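A rough stdlib analogy, offered as an assumption rather than as pytz itself: the per-offset tzinfo that pytz attaches after localize()/normalize() behaves much like a fixed-offset datetime.timezone. The sketch below uses two such stand-ins for America/Chicago around the 2013 fall-back and shows that the two 01:00 timestamps carry unequal tzinfo objects even though they belong to the same zone.

```python
from datetime import datetime, timedelta, timezone

# Stand-ins (assumption): fixed-offset tzinfos playing the role of the
# per-offset tzinfo objects pytz attaches after localize()/normalize().
CDT = timezone(timedelta(hours=-5), "CDT")  # America/Chicago, daylight time
CST = timezone(timedelta(hours=-6), "CST")  # America/Chicago, standard time

before = datetime(2013, 11, 3, 1, 0, tzinfo=CDT)  # first 01:00 (still CDT)
after = datetime(2013, 11, 3, 1, 0, tzinfo=CST)   # second 01:00 (now CST)

# Same zone, same wall-clock time, but the attached tzinfo objects differ...
print(before.tzinfo == after.tzinfo)  # False
# ...and the two instants are one hour apart in absolute time.
print(after - before)                 # 1:00:00
```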

In [1]: import pandas as pd

In [2]: idx = pd.date_range('2013-11-03', tz='America/Chicago',
   ...:                     periods=6, freq='H')

In [3]: pd.Series(index=idx)
Out[3]: 
2013-11-03 00:00:00-05:00   NaN
2013-11-03 01:00:00-05:00   NaN
2013-11-03 01:00:00-06:00   NaN
2013-11-03 02:00:00-06:00   NaN
2013-11-03 03:00:00-06:00   NaN
2013-11-03 04:00:00-06:00   NaN
Freq: H, dtype: float64

In [4]: pd.Series(index=idx).shift(freq='H')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-19ee418c9aa1> in <module>()
----> 1 pd.Series(index=idx).shift(freq='H')

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in shift(self, periods, freq, axis, **kwds)
   3290             new_data = self._data.shift(periods=periods, axis=block_axis)
   3291         else:
-> 3292             return self.tshift(periods, freq, **kwds)
   3293 
   3294         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.pyc in tshift(self, periods, freq, axis, **kwds)
   3386         else:
   3387             new_data = self._data.copy()
-> 3388             new_data.axes[block_axis] = index.shift(periods, offset)
   3389 
   3390         return self._constructor(new_data).__finalize__(self)

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in shift(self, n, freq)
    855         end = self[-1] + n * self.offset
    856         return DatetimeIndex(start=start, end=end, freq=self.offset,
--> 857                              name=self.name, tz=self.tz)
    858 
    859     def repeat(self, repeats, axis=None):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, **kwds)
    204             return cls._generate(start, end, periods, name, freq,
    205                                  tz=tz, normalize=normalize, closed=closed,
--> 206                                  infer_dst=infer_dst)
    207 
    208         if not isinstance(data, (np.ndarray, ABCSeries)):

/Users/afni/homebrew/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.pyc in _generate(cls, start, end, periods, name, offset, tz, normalize, infer_dst, closed)
    369         if tz is not None and inferred_tz is not None:
    370             if not inferred_tz == tz:
--> 371                 raise AssertionError("Inferred time zone not equal to passed "
    372                                      "time zone")
    373 

AssertionError: Inferred time zone not equal to passed time zone
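The traceback shows shift() computing end = self[-1] + n * self.offset (and presumably a matching start) and then rebuilding a whole new range with tz=self.tz, which is where the time zone comparison fails. For a fixed-length frequency such as 'H', one alternative (an assumption for illustration, not what pandas does here) is to shift every timestamp in UTC and convert back, which never has to compare tzinfo objects. The original index covers six consecutive UTC hours, 05:00 through 10:00 UTC. The helpers to_chicago and shift_hourly below are hypothetical, and to_chicago hard-codes the 2013 fall-back instant so the sketch needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5), "CDT")
CST = timezone(timedelta(hours=-6), "CST")
# Toy rule (assumption): America/Chicago fell back from CDT to CST at
# 2013-11-03 07:00 UTC (02:00 CDT). Only meaningful near that date.
FALLBACK_UTC = datetime(2013, 11, 3, 7, tzinfo=timezone.utc)


def to_chicago(instant):
    """Convert an aware datetime to Chicago local time using the toy rule."""
    utc = instant.astimezone(timezone.utc)
    return utc.astimezone(CST if utc >= FALLBACK_UTC else CDT)


def shift_hourly(index, n):
    """Shift every label by n hours of absolute time (hypothetical helper)."""
    step = timedelta(hours=n)
    return [to_chicago(ts.astimezone(timezone.utc) + step) for ts in index]


# Rebuild the six-label index from the issue: 00:00-05:00 ... 04:00-06:00.
start = datetime(2013, 11, 3, 0, tzinfo=CDT)
idx = [to_chicago(start + timedelta(hours=i)) for i in range(6)]

shifted = shift_hourly(idx, 1)
for ts in shifted:
    print(ts.isoformat())
# 2013-11-03T01:00:00-05:00
# 2013-11-03T01:00:00-06:00
# 2013-11-03T02:00:00-06:00
# 2013-11-03T03:00:00-06:00
# 2013-11-03T04:00:00-06:00
# 2013-11-03T05:00:00-06:00
```

Because each label is moved independently in absolute time, the repeated 01:00 wall-clock hour is handled without ever asking whether two tzinfo objects are "the same zone".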

Experimenting with dateutil and pytz time zones makes me despair of fixing that assertion so that it does anything sane. Is it actually needed, or can we just turn it into a warning?

Observations:

  • two pytz.DstTzInfo instances for the same zoneinfo zone that have been .normalize()d to different times may compare unequal
  • a pytz.*TzInfo has a zone attribute containing the name of the zoneinfo zone from which it was constructed
  • a dateutil.tzfile has no public members beyond the tzinfo API, which is insufficient to tell whether two time zones are equal
  • a pytz time zone and a dateutil time zone appear to compare unequal under all circumstances, even when they represent the same time zone
  • the various dateutil time zone classes share no common base class within the dateutil package
  • pytz zones constructed from different names for the same zoneinfo zone (e.g. UTC and Etc/UTC) compare unequal
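A minimal sketch of what "turn it into a warning" might look like, under stated assumptions: same_zone and check_tz are hypothetical helpers, and FakePytzZone is a stand-in for a pytz tzinfo that only mimics the zone attribute mentioned above. The comparison prefers zone names when both sides have one and otherwise falls back to ==. Per the observations, this still treats aliases such as UTC and Etc/UTC as different, and it cannot help with dateutil zones, which expose no name.

```python
import warnings


def same_zone(a, b):
    """Best-effort zone comparison (hypothetical helper).

    Prefer the zoneinfo name in a ``zone`` attribute (as pytz tzinfos have);
    otherwise fall back to ==. Aliases like UTC vs Etc/UTC still differ.
    """
    name_a = getattr(a, "zone", None)
    name_b = getattr(b, "zone", None)
    if name_a is not None and name_b is not None:
        return name_a == name_b
    return a == b


def check_tz(inferred, passed):
    """Warn, rather than raise AssertionError, when the zones look different."""
    if inferred is not None and passed is not None and not same_zone(inferred, passed):
        warnings.warn("Inferred time zone not equal to passed time zone",
                      RuntimeWarning, stacklevel=2)


class FakePytzZone:
    """Stand-in for a pytz tzinfo: distinct objects sharing one zone name."""

    def __init__(self, zone, abbrev):
        self.zone = zone
        self.abbrev = abbrev


cdt = FakePytzZone("America/Chicago", "CDT")
cst = FakePytzZone("America/Chicago", "CST")
print(cdt == cst)           # False: default identity comparison
print(same_zone(cdt, cst))  # True: same zone name
```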

Ugh.

Another example, from #8817:

import pandas as pd
import pytz
import datetime

dt = datetime.datetime(2014, 11, 14, 0)
dt_est = pytz.timezone('EST').localize(dt)
s = pd.Series(data=[1], index=[dt_est])

s.shift(0, freq='h')  # 2014-11-14 00:00:00-05:00 (seems okay) 
s.shift(-1, freq='h')  # 2014-11-13 18:00:00-05:00 (expected 2014-11-13 23:00:00-05:00)
s.shift(1, freq='h')  # 2014-11-13 20:00:00-05:00 (expected 2014-11-14 01:00:00-05:00)

s.shift(-1, freq='s')  # 2014-11-13 18:59:59-05:00 (same with other freq)
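Each wrong result above is exactly 5 hours, the size of the EST offset, earlier than expected, which suggests the offset is being applied twice. The expected values can be checked with the standard library alone. This sketch assumes, as the -05:00 in the output indicates, that pytz's 'EST' is a fixed UTC-5 offset with no DST; shift here is a hypothetical helper, not the pandas method.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 'EST' is a fixed UTC-5 offset with no DST transitions.
EST = timezone(timedelta(hours=-5), "EST")
dt_est = datetime(2014, 11, 14, 0, tzinfo=EST)


def shift(ts, n, unit):
    """Shift an aware timestamp by n units of absolute time (hypothetical helper)."""
    return ts + n * timedelta(**{unit: 1})


print(shift(dt_est, 0, "hours"))     # 2014-11-14 00:00:00-05:00
print(shift(dt_est, -1, "hours"))    # 2014-11-13 23:00:00-05:00
print(shift(dt_est, 1, "hours"))     # 2014-11-14 01:00:00-05:00
print(shift(dt_est, -1, "seconds"))  # 2014-11-13 23:59:59-05:00
```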

Metadata

Assignees

No one assigned

    Labels

    Timezones: Timezone data dtype

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests
