Description
From 2021-06-09 dev call, new discussion of #22864
Problem: Frequency string 'D' and pd.offsets.Day are defined to be a fixed 24 hour period since Day is a subclass of Tick.
In the context of timezones with a DST crossing, 'D' instead acts as a calendar day (23/24/25H) for the following operations:
pd.date_range(start, end, freq="D")
df.resample("D")...
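A minimal sketch of the calendar-day behavior described above. The dates and the America/New_York zone are chosen only for illustration (DST began there on 2021-03-14); they are not taken from the original discussion.

```python
import pandas as pd

# DST starts in America/New_York on 2021-03-14, so that local day is 23 hours long.
idx = pd.date_range("2021-03-13", periods=3, freq="D", tz="America/New_York")

# Every element lands on local midnight, so the elapsed time between
# consecutive elements is not constant across the DST crossing.
steps = idx[1:] - idx[:-1]
print(idx)
print(steps)

# By contrast, a fixed 24-hour step from midnight on the DST day
# lands at 01:00 local time on the following day.
start = pd.Timestamp("2021-03-14", tz="America/New_York")
fixed = start + pd.Timedelta(hours=24)
print(fixed)
```

The second step in `steps` is shorter than the first, which is the calendar-day behavior that conflicts with Day being a fixed-length Tick.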
Original Settled Solution: Deprecate all behavior where Frequency string 'D' and pd.offsets.Day are a fixed 24 hours in favor of "24H". A private _Day offset would be used where appropriate internally and swapped out once the deprecation is enforced.
(Note: I lost steam last time catching all the warnings issued in the test suite, given that the above solution touches datetimes, timedeltas, offsets, methods, etc.)
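For reference, a sketch of what a user would write to keep a fixed 24 hour step under the settled solution. It uses the lowercase alias "24h" rather than "24H" on the assumption that recent pandas versions prefer lowercase hour aliases; the dates and time zone are illustrative only.

```python
import pandas as pd

# An hourly Tick frequency is stepped in absolute time, not wall-clock time.
idx_24h = pd.date_range(
    "2021-03-13", periods=3, freq="24h", tz="America/New_York"
)

# All steps are exactly 24 hours, so the last element drifts off local
# midnight once the DST transition on 2021-03-14 has been crossed.
steps_24h = idx_24h[1:] - idx_24h[:-1]
print(idx_24h)
print(steps_24h)
```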
Other Solutions
- Implement a new offset, e.g. "DayDST"/pd.offsets.CalendarDay, that users can migrate to.
- Deprecate Ticks (Day is a subclass) altogether since they are redundant with Timedeltas.
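To illustrate the redundancy mentioned in the second option, a small sketch comparing a Tick offset with the equivalent Timedelta. It uses pd.offsets.Hour rather than Day so that it does not depend on how Day itself is defined; the timestamp is an arbitrary example.

```python
import pandas as pd

ts = pd.Timestamp("2021-03-14", tz="America/New_York")

# Adding a Tick offset and adding the equivalent Timedelta both move the
# timestamp by the same fixed amount of absolute time.
via_tick = ts + pd.offsets.Hour(24)
via_timedelta = ts + pd.Timedelta(hours=24)
print(via_tick, via_timedelta)
```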
cc @pandas-dev/pandas-core
(@jbrockmendel) Updating with checkboxes to keep track of issues I think this would resolve: