
Incorrect resampling due to DST #5694

Closed
@dalbani

Description


Related to #5172.

Given the DataFrame below, whose index spans a DST change (October 27th, 2013, for the "Europe/Paris" timezone):

import numpy as np
import pandas as pd

index = pd.date_range('2013-09-30', '2013-11-02', freq='30min', tz='UTC').tz_convert('Europe/Paris')
column_a = np.random.random(index.size)
column_b = np.random.random(index.size)
df = pd.DataFrame({"a": column_a, "b": column_b}, index=index)
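To confirm that this index really straddles the switch from CEST (+02:00) to CET (+01:00), one can list the distinct UTC offsets and find the first timestamp after the change. This is a small sketch; the `timedelta` comparison and the `offsets`/`switch` names are illustrative choices, not part of the original report.

```python
from datetime import timedelta

import pandas as pd

# Same index as above: half-hourly, built in UTC, then shown in Paris time
index = pd.date_range("2013-09-30", "2013-11-02", freq="30min", tz="UTC").tz_convert("Europe/Paris")

# Each Timestamp reports its own UTC offset; a DST change yields two distinct values
offsets = sorted({ts.utcoffset() for ts in index})

# First timestamp observed with the winter (CET, +01:00) offset
switch = next(ts for ts in index if ts.utcoffset() == timedelta(hours=1))

print(offsets)
print(switch)
```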

Let's say I want to find, for each month, the minimum of column "a" and the maximum of column "b":

df.resample("MS", how={"a": "min", "b": "max"})

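Here's the incorrect result. The third bin is labeled 2013-10-31 23:00:00+01:00 instead of 2013-11-01 00:00:00+01:00, and it is empty (NaN) even though the index contains data for November 1st: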

                                  a         b
2013-09-01 00:00:00+02:00  0.015856  0.979541
2013-10-01 00:00:00+02:00  0.002039  0.999960
2013-10-31 23:00:00+01:00       NaN       NaN
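The bogus third label is exactly what stepping from 2013-10-01 00:00+02:00 by a fixed 31 days of absolute time produces: the hour gained when DST ends on October 27th pushes the result back to 23:00 on the previous evening, whereas one calendar month in local wall time lands on midnight. The comparison below is only a hedged illustration of that hypothesis; the issue does not identify which code path in pandas is at fault. It relies on pandas adding a `Timedelta` in absolute time and applying a calendar offset such as `pd.offsets.MonthBegin` in local wall time.

```python
import pandas as pd

start = pd.Timestamp("2013-10-01 00:00", tz="Europe/Paris")  # CEST, +02:00

# Absolute-time step: 31 days = 744 hours, blind to the wall-clock shift
absolute = start + pd.Timedelta(days=31)

# Calendar step in local wall time: lands on the next month start
calendar = start + pd.offsets.MonthBegin(1)

print(absolute)  # 2013-10-31 23:00:00+01:00
print(calendar)  # 2013-11-01 00:00:00+01:00
```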

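The same problem occurs with a "W-MON" frequency: the bins after the DST change are labeled 23:00 on Sunday instead of midnight on Monday, and they are empty: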

                                  a         b
2013-09-30 00:00:00+02:00  0.015856  0.979541
2013-10-07 00:00:00+02:00  0.007961  0.999734
2013-10-14 00:00:00+02:00  0.002614  0.993354
2013-10-21 00:00:00+02:00  0.005655  0.999960
2013-10-27 23:00:00+01:00       NaN       NaN
2013-11-03 23:00:00+01:00       NaN       NaN
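These weekly labels look like fixed 168-hour steps in absolute time: 2013-10-27 23:00+01:00 is exactly one week after 2013-10-21 00:00+02:00, and 2013-11-03 23:00+01:00 is one week after that. A hedged sketch contrasting `pd.Timedelta(weeks=1)` with `pd.DateOffset(days=7)`, which pandas applies in local wall time; the variable names are illustrative.

```python
import pandas as pd

monday = pd.Timestamp("2013-10-21 00:00", tz="Europe/Paris")  # still CEST

# Fixed 168-hour steps drift by one hour once DST ends on October 27th
absolute = [monday + k * pd.Timedelta(weeks=1) for k in range(1, 3)]

# Wall-clock steps keep every label at local midnight on a Monday
calendar = [monday + pd.DateOffset(days=7 * k) for k in range(1, 3)]

for a, c in zip(absolute, calendar):
    print(a, "vs", c)
```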

By contrast, it works fine with a "D" frequency:

                                  a         b
...
2013-10-26 00:00:00+02:00  0.004645  0.983281
2013-10-27 00:00:00+02:00  0.030151  0.986827
2013-10-28 00:00:00+01:00  0.015891  0.981455
2013-10-29 00:00:00+01:00  0.024176  0.999306
...

Resampling only the "a" column also works fine:

df["a"].resample("MS", how="min")
2013-09-01 00:00:00+02:00    0.015856
2013-10-01 00:00:00+02:00    0.002039
2013-11-01 00:00:00+01:00    0.000747
Freq: MS, dtype: float64
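For anyone reproducing this on a current pandas release: the `how=` argument of `resample` has since been removed, so the equivalent calls are `resample("MS").agg(...)` and `resample("MS").min()`. The sketch below rebuilds the frame (with a seeded NumPy generator, an assumption made only for repeatability) and checks that the dict aggregation agrees with the single-column result that worked in the report. On a pandas version without this bug, both should yield the same three month-start labels at local midnight.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2013-09-30", "2013-11-02", freq="30min", tz="UTC").tz_convert("Europe/Paris")
rng = np.random.default_rng(0)  # seeded only so the run is repeatable
df = pd.DataFrame({"a": rng.random(index.size), "b": rng.random(index.size)}, index=index)

# Modern spelling of df.resample("MS", how={"a": "min", "b": "max"})
combined = df.resample("MS").agg({"a": "min", "b": "max"})

# Modern spelling of df["a"].resample("MS", how="min")
single = df["a"].resample("MS").min()

# Both should share the same labels, and column "a" should match
print(combined)
print(combined.index.equals(single.index))
print(np.allclose(combined["a"].to_numpy(), single.to_numpy()))
```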

Tested with the latest pandas from Git master.
