Resampling uses inconsistent labeling for sub-daily and super-daily frequencies #9586

Closed
@shoyer

xref #2665
xref #5440

Resample appears to use an inconsistent label convention depending on whether the target frequency is sub-daily/daily or super-daily:

  • For sub-daily/daily frequencies, label='left' puts each label at the timestamp corresponding to the start of its frequency bin, and label='right' puts it at that timestamp plus the frequency (the timestamp that exactly divides adjacent bins).
  • For super-daily frequencies, both labels appear to be shifted one day to the left, so the timestamps no longer cleanly divide the bins. Moreover, the default label shifts from 'left' to 'right'! My guess is that the default was changed here because users were confused by label='left' no longer falling inside the expected interval. (I guess I could check git blame for the details.)
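
To make the sub-daily/daily convention concrete, here is a minimal standard-library sketch of it. It is not pandas code: `fixed_bin_labels` is a hypothetical helper, and anchoring the bins at midnight of the timestamp's own day is an assumption made only to keep the example short.

```python
from datetime import datetime, timedelta


def fixed_bin_labels(ts, freq):
    """Hypothetical helper: (left, right) labels for the bin containing ts.

    Assumes bins are anchored at midnight of ts's own day.
    """
    day_start = datetime(ts.year, ts.month, ts.day)
    n_bins = (ts - day_start) // freq   # whole bins elapsed since midnight
    left = day_start + n_bins * freq    # timestamp at the start of the bin
    right = left + freq                 # timestamp dividing this bin from the next
    return left, right


left, right = fixed_bin_labels(datetime(2000, 1, 1, 0, 0, 40), timedelta(minutes=1))
print(left)   # 2000-01-01 00:00:00
print(right)  # 2000-01-01 00:01:00
```

Under this convention the 'right' label of one bin is the 'left' label of the next, which is the property the super-daily labels lose.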

I found this behavior quite surprising and confusing. Is it intentional? I would like to rationalize this if possible, because this strikes me as very poor design. The behavior also couples in a weird way with the closed argument (see the linked issues).

From my perspective (as someone who uses monthly and yearly data), the sub-daily/daily behavior makes sense and the super-daily behavior is a bug: there's no particular reason why it makes sense to use 1 day as an offset for frequencies with super-daily resolution.

CC @Cd48 @kdebrab


Here's my test script:

import numpy as np
import pandas as pd

# Each pair is (frequency of the original series, target resample frequency).
for orig_freq, target_freq in [('20s', '1min'), ('20min', '1H'), ('10H', '1D'),
                               ('3D', '10D'), ('10D', '1M'), ('1M', 'Q'), ('3M', 'A')]:
    print('%s -> %s:' % (orig_freq, target_freq))
    ind = pd.date_range('2000-01-01', freq=orig_freq, periods=10)
    s = pd.Series(np.arange(10), ind)
    # Print the label of the first bin under each labeling convention.
    print('default %s' % s.resample(target_freq, how='first').index[0])
    print('left %s' % s.resample(target_freq, label='left', how='first').index[0])
    print('right %s' % s.resample(target_freq, label='right', how='first').index[0])

Output:

20s -> 1min:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 00:01:00
20min -> 1H:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-01 01:00:00
10H -> 1D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-02 00:00:00
3D -> 10D:
default 2000-01-01 00:00:00
left 2000-01-01 00:00:00
right 2000-01-11 00:00:00
10D -> 1M:
default 2000-01-31 00:00:00
left 1999-12-31 00:00:00
right 2000-01-31 00:00:00
1M -> Q:
default 2000-03-31 00:00:00
left 1999-12-31 00:00:00
right 2000-03-31 00:00:00
3M -> A:
default 2000-12-31 00:00:00
left 1999-12-31 00:00:00
right 2000-12-31 00:00:00
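
The super-daily labels reported above match the calendar boundaries of each period shifted back by exactly one day. The following standard-library sketch checks that reading against the reported values; `period_bounds` is a hypothetical helper written for this illustration, not part of pandas, and it only models the printed labels rather than how pandas computes them.

```python
from datetime import date, timedelta


def period_bounds(d, months):
    """Hypothetical helper: [start, end) of the calendar period containing d.

    months is the period length: 1 (month), 3 (quarter) or 12 (year).
    """
    first_month = (d.month - 1) // months * months + 1
    start = date(d.year, first_month, 1)
    end_index = first_month - 1 + months   # zero-based month index of the end
    end = date(d.year + end_index // 12, end_index % 12 + 1, 1)
    return start, end


one_day = timedelta(days=1)
first_obs = date(2000, 1, 1)               # first timestamp in the test script
for name, months in [("1M", 1), ("Q", 3), ("A", 12)]:
    start, end = period_bounds(first_obs, months)
    print(name, "left:", start - one_day, "right:", end - one_day)
# 1M left: 1999-12-31 right: 2000-01-31
# Q left: 1999-12-31 right: 2000-03-31
# A left: 1999-12-31 right: 2000-12-31
```

Without the one-day shift, the same helper would give 2000-01-01 as the left label in all three cases, analogous to the sub-daily/daily rows.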
