API: resample with PeriodIndex default span (start/end convention) #7744

Open
@jorisvandenbossche

Previously, resampling a PeriodIndex supported two conventions: 'start' (start -> start) and 'end' (end -> end). They would give something like this (note: the following is not real output from current code, but taken from Wes' book):

In [25]: s = pd.Series(np.arange(2), index=pd.period_range('2000-1', periods=2, freq='A'))

In [26]: s
Out[26]:
2000    0
2001    1
Freq: A-DEC, dtype: int32

In [27]: s.resample('Q-DEC', fill_method='ffill', convention='start')
Out[27]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
Freq: Q-DEC, dtype: int32

In [28]: s.resample('Q-DEC', fill_method='ffill', convention='end')
Out[28]:
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

According to Wes' book, the default convention was 'end'. However, the current behaviour is as follows (this is real output):

In [27]: s.resample('Q-DEC', fill_method='ffill')
Out[27]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

So this is in fact a third option, 'span' (start -> end). This option is mentioned in #1635, but from that issue it seems it was never implemented: the commit was never merged, and the test added at that time is still commented out (https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_resample.py#L1134).
In practice, however, this 'span' behaviour is exactly what the default does now. The 'start' option has changed as well:

In [28]: s.resample('Q-DEC', fill_method='ffill', convention='start')
Out[28]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

This gives the same result as the default; only 'end' still behaves as before.

Some issues/questions:

  • What is the default value for convention? It is not documented anywhere, not even in the docstring (apart from the signature, which says 'start').
  • I can't find the issue/PR/release note saying that the default for period resampling (upsampling) has changed.
  • The default is now a 'spanning' behaviour, but this is identical to 'start'. Shouldn't these be different, so that 'start' has its own behaviour (start -> start), distinct from the default spanning behaviour (start -> end)?
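
To make the three behaviours concrete, here is a minimal sketch of the target index each one would produce when upsampling. It does not go through resample at all; `target_index` is a hypothetical helper written only for illustration, and 'span' is not an accepted value of the convention argument as far as I can tell. It relies only on Period.asfreq(how=...) and pd.period_range.

```python
import pandas as pd


def target_index(index, freq, mode):
    """Hypothetical helper (not pandas API): upsampled PeriodIndex for one mode.

    mode is 'start' (start -> start), 'end' (end -> end)
    or 'span' (start -> end).
    """
    first, last = index[0], index[-1]
    # Only 'end' begins at the last sub-period of the first original period.
    how_first = "end" if mode == "end" else "start"
    # Only 'start' stops at the first sub-period of the last original period.
    how_last = "start" if mode == "start" else "end"
    return pd.period_range(
        first.asfreq(freq, how=how_first),
        last.asfreq(freq, how=how_last),
        freq=freq,
    )


# Two annual periods, 2000 and 2001, as in the example above.
annual = pd.period_range("2000", periods=2, freq="Y")

for mode in ("start", "end", "span"):
    print(mode, [str(p) for p in target_index(annual, "Q-DEC", mode)])
```

With distinct semantics, 'start' and 'end' would each yield five quarters (2000Q1 to 2001Q1 and 2000Q4 to 2001Q4), while 'span' yields all eight (2000Q1 to 2001Q4), which is what both the default and 'start' return at the moment.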
