Breaking examples due to resample refactor #12448

Closed
@jorisvandenbossche

Description

While using master a bit, I discovered some more cases where the new resample API breaks things:

  • Plotting. .plot is now a dedicated groupby/resample method (it adds each group individually to the plot), whereas I think it is a very common idiom to quickly resample your time series and plot it with (old API) e.g. s.resample('D').plot().
    Example with master:

    In [1]: s = pd.Series(np.random.randn(60), index=pd.date_range('2016-01-01', periods=60, freq='1min'))
    
    In [3]: s.resample('15min').plot()
    Out[3]:
    2016-01-01 00:00:00    Axes(0.125,0.1;0.775x0.8)
    2016-01-01 00:15:00    Axes(0.125,0.1;0.775x0.8)
    2016-01-01 00:30:00    Axes(0.125,0.1;0.775x0.8)
    2016-01-01 00:45:00    Axes(0.125,0.1;0.775x0.8)
    Freq: 15T, dtype: object
    

    [image: figure_1]

    while previously it would just have given you one continuous line.
    I think this one can be solved by special-casing plot for resample: rather than making it a groupby-like method, let it warn and pass the resample().mean() result to Series.plot(), like the 'deprecated_valids'.

  • Calling a method on the old resample result when that method is now also a valid Resampler method. E.g. s.resample(freq).min() would previously have given you the "minimum daily average", while now it gives you the "minimum per day".
    I think this one is more difficult, or even impossible, to solve: you could detect the case if you knew it was old code, but you cannot distinguish it from perfectly valid code using the new API. If we can't solve it, I think it deserves a mention in the whatsnew explanation.

  • Using resample on a groupby object (xref Resampling converts int to float, but only in group by #12202). Using the example of that issue, with 0.17.1 you get:

    In [1]: df = pd.DataFrame({'date': pd.date_range(start='2016-01-01', periods=4, freq='W'),
       ...:                    'group': [1, 1, 2, 2],
       ...:                    'val': [5, 6, 7, 8]})
    
    In [2]: df.set_index('date', inplace=True)
    
    In [3]: df
    Out[3]:
                group  val
    date
    2016-01-03      1    5
    2016-01-10      1    6
    2016-01-17      2    7
    2016-01-24      2    8
    
    In [4]: df.groupby('group').resample('1D', fill_method='ffill')
    Out[4]:
                      val
    group date
    1     2016-01-03    5
          2016-01-04    5
          2016-01-05    5
          2016-01-06    5
          2016-01-07    5
          2016-01-08    5
          2016-01-09    5
          2016-01-10    6
    2     2016-01-17    7
          2016-01-18    7
          2016-01-19    7
          2016-01-20    7
          2016-01-21    7
          2016-01-22    7
          2016-01-23    7
          2016-01-24    8
    
    In [5]: pd.__version__
    Out[5]: u'0.17.1'
    

    while with master you get:

    In [29]: df.groupby('group').resample('1D', fill_method='ffill')
    Out[29]: <pandas.core.groupby.DataFrameGroupBy object at 0x0000000009BA73C8>
    

    which will give you different results or errors in further operations on it. Also, this case does not raise any FutureWarning, which it should, as the user needs to adapt the code to groupby().resample('D').ffill().
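For the plotting case, a minimal sketch of how user code could spell out the old behaviour under the new API, assuming the old resample defaulted to the mean. The data is made deterministic for illustration, and the helper name plot_resampled is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

# Same shape of data as the plotting example: 60 one-minute samples.
# Values are deterministic here (an assumption for illustration only).
s = pd.Series(
    np.arange(60, dtype=float),
    index=pd.date_range('2016-01-01', periods=60, freq='1min'),
)

# New API: name the aggregation explicitly, then work with the result.
# .mean() stands in for the implicit default of the old API.
resampled = s.resample('15min').mean()


def plot_resampled(series):
    """Hypothetical helper: plot the aggregated series as one line.

    Requires matplotlib to be installed.
    """
    return series.plot()
```

Calling plot_resampled(resampled) should draw a single continuous line through the four 15-minute means, rather than one plot per group.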
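For the second case, a small sketch of the two readings of s.resample(freq).min(), using made-up data with two samples per day. The variable names are illustrative only.

```python
import pandas as pd

# Two days of data, two samples per day (12-hour spacing).
s = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.date_range('2016-01-01', periods=4, freq='720min'),
)

# New API reading: the minimum within each day.
min_per_day = s.resample('D').min()

# Old API reading of s.resample('D').min(): resample to daily means
# first, then take the minimum of that resampled series.
min_daily_average = s.resample('D').mean().min()
```

Here min_per_day holds one value per day (1.0 and 5.0), while min_daily_average is the single scalar 2.0, so the same line of code silently changes both its value and its type.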
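For the groupby case, a sketch of the adapted spelling mentioned above, assuming a pandas version where groupby().resample() returns a deferred object. Selecting the 'val' column before resampling is my own choice here, to keep the grouping column out of the result.

```python
import pandas as pd

# Same frame as in the example from #12202.
df = pd.DataFrame({
    'date': pd.date_range(start='2016-01-01', periods=4, freq='W'),
    'group': [1, 1, 2, 2],
    'val': [5, 6, 7, 8],
}).set_index('date')

# New API: resample within each group, then forward-fill explicitly
# instead of passing fill_method='ffill' to resample.
filled = df.groupby('group')['val'].resample('D').ffill()
```

The result is a Series indexed by (group, date) with eight daily rows per group, which corresponds to the 0.17.1 output shown above.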
