DataFrame.interpolate() extrapolates over trailing missing data #8000

Closed
@grahamjeffries

See also the discussion at StackOverflow.

Linear interpolation on a series with missing data at the end of the array will overwrite trailing missing values with the last non-missing value. In effect, the function extrapolates rather than strictly interpolating.

Example:

import pandas as pd
import numpy as np

a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
a.interpolate()

Yields (note the extrapolated 4):

0   NaN
1     1
2     2
3     3
4     4
5     4
dtype: float64

rather than the expected:

0   NaN
1     1
2     2
3     3
4     4
5     NaN
dtype: float64
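
Until this is fixed, one possible workaround is to interpolate and then blank out everything after the last valid observation. This is only a sketch: `interpolate_inside` is a hypothetical helper name, not part of pandas, and it assumes the index has unique labels.

```python
import numpy as np
import pandas as pd


def interpolate_inside(s):
    """Linearly interpolate, then restore NaN after the last valid value.

    Hypothetical helper; assumes `s` has a unique index.
    """
    out = s.interpolate()
    last = s.last_valid_index()
    if last is not None:
        # Integer position of the last non-missing observation.
        pos = s.index.get_loc(last)
        # Everything after it was filled by extrapolation, so undo that.
        out.iloc[pos + 1:] = np.nan
    return out


a = pd.Series([np.nan, 1, np.nan, 3, 4, np.nan])
result = interpolate_inside(a)
print(result)
```

The interior gap at index 2 is still filled, while the trailing entry stays missing. Newer pandas releases also accept `limit_area='inside'` on `interpolate()`, which should give the same effect without a helper.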

I believe the fix is something along the lines of changing lines 1545:1546 in core/common.py from

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid])

to

result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid], yvalues[firstIndex:][valid], np.nan, np.nan)
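
To show why the extra arguments matter, here is a small standalone sketch of the two `np.interp` calls. It does not reproduce the actual code in core/common.py; the variable names (`first`, `tail`, `default_fill`, `nan_fill`) are made up for illustration. By default `np.interp` returns the last `fp` value for any `x` beyond the last `xp`, and the `left` and `right` arguments override that.

```python
import numpy as np

y = np.array([np.nan, 1.0, np.nan, 3.0, 4.0, np.nan])
first = 1                      # stand-in for firstIndex: first non-NaN position
tail = y[first:]               # [1, nan, 3, 4, nan]
valid = ~np.isnan(tail)
invalid = ~valid
inds = np.arange(tail.size, dtype=float)

# Current behaviour: points past the last valid x get the last valid y.
default_fill = tail.copy()
default_fill[invalid] = np.interp(inds[invalid], inds[valid], tail[valid])

# Proposed behaviour: points outside the valid range stay NaN.
nan_fill = tail.copy()
nan_fill[invalid] = np.interp(
    inds[invalid], inds[valid], tail[valid], left=np.nan, right=np.nan
)

print(default_fill)  # trailing value repeated
print(nan_fill)      # trailing value left as NaN
```

Passing `np.nan, np.nan` positionally, as in the proposed change, is equivalent to the keyword form used here.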

Labels: Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)