resample becomes non-deterministic, depending on DateTimeIndex values #28675

Closed
@haeusser

Minimal Example

import datetime as dt
import numpy as np
import pandas as pd


def np_to_df(data, start_time):
    index = pd.DatetimeIndex(
        [start_time + dt.timedelta(milliseconds=t) for t in range(len(data))])
    df = pd.DataFrame(data, index=index)
    return df


# generate sample data
data = np.sin(np.arange(1000) / 30)

# create two DataFrames whose DatetimeIndex differs only in start time (09:41 vs. 09:42)
df_1 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 41))
df_2 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 42))

# print difference before resampling
print("error_1-2:", np.mean(np.abs(df_1.values - df_2.values)))

# resample
df_1 = df_1.resample("19ms").mean()
df_2 = df_2.resample("19ms").mean()

# print difference after resampling
print("error_1-2:", np.mean(np.abs(df_1.values - df_2.values)))

Output:

error_1-2: 0.0
error_1-2: 0.04119868246404099

Problem description

When you give the exact same data to the resample function, the result depends on the absolute values of the DatetimeIndex and so appears non-deterministic, even though both indices have the same frequency and length.
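One possible explanation, sketched here with plain datetime arithmetic: if resample places the 19 ms bin edges at multiples of 19 ms counted from midnight of the first timestamp's day (an assumption about the default anchoring, not something this report confirms), then the two start times fall at different positions inside their first bin. The helper name `phase_in_bin` is made up for this illustration.

```python
import datetime as dt

BIN_MS = 19  # bin width used in the minimal example


def phase_in_bin(start, bin_ms=BIN_MS):
    """Milliseconds between the preceding bin edge and `start`.

    Assumes bin edges sit at multiples of `bin_ms` counted from midnight
    of the day of `start` (hypothetical model of the default anchoring).
    """
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed_ms = (start - midnight) // dt.timedelta(milliseconds=1)
    return elapsed_ms % bin_ms


for start in (dt.datetime(2019, 9, 30, 9, 41), dt.datetime(2019, 9, 30, 9, 42)):
    phase = phase_in_bin(start)
    # With 1 ms sample spacing, the first bin holds only the samples
    # that come before the next edge.
    print(start.time(), "phase:", phase, "ms, samples in first bin:", BIN_MS - phase)
```

Under that assumption the phases are 16 ms and 14 ms, so the first bin would average 3 samples for df_1 but 5 samples for df_2, and every later bin is shifted by two samples. That would account for different means from identical data.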

Expected Output

The values of the two DataFrames should be exactly the same.
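A sketch of one way to get this behaviour, assuming a newer pandas than the 0.25.1 reported below: from pandas 1.1 on, resample accepts an `origin` argument, and `origin="start"` anchors the bins at each frame's first timestamp. This is not available in the version used in this report.

```python
import datetime as dt

import numpy as np
import pandas as pd


def np_to_df(data, start_time):
    # Same construction as in the minimal example: one sample per millisecond.
    index = pd.DatetimeIndex(
        [start_time + dt.timedelta(milliseconds=t) for t in range(len(data))])
    return pd.DataFrame(data, index=index)


data = np.sin(np.arange(1000) / 30)
df_1 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 41))
df_2 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 42))

# origin="start" (assumes pandas >= 1.1) starts the first bin at the first
# timestamp of each frame, so both frames are split into the same groups.
res_1 = df_1.resample("19ms", origin="start").mean()
res_2 = df_2.resample("19ms", origin="start").mean()

same_values = np.array_equal(res_1.values, res_2.values)
print("identical after resampling:", same_values)
```

The resampled indices still differ by one minute; only the values are compared here, as in the minimal example.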

Output of pd.show_versions()

  • commit : None
  • python : 3.6.8.final.0
  • python-bits : 64
  • OS : Linux
  • OS-release : 4.15.0-51-generic
  • machine : x86_64
  • processor : x86_64
  • byteorder : little
  • LC_ALL : None
  • LANG : C.UTF-8
  • LOCALE : en_US.UTF-8
  • pandas : 0.25.1
  • numpy : 1.17.2
  • pytz : 2019.2
  • dateutil : 2.8.0
  • pip : 9.0.1
  • setuptools : 41.0.1
  • Cython : None
  • pytest : 4.4.0
  • hypothesis : None
  • sphinx : None
  • blosc : None
  • feather : None
  • xlsxwriter : None
  • lxml.etree : 4.3.3
  • html5lib : 0.999999999
  • pymysql : None
  • psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
  • jinja2 : 2.10.1
  • IPython : 7.1.1
  • pandas_datareader: None
  • bs4 : None
  • bottleneck : None
  • fastparquet : None
  • gcsfs : None
  • lxml.etree : 4.3.3
  • matplotlib : 3.1.1
  • numexpr : None
  • odfpy : None
  • openpyxl : None
  • pandas_gbq : None
  • pyarrow : None
  • pytables : None
  • s3fs : None
  • scipy : 1.3.1
  • sqlalchemy : 1.3.7
  • tables : None
  • xarray : None
  • xlrd : None
  • xlwt : None
  • xlsxwriter : None

I'd be happy about any help, @jreback.
