Description
Minimal Example
import datetime as dt
import numpy as np
import pandas as pd
def np_to_df(data, start_time):
    index = pd.DatetimeIndex(
        [start_time + dt.timedelta(milliseconds=t) for t in range(len(data))])
    df = pd.DataFrame(data, index=index)
    return df
# generate sample data
data = np.sin(np.arange(1000) / 30)
# create DataFrames with DatetimeIndexes
df_1 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 41))
df_2 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 42))
# print difference before resampling
print("error_1-2:", np.mean(np.abs(df_1.values - df_2.values)))
# resample
df_1 = df_1.resample("19ms").mean()
df_2 = df_2.resample("19ms").mean()
# print difference after resampling
print("error_1-2:", np.mean(np.abs(df_1.values - df_2.values)))
Output:
error_1-2: 0.0
error_1-2: 0.04119868246404099
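To check whether the two frames are actually split into the same windows, one can count how many raw samples fall into each 19 ms bin. This is a sketch that assumes `Resampler.size()` and the default bin alignment; the `np_to_df` helper is repeated from the example so the snippet runs on its own.

```python
import datetime as dt

import numpy as np
import pandas as pd


def np_to_df(data, start_time):
    # Same helper as in the minimal example: one sample per millisecond.
    index = pd.DatetimeIndex(
        [start_time + dt.timedelta(milliseconds=t) for t in range(len(data))])
    return pd.DataFrame(data, index=index)


data = np.sin(np.arange(1000) / 30)
df_1 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 41))
df_2 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 42))

# Number of raw samples in each 19 ms bin, using the default bin alignment.
sizes_1 = df_1.resample("19ms").size()
sizes_2 = df_2.resample("19ms").size()

# Compare the label and size of the first bin of each frame.
print(sizes_1.index[0], sizes_1.iloc[0])
print(sizes_2.index[0], sizes_2.iloc[0])
```

If the bin edges are anchored to midnight, the first bin should hold 3 samples for `df_1` and 5 for `df_2`, so every later 19 ms mean averages a different window of `data`.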
Problem description
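When the exact same data is passed to resample, the result changes if the DatetimeIndex has different absolute timestamps, even though the sampling interval (1 ms) and the resampling frequency (19 ms) are the same. In the example above the two indexes differ only by a one-minute shift, yet the resampled values differ.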
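A likely explanation, sketched in plain Python under the assumption that pandas counts fixed-frequency bins such as "19ms" from midnight of the first day (the default alignment, exposed as `origin="start_day"` since pandas 1.1): one minute is 60,000 ms, which is not a multiple of 19, so a one-minute shift puts the data at a different phase within the bins. `bin_phase_ms` is a hypothetical helper, not a pandas function.

```python
import datetime as dt

BIN_MS = 19  # resample frequency used in the example, in milliseconds


def bin_phase_ms(start, bin_ms=BIN_MS):
    """Hypothetical helper: how far (in ms) `start` lies past the most recent
    bin edge, assuming edges are counted from midnight of the same day."""
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    offset_ms = (start - midnight) // dt.timedelta(milliseconds=1)
    return offset_ms % bin_ms


# One minute is 60,000 ms and 60,000 % 19 == 17, so the shift changes the phase.
phase_1 = bin_phase_ms(dt.datetime(2019, 9, 30, 9, 41))
phase_2 = bin_phase_ms(dt.datetime(2019, 9, 30, 9, 42))
print(phase_1, phase_2)  # 16 14

# Samples (at 1 ms spacing) that land in each series' first, partial bin.
print(BIN_MS - phase_1, BIN_MS - phase_2)  # 3 5
```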
Expected Output
The values of the two resampled DataFrames should be exactly the same.
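A possible workaround, assuming pandas >= 1.1 (newer than the 0.25.1 listed below), is to anchor the bins at each frame's first timestamp with `resample(..., origin="start")`. On 0.25.1 the closest option was the `base` argument, which was later deprecated and removed in pandas 2.0; this sketch does not cover it.

```python
import datetime as dt

import numpy as np
import pandas as pd


def np_to_df(data, start_time):
    # Same helper as in the minimal example: one sample per millisecond.
    index = pd.DatetimeIndex(
        [start_time + dt.timedelta(milliseconds=t) for t in range(len(data))])
    return pd.DataFrame(data, index=index)


data = np.sin(np.arange(1000) / 30)
df_1 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 41))
df_2 = np_to_df(data, dt.datetime(2019, 9, 30, 9, 42))

# origin="start" anchors the 19 ms bins at each frame's first timestamp
# instead of at midnight, so both frames are cut at the same sample offsets.
res_1 = df_1.resample("19ms", origin="start").mean()
res_2 = df_2.resample("19ms", origin="start").mean()

print("error_1-2:", np.mean(np.abs(res_1.values - res_2.values)))
```

With both frames cut at the same offsets from their first sample, the printed error should be 0.0.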
Output of pd.show_versions()
- commit : None
- python : 3.6.8.final.0
- python-bits : 64
- OS : Linux
- OS-release : 4.15.0-51-generic
- machine : x86_64
- processor : x86_64
- byteorder : little
- LC_ALL : None
- LANG : C.UTF-8
- LOCALE : en_US.UTF-8
- pandas : 0.25.1
- numpy : 1.17.2
- pytz : 2019.2
- dateutil : 2.8.0
- pip : 9.0.1
- setuptools : 41.0.1
- Cython : None
- pytest : 4.4.0
- hypothesis : None
- sphinx : None
- blosc : None
- feather : None
- xlsxwriter : None
- lxml.etree : 4.3.3
- html5lib : 0.999999999
- pymysql : None
- psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
- jinja2 : 2.10.1
- IPython : 7.1.1
- pandas_datareader: None
- bs4 : None
- bottleneck : None
- fastparquet : None
- gcsfs : None
- lxml.etree : 4.3.3
- matplotlib : 3.1.1
- numexpr : None
- odfpy : None
- openpyxl : None
- pandas_gbq : None
- pyarrow : None
- pytables : None
- s3fs : None
- scipy : 1.3.1
- sqlalchemy : 1.3.7
- tables : None
- xarray : None
- xlrd : None
- xlwt : None
- xlsxwriter : None
Any help would be appreciated. @jreback?