xref #6549
When a DataFrame with a timedelta64 column is very large, mean() on that column returns an incorrect result. In the following code, the mean of the timedelta column is wrong, but all the other statistics are fine:
import numpy as np
import pandas as pd
# hourly timestamps over ~15 years (131,520 rows)
dates1 = pd.date_range(start='20000101T000000', end='20150101T230000', freq='1H').values
# add a random offset of 0 to 10 days, given in nanoseconds
dates2 = dates1 + np.random.randint(0*60*60*1000000000, 10*24*60*60*1000000000, len(dates1))
df = pd.DataFrame({'dates1':dates1, 'dates2':dates2})
df['tdiff'] = dates2-dates1
# the same difference expressed as float days, for comparison
df['fdiff'] = df['tdiff'].apply(lambda x: float(x.item())/1000000000/3600/24)
df.describe()
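A quick size check suggests why the length of the frame matters. Assuming mean() adds up the int64 nanosecond values before dividing (an assumption about pandas 0.15.2 internals that this report does not confirm), the total for the range above is far larger than an int64 can hold. The sketch below only does the arithmetic with exact Python integers; the ~5 day average is the expected value of the random offsets, not a measured result.

```python
from datetime import datetime, timedelta

import numpy as np

NS_PER_DAY = 24 * 60 * 60 * 10**9          # nanoseconds in one day
INT64_MAX = int(np.iinfo(np.int64).max)    # 9223372036854775807

# Rows produced by date_range(start='20000101T000000', end='20150101T230000', freq='1H'):
# whole hours between the endpoints, plus one because both ends are included.
start, end = datetime(2000, 1, 1, 0), datetime(2015, 1, 1, 23)
n_rows = (end - start) // timedelta(hours=1) + 1

# Expected total of the offsets if they average ~5 days, as an exact Python int.
expected_total_ns = n_rows * 5 * NS_PER_DAY

print(n_rows)                           # 131520
print(expected_total_ns > INT64_MAX)    # True: roughly 6x the int64 limit
```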
Making the frame shorter, by changing the start date in the example above to
dates1 = pd.date_range(start='20140101T000000', end='20150101T230000', freq='1H').values
gives the correct result (the mean of the timedelta column should be about 5 days). Is this an open bug?
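The same arithmetic is consistent with the shorter range working: with 8,784 rows, even a sum in which every offset sat at the 10-day upper bound stays below the int64 limit. The sketch below is a hypothetical illustration in plain NumPy, not pandas code. It uses a deterministic stand-in for the long range (every offset exactly 5 days) to show that an int64 accumulator wraps silently, while accumulating in float64, much like the float-day `fdiff` column in the example, recovers 5 days. Whether pandas itself overflows in this way is an assumption.

```python
from datetime import datetime, timedelta

import numpy as np

NS_PER_DAY = 24 * 60 * 60 * 10**9
INT64_MAX = int(np.iinfo(np.int64).max)

def hourly_rows(start, end):
    """Hourly stamps from start to end, both ends included (as with freq='1H')."""
    return (end - start) // timedelta(hours=1) + 1

end = datetime(2015, 1, 1, 23)
n_short = hourly_rows(datetime(2014, 1, 1), end)   # 8784 rows
n_long = hourly_rows(datetime(2000, 1, 1), end)    # 131520 rows

# Short range: even if every offset hit the 10-day bound, the sum fits in int64.
assert n_short * 10 * NS_PER_DAY < INT64_MAX

# Long range, deterministic stand-in: every offset is exactly 5 days (in ns).
offsets = np.full(n_long, 5 * NS_PER_DAY, dtype=np.int64)
int_sum = offsets.sum()                                        # int64 accumulator wraps silently
float_mean_days = offsets.astype(np.float64).mean() / NS_PER_DAY  # float64 accumulator

print(int(int_sum) // n_long / NS_PER_DAY)   # far from 5: the wrapped sum is meaningless
print(float_mean_days)                       # 5.0
```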
Version:
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 13.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: C
LANG: en_US.UTF-8
pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.2
numpy: 1.9.1
scipy: 0.15.1
statsmodels: None
IPython: 2.3.1
sphinx: 1.2.3
patsy: None
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None