Description
I was trying to upsample a DataFrame by a non-integer factor and then compare the result with the original. When plotting the second DataFrame on the same axes, pandas tries to allocate a very large amount of memory and, after a few seconds, raises a MemoryError.
Minimal working example
import pandas as pd
# Create data for 288 seconds
index = pd.date_range(start='2015-07-13 12:18:47', freq='S', periods=288)
df = pd.DataFrame(range(288), index=index)
# Upsample to 500 samples
td = (df.index[-1] - df.index[0])/499
# Pandas does not allow interpolation when upsampling, so resort to bfill :(
df2 = df.resample(td, fill_method='bfill')
# Let's compare them!
ax = df.plot()
df2.plot(ax=ax) # BOOM! Allocates too much memory and crashes
Using pandas 0.16.2 in WinPython 2.7.10.1 x64. Output of pd.show_versions():
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
(I should also note that upsampling to an arbitrary time index seems hopeless in pandas, as it does not allow interpolation between the values near each sampling point, but that's a different issue.)
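On the interpolation side note, one possible workaround is sketched below: reindex onto the union of the old and new timestamps, interpolate by time, then keep only the new timestamps. This is only a sketch and rests on assumptions: it needs a newer pandas in which `date_range` accepts `start`, `end` and `periods` together, and it has not been checked against 0.16.2. The names `new_index`, `combined`, `interp` and the column name `value` are made up for illustration. It does not touch the plotting MemoryError.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the report: 288 samples, one per second.
index = pd.date_range(start='2015-07-13 12:18:47', periods=288,
                      freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({'value': np.arange(288, dtype=float)}, index=index)

# Target index: 500 evenly spaced timestamps over the same span
# (assumes a pandas version that supports start/end/periods together).
new_index = pd.date_range(start=index[0], end=index[-1], periods=500)

# Reindex onto the union of both indexes, which leaves NaN at the new
# timestamps, then fill those gaps by linear interpolation in time.
combined = df.reindex(df.index.union(new_index))
interp = combined.interpolate(method='time').reindex(new_index)
```

Because the sample values here equal the elapsed seconds, the interpolated column should match the elapsed time at each new timestamp, which makes the result easy to sanity-check.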