Description
Code Sample, a copy-pastable example if possible
Related Stack Overflow question: http://stackoverflow.com/questions/32750970/python-pandas-merge-causing-memory-overflow
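# coding: utf-8
import pandas as pd
# Load the sample data, using the '<Date>' column (which has repeated values) as the index.
data = pd.read_csv('https://gist.githubusercontent.com/xgdgsc/8671a22136e1da937f1046a5f211c0ff/raw/d261706a6e7d1d7014e45e47122ead71e7159ef4/small.csv', index_col='<Date>')
print(data.shape)
# Take a single column; it keeps the same duplicated index.
another = data[[' <Open>']]
# Joining on the duplicated index multiplies the rows for every repeated key.
joined = data.join([another])
print(joined.shape)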
Problem description
Currently, joining DataFrames whose index contains duplicate keys can cause a severe memory overflow. It sometimes freezes the computer so that the user has to hard reboot, which can be disastrous for unsaved work.
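To illustrate why the memory use grows so quickly: a key that appears n times in the left index and m times in the right index contributes n * m rows to the result. Below is a minimal sketch that estimates the size of a left join on the index before running it. The helper name `estimated_left_join_rows` and the toy frames are made up for illustration; they are not part of pandas or of the data above.

```python
import pandas as pd

def estimated_left_join_rows(left: pd.DataFrame, right: pd.DataFrame) -> int:
    """Estimate how many rows left.join(right) would produce (left join on the index)."""
    left_counts = left.index.value_counts()
    right_counts = right.index.value_counts()
    # Keys absent from the right side still yield one output row per left row.
    matches = right_counts.reindex(left_counts.index).fillna(1)
    # Each key contributes (occurrences on the left) * (matches on the right) rows.
    return int((left_counts * matches).sum())

# Toy frames (hypothetical): key 'a' is duplicated on both sides.
left = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'a', 'b'])
right = pd.DataFrame({'y': [10, 20, 30, 40]}, index=['a', 'a', 'a', 'c'])

estimate = estimated_left_join_rows(left, right)
actual = len(left.join(right))
print(estimate, actual)
```

Here key 'a' contributes 2 * 3 = 6 rows and key 'b' contributes 1, so both numbers are 7 even though the left frame has only 3 rows. With thousands of repeats per key, the same multiplication is what exhausts memory.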
Expected Output
Adding a simple check before joining/merging that stops the operation and warns the user would be enough.
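A rough sketch of what such a check could look like on the user side, assuming it is acceptable to refuse whenever both indexes contain duplicates. `checked_join` is a hypothetical wrapper, not an existing pandas function, and it is deliberately conservative: it also refuses when the duplicated keys on the two sides do not overlap, even though that case would not multiply rows.

```python
import pandas as pd

def checked_join(left: pd.DataFrame, right: pd.DataFrame, **join_kwargs) -> pd.DataFrame:
    """Join on the index, refusing when both indexes contain duplicate keys."""
    if left.index.has_duplicates and right.index.has_duplicates:
        raise ValueError(
            "Both indexes contain duplicate keys, so the join may multiply rows. "
            "Deduplicate or aggregate one side first."
        )
    # At most one side has duplicates, so the result cannot grow beyond many-to-one.
    return left.join(right, **join_kwargs)

# Toy frames (hypothetical) to exercise the guard.
dup_left = pd.DataFrame({'x': [1, 2]}, index=['a', 'a'])
dup_right = pd.DataFrame({'y': [10, 20]}, index=['a', 'a'])
unique_right = pd.DataFrame({'y': [10]}, index=['a'])

safe = checked_join(dup_left, unique_right)  # many-to-one: keeps 2 rows

try:
    checked_join(dup_left, dup_right)
except ValueError as err:
    print("join refused:", err)
```

Raising an exception rather than only warning is a design choice in this sketch; a version that emits a warning and continues would leave the freeze possible.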
Output of pd.show_versions()
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None