Closed
Description
I have stumbled across a curious issue when working with pandas.concat
.
Let's consider a simple example:
import pandas as pd
from copy import deepcopy
example_multiindex1 = pd.MultiIndex.from_product([['a'], ['b']])
example_dataframe1 = pd.DataFrame([0], index=example_multiindex1)
example_multiindex2 = pd.MultiIndex.from_product([['a'], ['c']])
example_dataframe2 = pd.DataFrame([1], index=example_multiindex2)
example_dict = {'s1': example_dataframe1, 's2': example_dataframe2}
print pd.concat(example_dict, names=['testname'])
The output is as expected:
0
testname
s1 a b 0
s2 a c 1
Strange thing will happen if we pass to concat
not an original dict but a deepcopy of it:
print pd.concat(deepcopy(example_dict), names=['testname'])
The output, surprisingly, will be:
0
testname
s1 a b 0
s2 a b 1
Thus meaning, that the multiindex of the first DataFrame was taken two times.
Contents of the copied dict though seems to be correct at first glance:
>>> print deepcopy(example_dict)
{'s2': 0
a c 1, 's1': 0
a b 0}
PS. Version information:
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Linux
OS-release: 3.11.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.0
nose: 1.3.6
Cython: 0.20.2
numpy: 1.9.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 3.1.0
sphinx: None
patsy: 0.2.1
dateutil: 2.4.2
pytz: 2015.2
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Issue is reproducible down to version 0.15.0. Version 0.14.1 is fine.