Description
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.23.0.dev0+657.g01882ba5b'
In [3]: df1 = pd.DataFrame({'v1' : range(12)}, index=pd.MultiIndex.from_product([list('abc'),list('xy'),[1,2]], names=['abc','xy','num']))
...: df1
...:
Out[3]:
v1
abc xy num
a x 1 0
2 1
y 1 2
2 3
b x 1 4
2 5
y 1 6
2 7
c x 1 8
2 9
y 1 10
2 11
In [4]: df2 = pd.DataFrame({'v2': [100*i for i in range(1,7)]}, index=pd.MultiIndex.from_product([list('abc'), list('xy')],names=['abc','xy']))
In [5]: df2
Out[5]:
v2
abc xy
a x 100
y 200
b x 300
y 400
c x 500
y 600
In [6]: df1.merge(df2, on=['abc','xy']) # 'num' disappears
Out[6]:
v1 v2
abc xy
a x 0 100
x 1 100
y 2 200
y 3 200
b x 4 300
x 5 300
y 6 400
y 7 400
c x 8 500
x 9 500
y 10 600
y 11 600
In [7]: df1.reset_index().merge(df2, on=['abc','xy']) # This preserves 'num'
Out[7]:
abc xy num v1 v2
0 a x 1 0 100
1 a x 2 1 100
2 a y 1 2 200
3 a y 2 3 200
4 b x 1 4 300
5 b x 2 5 300
6 b y 1 6 400
7 b y 2 7 400
8 c x 1 8 500
9 c x 2 9 500
10 c y 1 10 600
11 c y 2 11 600
In [8]: df1.merge(df2, on='xy') # 'abc' and 'num' disappear
Out[8]:
v1 v2
xy
x 0 100
x 0 300
x 0 500
x 1 100
x 1 300
x 1 500
x 4 100
x 4 300
x 4 500
x 5 100
x 5 300
x 5 500
x 8 100
x 8 300
x 8 500
x 9 100
x 9 300
x 9 500
y 2 200
y 2 400
y 2 600
y 3 200
y 3 400
y 3 600
y 6 200
y 6 400
y 6 600
y 7 200
y 7 400
y 7 600
y 10 200
y 10 400
y 10 600
y 11 200
y 11 400
y 11 600
Problem description
It seems that the new feature implemented in #17484 that allows merging on a combination of columns and index levels can drop index levels, which is really non-intuitive. In the first example, the index level named "num" gets dropped, while in the last example, both "abc" and "xy" are dropped.
If this is the desired behavior, then it needs to be carefully documented.
N.B. There is also an error in the docs of merging.rst that says this feature was introduced in v.0.22, but it will be introduced in v0.23
I'm guessing @jmmease will need to look at this.
Expected Output
In [6]: df1.merge(df2, on=['abc','xy'])
Out[6]:
v1 v2
abc xy num
a x 1 0 100
2 1 100
y 1 2 200
2 3 200
b x 1 4 300
2 5 300
y 1 6 400
2 7 400
c x 1 8 500
2 9 500
y 1 10 600
2 11 600
In [8]: df1.merge(df2, on='xy')
Out[8]:
abc_x num v1 abc_y v2
xy
x a 1 0 a 100
x a 1 0 b 300
x a 1 0 c 500
x a 2 1 a 100
x a 2 1 b 300
x a 2 1 c 500
x b 1 4 a 100
x b 1 4 b 300
x b 1 4 c 500
x b 2 5 a 100
x b 2 5 b 300
x b 2 5 c 500
x c 1 8 a 100
x c 1 8 b 300
x c 1 8 c 500
x c 2 9 a 100
x c 2 9 b 300
x c 2 9 c 500
y a 1 2 a 200
y a 1 2 b 400
y a 1 2 c 600
y a 2 3 a 200
y a 2 3 b 400
y a 2 3 c 600
y b 1 6 a 200
y b 1 6 b 400
y b 1 6 c 600
y b 2 7 a 200
y b 2 7 b 400
y b 2 7 c 600
y c 1 10 a 200
y c 1 10 b 400
y c 1 10 c 600
y c 2 11 a 200
y c 2 11 b 400
y c 2 11 c 600
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0.dev0+657.g01882ba5b
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None