Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.22.0'
In [3]: s1 = pd.Series(range(8),
...: index=pd.MultiIndex.from_product([list('ab'),
...: list('xy'),
...: [1,2]],
...: names=['ab','xy','num'])
...: )
...:
In [4]: s1
Out[4]:
ab xy num
a x 1 0
2 1
y 1 2
2 3
b x 1 4
2 5
y 1 6
2 7
dtype: int64
In [5]:
In [5]: s2 = pd.Series([100*(i+1) for i in range(4)],
...: index=pd.MultiIndex.from_product([list('ab'),
...: list('xy')],
...: names=['ab','xy']))
...:
In [6]: s2
Out[6]:
ab xy
a x 100
y 200
b x 300
y 400
dtype: int64
In [7]: s1.loc[pd.IndexSlice[:,:,1]] = -1 # This works as expected
In [8]: s1
Out[8]:
ab xy num
a x 1 -1
2 1
y 1 -1
2 3
b x 1 -1
2 5
y 1 -1
2 7
dtype: int64
In [9]: s3 = s1.loc[pd.IndexSlice[:,:,1]] + s2 # This works as expected
In [10]: s3
Out[10]:
ab xy
a x 99
y 199
b x 299
y 399
dtype: int64
In [11]: s1.loc[pd.IndexSlice[:,:,1]] = s2 # This works differently in v0.22.0 and v0.23 (dev)
In [12]: s1
Out[12]:
ab xy num
a x 1 NaN
2 1.0
y 1 NaN
2 3.0
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
In [13]: s1.loc[pd.IndexSlice['a',:,:]] = -2 # This works as expected
In [14]: s1
Out[14]:
ab xy num
a x 1 -2.0
2 -2.0
y 1 -2.0
2 -2.0
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
In [15]: s4 = pd.Series([1000*i for i in range(1,5)],
...: index=pd.MultiIndex.from_product([list('xy'),[1,2]], names=['xy','num']))
...:
In [16]: s4
Out[16]:
xy num
x 1 1000
2 2000
y 1 3000
2 4000
dtype: int64
In [17]: s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This fails
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-17-cc0933ef6ebd> in <module>()
----> 1 s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This fails
C:\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(left, right, name, na_op)
716 return NotImplemented
717
--> 718 left, right = _align_method_SERIES(left, right)
719
720 converted = _Op.get_op(left, right, name, na_op)
C:\Anaconda3\lib\site-packages\pandas\core\ops.py in _align_method_SERIES(left, right, align_asobject)
645 right = right.astype(object)
646
--> 647 left, right = left.align(right, copy=False)
648
649 return left, right
C:\Anaconda3\lib\site-packages\pandas\core\series.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
2605 fill_value=fill_value, method=method,
2606 limit=limit, fill_axis=fill_axis,
-> 2607 broadcast_axis=broadcast_axis)
2608
2609 def rename(self, index=None, **kwargs):
C:\Anaconda3\lib\site-packages\pandas\core\generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
5728 copy=copy, fill_value=fill_value,
5729 method=method, limit=limit,
-> 5730 fill_axis=fill_axis)
5731 else: # pragma: no cover
5732 raise TypeError('unsupported type: %s' % type(other))
C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
5797 join_index, lidx, ridx = self.index.join(other.index, how=join,
5798 level=level,
-> 5799 return_indexers=True)
5800
5801 left = self._reindex_indexer(join_index, lidx, copy)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in join(self, other, how, level, return_indexers, sort)
3101 else:
3102 return self._join_multi(other, how=how,
-> 3103 return_indexers=return_indexers)
3104
3105 # join on the level
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _join_multi(self, other, how, return_indexers)
3199 "overlapping names")
3200 if len(overlap) > 1:
-> 3201 raise NotImplementedError("merging with more than one level "
3202 "overlap on a multi-index is not "
3203 "implemented")
NotImplementedError: merging with more than one level overlap on a multi-index is not implemented
In [18]: s1.loc[pd.IndexSlice['a',:,:]] = s4 # This puts in NaN rather than s4 values
In [19]: s1
Out[19]:
ab xy num
a x 1 NaN
2 NaN
y 1 NaN
2 NaN
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
Problem description
This is somewhat related to #10440. In the code above, when the slice fixes the value of the third level, we can assign a constant to the slice. We can also add that slice to a Series whose index matches the first 2 levels.
In v0.22.0 of pandas, the result of the lines
s1.loc[pd.IndexSlice[:,:,1]] = s2
s1
is (as shown above)
ab xy num
a x 1 NaN
2 1.0
y 1 NaN
2 3.0
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
But in the development version 0.23 of pandas, the "correct" result is given:
ab xy num
a x 1 100
2 1
y 1 200
2 3
b x 1 300
2 5
y 1 400
2 7
dtype: int64
So I would then expect the last 2 examples, which use s4, to work in the v0.23 development version, because the only difference is that the slice fixes the value of the first level rather than the last level. But in both of those cases, I get this error (independent of the pandas version):
NotImplementedError: merging with more than one level overlap on a multi-index is not implemented
So there is an inconsistency: a slice that fixes the last level allows the addition and assignment operations to work (and the v0.23 development version behaves better than v0.22 because the NaN values go away), but a slice that fixes the first level does not allow the operations to work.
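One possible workaround for the addition, as a minimal sketch rather than a fix: select the fixed first level with Series.xs, which drops that level from the result, so the left-hand side carries only the xy and num levels and its index is identical to that of s4. I have not checked this against 0.22.0 or the 0.23 development version specifically, and the name left is only for illustration.

```python
import pandas as pd

s1 = pd.Series(range(8),
               index=pd.MultiIndex.from_product([list('ab'), list('xy'), [1, 2]],
                                                names=['ab', 'xy', 'num']))
s4 = pd.Series([1000 * i for i in range(1, 5)],
               index=pd.MultiIndex.from_product([list('xy'), [1, 2]],
                                                names=['xy', 'num']))

# xs selects the rows where ab == 'a' and drops the 'ab' level,
# leaving a two-level (xy, num) index
left = s1.xs('a', level='ab')

# Both operands now carry the same (xy, num) index, so alignment is trivial
s5 = left + s4
print(s5)
```

The trade-off is that the result no longer carries the ab level, so it has to be added back if the sum is meant to go into s1 again.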
I'm not sure if this is a bug, or by design, or if the documentation needs to be clarified as to which type of slicing will allow the "setting" operation to work as expected. There is a line in the docs (http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers) that says "You can use a right-hand-side of an alignable object as well." At least to me, it's not clear what objects are considered "alignable".
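My working reading of "alignable", which is an assumption on my part and not something the docs state, is that the right-hand side needs index labels that can be matched against the labels of the target. Under that assumption, one way to make s4 line up with the 'a' block of s1 is to prepend the missing ab level, so that every label of the right-hand side exists in s1.index. A sketch (s4_full is a made-up name):

```python
import pandas as pd

s1 = pd.Series(range(8),
               index=pd.MultiIndex.from_product([list('ab'), list('xy'), [1, 2]],
                                                names=['ab', 'xy', 'num']))
s4 = pd.Series([1000 * i for i in range(1, 5)],
               index=pd.MultiIndex.from_product([list('xy'), [1, 2]],
                                                names=['xy', 'num']))

# Prepend an 'ab' level holding the single key 'a';
# the index levels become (ab, xy, num), like those of s1
s4_full = pd.concat({'a': s4}, names=['ab'])

# Check that every label of s4_full exists in s1.index
print(s4_full.index.isin(s1.index).all())

# Reindexing to s1's index shows which rows have a matching value;
# rows without a match (the 'b' block) come out as NaN
print(s4_full.reindex(s1.index))
```

Whether .loc assignment treats such an object as "alignable" in a given pandas version is exactly what I would like the documentation to spell out.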
In any case, the expected behavior should be stated clearly in the documentation and, IMHO, the behavior should be consistent whether the slice fixes the value of the first level or the last level.
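In the meantime, one way to sidestep alignment on assignment is to pass the raw values, which are then written by position. This is only a sketch: it assumes that s4 is already in the same order as the rows selected by the slice, and nothing checks the labels.

```python
import pandas as pd

s1 = pd.Series(range(8),
               index=pd.MultiIndex.from_product([list('ab'), list('xy'), [1, 2]],
                                                names=['ab', 'xy', 'num']))
s4 = pd.Series([1000 * i for i in range(1, 5)],
               index=pd.MultiIndex.from_product([list('xy'), [1, 2]],
                                                names=['xy', 'num']))

idx = pd.IndexSlice

# .values strips the index from s4, so the four numbers are assigned
# positionally to the four rows where ab == 'a'; no label alignment happens
s1.loc[idx['a', :, :]] = s4.values
print(s1)
```

Because the labels are ignored, this silently produces wrong results if the two objects are ordered differently, which is why label-based assignment would be preferable.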
Expected Output
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '0.22.0'
In [3]: s1 = pd.Series(range(8),
...: index=pd.MultiIndex.from_product([list('ab'),
...: list('xy'),
...: [1,2]],
...: names=['ab','xy','num'])
...: )
...:
In [4]: s1
Out[4]:
ab xy num
a x 1 0
2 1
y 1 2
2 3
b x 1 4
2 5
y 1 6
2 7
dtype: int64
In [5]:
In [5]: s2 = pd.Series([100*(i+1) for i in range(4)],
...: index=pd.MultiIndex.from_product([list('ab'),
...: list('xy')],
...: names=['ab','xy']))
...:
In [6]: s2
Out[6]:
ab xy
a x 100
y 200
b x 300
y 400
dtype: int64
In [7]: s1.loc[pd.IndexSlice[:,:,1]] = -1 # This works as expected
In [8]: s1
Out[8]:
ab xy num
a x 1 -1
2 1
y 1 -1
2 3
b x 1 -1
2 5
y 1 -1
2 7
dtype: int64
In [9]: s3 = s1.loc[pd.IndexSlice[:,:,1]] + s2 # This works as expected
In [10]: s3
Out[10]:
ab xy
a x 99
y 199
b x 299
y 399
dtype: int64
In [11]: s1.loc[pd.IndexSlice[:,:,1]] = s2 # This works differently in v0.22.0 and v0.23 (dev)
In [12]: s1
Out[12]:
ab xy num
a x 1 NaN
2 1.0
y 1 NaN
2 3.0
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
In [13]: s1.loc[pd.IndexSlice['a',:,:]] = -2 # This works as expected
In [14]: s1
Out[14]:
ab xy num
a x 1 -2.0
2 -2.0
y 1 -2.0
2 -2.0
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
In [15]: s4 = pd.Series([1000*i for i in range(1,5)],
...: index=pd.MultiIndex.from_product([list('xy'),[1,2]], names=['xy','num']))
...:
In [16]: s4
Out[16]:
xy num
x 1 1000
2 2000
y 1 3000
2 4000
dtype: int64
In [17]: s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This should not fail
In [18]: s5
Out[18]:
xy num
x 1 998
2 1998
y 1 2998
2 3998
In [19]: s1.loc[pd.IndexSlice['a',:,:]] = s4 # This should not set NaN
In [20]: s1
Out[20]:
ab xy num
a x 1 1000
2 2000
y 1 3000
2 4000
b x 1 NaN
2 5.0
y 1 NaN
2 7.0
dtype: float64
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None