Closed
Description
Create MultiIndex DataFrame
import pandas as pd
df = pd.DataFrame(
[
[5777, 100, 5385, 200, 5419, 4887, 100, 200],
[4849, 0, 4539, 0, 3381, 0, 0, ],
[4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
[4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
[5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
],
index=pd.to_datetime(['2010-01-01',
'2010-01-02',
'2010-01-03',
'2010-01-04',
'2010-01-05']),
columns=pd.MultiIndex.from_tuples(
[('Portfolio A', 'GBP', 'amount'), ('Portfolio A', 'GBP', 'injection'),
('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
('Portfolio A', 'USD', 'amount'), ('Portfolio A', 'USD', 'injection'),
('Portfolio B', 'JPY', 'amount'), ('Portfolio B', 'JPY', 'injection')])
).sortlevel(axis=1)
Add a column at Level 2
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
inject = df.loc[:, pd.IndexSlice[:, :, 'injection']]
dav = amount - amount.shift() - inject.shift().values
dav.columns.set_levels(['daily_added_value'], level=2, inplace=True)
df = pd.concat([df, dav], axis=1).sortlevel(axis=1)
print(df.T) #transposed to fit only
I can repeat the add a column at Level 2 code above multiple times, however if I try the following I get an error (possibly related to NAN's):
dav1 = df.loc[:, pd.IndexSlice[:, :, 'daily_added_value']]
amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
dr = dav1 * amount
dr.columns.set_levels(['daily_return'], level=2, inplace=True)
df = pd.concat([df, dr], axis=1).sortlevel(axis=1)
Expected Output - ( daily_returns is added to level 2 ( daily_added_value * amount )
Note I have put zeros as expected values but the desired output is daily_added_value * amount) I am also adding other measures at level 2 with division and shift() in the formulae. These fail with same error.
2010-01-01 2010-01-02 2010-01-03 \
Portfolio A GBP amount 5777.0 4849.0 4971.0
daily_added_value NaN -1028.0 122.0
injection 100.0 0.0 0.0
USD amount 5419.0 3381.0 4645.0
daily_added_value NaN -6925.0 1264.0
daily_returns 0 0 0
injection 4887.0 0.0 3424.0
Portfolio B EUR amount 5385.0 4539.0 3824.0
daily_added_value NaN -1046.0 -715.0
injection 200.0 0.0 0.0
JPY amount 100.0 0.0 0.0
daily_added_value NaN -300.0 NaN
daily_returns 0 0 0
injection 200.0 NaN 0.0
2010-01-04 2010-01-05
Portfolio A GBP amount 4827.0 5207.0
daily_added_value -144.0 180.0
injection 200.0 0.0
USD amount 4552.0 4876.0
daily_added_value -3517.0 -2829.0
daily_returns 0 0 0
injection 3153.0 3358.0
Portfolio B EUR amount 3459.0 3670.0
daily_added_value -365.0 -89.0
injection 300.0 0.0
JPY amount 100.0 0.0
daily_added_value 100.0 -300.0
daily_returns 0 0 0
injection 200.0 0.0
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-56-7a5afb5f45f1> in <module>()
2 amount = df.loc[:, pd.IndexSlice[:, :, 'amount']]
3 dr = dav1 * amount
----> 4 dr.columns.set_levels(['daily_return'], level=2, inplace=True)
5 df = pd.concat([df, dr], axis=1).sortlevel(axis=1)
C:\Users\xxxxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\indexes\multi.py in set_levels(self, levels, level, inplace, verify_integrity)
227 idx._reset_identity()
228 idx._set_levels(levels, level=level, validate=True,
--> 229 verify_integrity=verify_integrity)
230 if not inplace:
231 return idx
C:\Users\xxxxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\indexes\multi.py in _set_levels(self, levels, level, copy, validate, verify_integrity)
164
165 if verify_integrity:
--> 166 self._verify_integrity()
167
168 def set_levels(self, levels, level=None, inplace=False,
C:\Users\xxxxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\indexes\multi.py in _verify_integrity(self)
127 " level (%d). NOTE: this index is in an"
128 " inconsistent state" % (i, label.max(),
--> 129 len(level)))
130
131 def _get_levels(self):
ValueError: On level 2, label max (1) >= length of level (1). NOTE: this index is in an inconsistent state
How do you make the index consistent?
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1
This post is also raised as a question and also related to this question