Best shown with an example.
import numpy as np
import pandas as pd
# Build the timestamps as a list; on Python 3, map() returns a one-shot iterator
timestamps = [pd.Timestamp(s) for s in ['2014-01-01', '2014-02-01']]
categories = ['A', 'B', 'C', 'D']
df = pd.DataFrame(index=pd.MultiIndex.from_product([timestamps, categories], names=['ts', 'cat']),
                  columns=['Col1', 'Col2'])
>>> df
               Col1 Col2
ts         cat
2014-01-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN
I want to set the values for all categories in a single month. These examples work just fine.
df.loc['2014-01-01', 'Col1'] = 5
df.loc['2014-01-01', 'Col2'] = [1,2,3,4]
>>> df
               Col1 Col2
ts         cat
2014-01-01 A      5    1
           B      5    2
           C      5    3
           D      5    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN
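These examples don't work. The first overwrites Col1 for 2014-01-01 with NaN, and the second leaves Col2 for 2014-02-01 as NaN.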
df.loc['2014-01-01', 'Col1'] += 1
df.loc['2014-02-01', 'Col2'] = df.loc['2014-01-01', 'Col2']
>>> df
               Col1 Col2
ts         cat
2014-01-01 A    NaN    1
           B    NaN    2
           C    NaN    3
           D    NaN    4
2014-02-01 A    NaN  NaN
           B    NaN  NaN
           C    NaN  NaN
           D    NaN  NaN
It doesn't seem to be a "setting a value on a copy" issue. Instead, Pandas is writing the NaNs.
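One possible explanation, offered as an assumption rather than a confirmed diagnosis: when the right-hand side of a `.loc` assignment is a Series, pandas aligns it on index labels before writing, and labels that don't match come through as NaN. The sketch below reproduces that effect on a small Series with a plain index (the names `target` and `source` are made up for illustration), and shows that passing a bare NumPy array skips the alignment.

```python
import pandas as pd

# Hypothetical stand-ins: the two Series share no index labels.
target = pd.Series([0.0, 0.0], index=['x', 'y'])
source = pd.Series([1.0, 2.0], index=['a', 'b'])

# Assigning a Series aligns on labels; 'a'/'b' never match 'x'/'y',
# so NaN is written instead of the values.
aligned = target.copy()
aligned.loc[['x', 'y']] = source

# Assigning the underlying array is purely positional, so the values land.
positional = target.copy()
positional.loc[['x', 'y']] = source.to_numpy()

print(aligned)
print(positional)
```

If the same mechanism is at play in the MultiIndex case above, the Series on the right-hand side would carry a different index from the rows being written, which would account for the NaNs.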
My current workaround is to unstack each column into a DataFrame with simple indexes. This works, but I have lots of columns to work with. One dataframe is much easier to work with than a pile of dataframes.
The computations for each month depend on the values computed for the previous month, which is why they can't be fully vectorized over an entire column.
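A minimal sketch of a month-by-month loop that stays in a single DataFrame, assuming the NaNs come from index alignment on the right-hand side. It selects rows with boolean masks on the `ts` level and assigns plain NumPy arrays so that values are written by position. The `+ 1` update rule is only a placeholder for the real per-month computation, and the sketch assumes every month lists the categories in the same order.

```python
import pandas as pd

timestamps = pd.to_datetime(['2014-01-01', '2014-02-01'])
categories = ['A', 'B', 'C', 'D']
index = pd.MultiIndex.from_product([timestamps, categories], names=['ts', 'cat'])
df = pd.DataFrame(index=index, columns=['Col1', 'Col2'], dtype=float)

# Level values for every row, used to build one boolean mask per month.
ts = df.index.get_level_values('ts')

# Seed the first month.
first_rows = ts == timestamps[0]
df.loc[first_rows, 'Col1'] = 5
df.loc[first_rows, 'Col2'] = [1, 2, 3, 4]

# Walk the months in order; each month is derived from the previous one.
for prev, cur in zip(timestamps[:-1], timestamps[1:]):
    prev_rows = ts == prev
    cur_rows = ts == cur
    # .to_numpy() drops the index, so the assignment is positional
    # and no label alignment takes place.
    df.loc[cur_rows, 'Col1'] = df.loc[prev_rows, 'Col1'].to_numpy() + 1  # placeholder rule
    df.loc[cur_rows, 'Col2'] = df.loc[prev_rows, 'Col2'].to_numpy()

print(df)
```

Because the masks span the full index, this avoids unstacking each column into its own DataFrame, at the cost of relying on row order within each month.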