Closed
Description
See http://stackoverflow.com/questions/17236852/pandas-crosstab-double-counting-when-using-two-aggregate-functions for discussion.
To reproduce:
Create a test dataframe:
df = DataFrame({'A': ['one', 'one', 'two', 'three'] * 6, 'B': ['A', 'B', 'C'] * 8, 'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4, 'D': np.random.randn(24), 'E': np.random.randn(24)})
Crosstab gives the expected results:
crosstab(rows=[df['A'],df['B']], cols=[df['C']], margins=True)
C bar foo All
A B
one A 2 2 4
B 2 2 4
C 2 2 4
three A 2 0 2
B 0 2 2
C 2 0 2
two A 0 2 2
B 2 0 2
C 0 2 2
All 12 12 24
However, if you try to get mean and count in the same crosstab, specifying values for the mean, the crosstab will double count the elements when calculating the margin total:
crosstab(rows=[df['A'],df['B']], cols=[df['C']], margins=True, aggfunc=[np.size, np.mean], values=df['D'])
size mean
C bar foo All bar foo All
A B
one A 2 2 4 0.245998 0.076366 0.161182
B 2 2 4 -0.739757 0.137780 -0.300988
C 2 2 4 -1.555759 -1.446554 -1.501157
three A 2 NaN 2 1.216109 NaN 1.216109
B NaN 2 2 NaN 0.255482 0.255482
C 2 NaN 2 0.732448 NaN 0.732448
two A NaN 2 2 NaN -0.273747 -0.273747
B 2 NaN 2 -0.001649 NaN -0.001649
C NaN 2 2 NaN 0.685422 0.685422
All 24 24 24 -0.017102 -0.094208 -0.055655