Closed
Description
import numpy as np
import pandas as pd
df = pd.DataFrame({i: pd.Series(np.random.normal(size=10), index=range(10))
                   for i in range(11)})
df_g = df.groupby(['a']*6+['b']*5, axis=1)
If I understood correctly, this should build a groupby object that groups the columns, making it possible to aggregate them later. And indeed:
df_g.sum()
works well. But
df_g.head()
throws an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jonas/Code/pandas/pandas/core/groupby.py", line 986, in head
in_head = self._cumcount_array() < n
File "/home/jonas/Code/pandas/pandas/core/groupby.py", line 1044, in _cumcount_array
cumcounts[indices] = values
IndexError: index 10 is out of bounds for axis 1 with size 10
and
df_g.apply(lambda x : x.sum())
from which I expected the same result as in the first example, instead gives this table:
a b
0 -0.381070 NaN
1 -1.214075 NaN
2 -1.496252 NaN
3 3.392565 NaN
4 -0.782376 NaN
5 1.306043 NaN
6 NaN -1.772334
7 NaN 4.125280
8 NaN 1.992329
9 NaN 4.283854
10 NaN -4.791092
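One plausible reading of this table (an assumption, not something confirmed against the groupby internals) is that each group passed to the lambda is a sub-DataFrame holding that group's columns, so `x.sum()` sums down the rows and returns one value per column label, whereas `df_g.sum()` returns one value per row. The sketch below reproduces that difference without `groupby`; `group_a` is a hypothetical stand-in for what the lambda would receive for group 'a'.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report: 10 rows, 11 columns labelled 0..10
df = pd.DataFrame(np.random.normal(size=(10, 11)), columns=range(11))

# Hypothetical stand-in for the sub-frame of group 'a' (the first 6 columns)
group_a = df.iloc[:, :6]

# Default sum() reduces over rows: one value per column label (0..5)
per_column = group_a.sum()

# sum(axis=1) reduces over columns: one value per row (0..9)
per_row = group_a.sum(axis=1)

print(per_column.index.tolist())
print(per_row.index.tolist())
```

If this reading is right, `lambda x: x.sum(axis=1)` would be the closer equivalent of the first example.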
I don't really understand what is happening, and I can't rule out a misunderstanding or a mistake on my part.
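For comparison, here is a minimal sketch of a possible workaround that avoids `axis=1` entirely: transpose the frame, group its rows, then transpose back. The names `labels`, `col_sums` and `col_head` are only illustrative, and this is not a fix for the error above, just a way to get at the result I was after.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.normal(size=(10, 11)), columns=range(11))

# One label per column of df: columns 0..5 -> 'a', columns 6..10 -> 'b'
labels = np.array(['a'] * 6 + ['b'] * 5)

# After transposing, the columns of df become rows, so a plain
# row-wise groupby does the column grouping
col_sums = df.T.groupby(labels).sum().T

# head(2) on the transposed frame keeps the first 2 columns of each group
col_head = df.T.groupby(labels).head(2).T

print(col_sums.shape)
print(col_head.columns.tolist())
```

With this layout, `col_sums` has one row per original row and one column per group, which is what I expected from `df_g.sum()`.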
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
pandas: 0.16.0-28-gcb8c130
nose: 1.3.4
Cython: 0.20.2
numpy: 1.9.2
scipy: 0.14.0
statsmodels: None
IPython: 3.0.0-dev
sphinx: 1.2.2
patsy: None
dateutil: 2.4.1
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.5.3 (dt dec mx pq3 ext)