
BUG: Regression from 1.2.5 to 1.3.x: groupby using sum on DataFrame containing lists fails #43108

Closed
@Dr-Irv

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Version 1.3.2:

>>> import pandas as pd
>>> df=pd.DataFrame([["M", [1]], ["M", [2]], ["W", [10]], ["W", [20]]], columns=["MW", "v"])
>>> df.groupby("MW").sum()
Empty DataFrame
Columns: []
Index: [M, W]
>>> df.groupby("MW").v.sum()
MW
M      [1, 2]
W    [10, 20]
Name: v, dtype: object

Version 1.2.5:

>>> import pandas as pd
>>> df=pd.DataFrame([["M", [1]], ["M", [2]], ["W", [10]], ["W", [20]]], columns=["MW", "v"])
>>> df.groupby("MW").sum()
           v
MW
M     [1, 2]
W   [10, 20]
>>> df.groupby("MW").v.sum()
MW
M      [1, 2]
W    [10, 20]
Name: v, dtype: object

Problem description

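With 1.2.5 and earlier, when one of the columns contains Python lists, `df.groupby("MW").sum()` keeps that column and concatenates the lists within each group. With 1.3.2, the list column is silently dropped and the result is an empty DataFrame, even though summing the single column (`df.groupby("MW").v.sum()`) still concatenates the lists.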
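Because the single-column path still concatenates the lists in 1.3.2, one possible workaround is to aggregate the list column explicitly and convert the result back to a DataFrame. This is a minimal sketch that assumes only the public `SeriesGroupBy.agg` and `Series.to_frame` APIs. `concat_lists` is a hypothetical helper, and flattening with `itertools.chain` is used in place of `sum()` so the result does not depend on how a given pandas version sums object columns.

```python
import itertools

import pandas as pd

df = pd.DataFrame(
    [["M", [1]], ["M", [2]], ["W", [10]], ["W", [20]]], columns=["MW", "v"]
)


def concat_lists(s: pd.Series) -> list:
    # Hypothetical helper: flatten the per-row lists of one group into one list
    return list(itertools.chain.from_iterable(s))


# Aggregate the list column explicitly, then turn the resulting Series back
# into a one-column DataFrame so the shape matches the 1.2.5 output.
result = df.groupby("MW")["v"].agg(concat_lists).to_frame()
print(result)
```

Flattening with `itertools.chain` also avoids the quadratic copying that repeated list `+` incurs on large groups.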

Expected Output

Same as 1.2.5: the v column is kept, with the lists in each group concatenated.
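The expected output can also be pinned down as a regression check. This is a sketch that assumes `pandas.testing.assert_frame_equal`. `make_expected` and `check_groupby_sum_keeps_list_column` are hypothetical names. On 1.3.2 the check should raise an `AssertionError`, because the result there has no columns.

```python
import pandas as pd
import pandas.testing as tm


def make_expected() -> pd.DataFrame:
    # The 1.2.5 result: one row per group, with the lists concatenated
    return pd.DataFrame(
        {"v": [[1, 2], [10, 20]]},
        index=pd.Index(["M", "W"], name="MW"),
    )


def check_groupby_sum_keeps_list_column() -> None:
    # Hypothetical regression check for the behaviour reported in this issue
    df = pd.DataFrame(
        [["M", [1]], ["M", [2]], ["W", [10]], ["W", [20]]], columns=["MW", "v"]
    )
    # Raises AssertionError if the list column was dropped (as on 1.3.2)
    tm.assert_frame_equal(df.groupby("MW").sum(), make_expected())
```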

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3af1a4f
python : 3.8.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.4.0.dev0+475.g3af1a4fa27
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 49.6.0.post20210108
Cython : 0.29.23
pytest : 6.2.4
hypothesis : 6.14.1
sphinx : 3.5.4
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.25.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.05.0
fastparquet : 0.6.3
gcsfs : 2021.05.0
matplotlib : 3.3.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : 0.4.2
scipy : 1.7.0
sqlalchemy : 1.4.20
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.52.0

Metadata


    Labels: Bug, Duplicate Report, Groupby, Nested Data, Regression
