BUG: DataFrame.agg with multiple cum functions creates wrong result #35490

Closed
@qinxuye

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [1]: import pandas as pd                                                     

In [2]: df1 = pd.DataFrame({ 
   ...:     'a': [3, 4, 5, 3, 5, 4, 1, 2, 3], 
   ...:     'b': [1, 3, 4, 5, 6, 5, 4, 4, 4], 
   ...:     'c': list('aabaaddce'), 
   ...:     'd': [3, 4, 5, 3, 5, 4, 1, 2, 3], 
   ...:     'e': [1, 3, 4, 5, 6, 5, 4, 4, 4], 
   ...:     'f': list('aabaaddce'), 
   ...: })                                                                      

In [3]: df1.groupby('b').agg(['cummax', 'cumsum'])                              
Out[3]: 
       a             d             e       
  cummax cumsum cummax cumsum cummax cumsum
b                                          
1      4      4      4      4      3      3
3      3      3      3      3      5      5
4      5      5      5      5      6      6
5      4      7      4      7      5     10
6      5      6      5      6      4      8

Cumulative functions should return a DataFrame with the same length as the input, one row per original row. The output above has only 5 rows for the 9-row input.

Problem description

In pandas 1.0.5, the same code returns one row per input row, indexed like the original DataFrame:

In [1]: import pandas as pd                                                     

In [2]: df1 = pd.DataFrame({ 
   ...:     'a': [3, 4, 5, 3, 5, 4, 1, 2, 3], 
   ...:     'b': [1, 3, 4, 5, 6, 5, 4, 4, 4], 
   ...:     'c': list('aabaaddce'), 
   ...:     'd': [3, 4, 5, 3, 5, 4, 1, 2, 3], 
   ...:     'e': [1, 3, 4, 5, 6, 5, 4, 4, 4], 
   ...:     'f': list('aabaaddce'), 
   ...: })                                                                      

In [3]: df1.groupby('b').agg(['cummax', 'cumsum'])                              
Out[3]: 
       a             d             e       
  cummax cumsum cummax cumsum cummax cumsum
0      3      3      3      3      1      1
1      4      4      4      4      3      3
2      5      5      5      5      4      4
3      3      3      3      3      5      5
4      5      5      5      5      6      6
5      4      7      4      7      5     10
6      5      6      5      6      4      8
7      5      8      5      8      4     12
8      5     11      5     11      4     16
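For anyone hitting this in the meantime, a possible workaround (a sketch only, not a confirmed fix) is to call the cumulative methods directly on the groupby object and combine the results with pd.concat. Restricting to the numeric columns 'a', 'd', 'e' and the swaplevel/sort_index step are my own assumptions, chosen to mimic the 1.0.5 column layout.

```python
import pandas as pd

df1 = pd.DataFrame({
    'a': [3, 4, 5, 3, 5, 4, 1, 2, 3],
    'b': [1, 3, 4, 5, 6, 5, 4, 4, 4],
    'c': list('aabaaddce'),
    'd': [3, 4, 5, 3, 5, 4, 1, 2, 3],
    'e': [1, 3, 4, 5, 6, 5, 4, 4, 4],
    'f': list('aabaaddce'),
})

# Assumption: only the numeric columns are wanted, as in the outputs above.
grouped = df1.groupby('b')[['a', 'd', 'e']]

# Each cumulative method returns a frame aligned with df1's original index,
# so the result keeps all 9 rows.
result = pd.concat(
    {'cummax': grouped.cummax(), 'cumsum': grouped.cumsum()},
    axis=1,
)

# concat puts the function name on the outer column level; swap and sort
# so the columns read (column, function).
result = result.swaplevel(axis=1).sort_index(axis=1)

print(result)
```

This avoids agg entirely, so it should not depend on how agg handles functions that are not reductions.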

Expected Output

The same 9-row result that pandas 1.0.5 produces (shown above).

Output of pd.show_versions()

In [5]: pd.show_versions()
/Users/qinxuye/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
"Distutils was imported before Setuptools. This usage is discouraged "

ImportError                               Traceback (most recent call last)
in
----> 1 pd.show_versions()

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in show_versions(as_json)
    104     """
    105     sys_info = _get_sys_info()
--> 106     deps = _get_dependency_info()
    107
    108     if as_json:

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in _get_dependency_info()
     82     for modname in deps:
     83         mod = import_optional_dependency(
---> 84             modname, raise_on_missing=False, on_version="ignore"
     85         )
     86         result[modname] = _get_version(mod) if mod else None

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
     97         minimum_version = VERSIONS.get(name)
     98         if minimum_version:
---> 99             version = _get_version(module)
    100             if distutils.version.LooseVersion(version) < minimum_version:
    101                 assert on_version in {"warn", "raise", "ignore"}

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in _get_version(module)
     42
     43     if version is None:
---> 44         raise ImportError(f"Can't determine version for {module.__name__}")
     45     return version
     46

ImportError: Can't determine version for numba

Labels: Apply (Apply, Aggregate, Transform, Map), Blocker (Blocking issue or pull request for an upcoming release), Bug, Groupby, Regression (Functionality that used to work in a prior pandas version)