
BUG: df.agg call passes different things to a custom function depending on whether an unused kwarg is supplied or not #39169

Open
@pjireland


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import scipy.stats


def circ_mean(data, dummy_kwarg=0):
    # print(data)
    return 180/np.pi*scipy.stats.circmean(data*np.pi/180)


def numpy_mean(data, dummy_kwarg=0):
    return np.mean(data)


@pd.api.extensions.register_dataframe_accessor("my")
class CircstatsAccessor(object):
    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        
    def circ_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(circ_mean, **kwargs)
    
    def numpy_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(numpy_mean, **kwargs)


df = pd.DataFrame(
    data={
        "col1": [10, 11, 12, 13],
        "col2": [20, 21, 22, 23],
    },
    index=[1, 2, 3, 4]
)

# Compute results with the standard `df.mean` call
# I'd like my custom mean function to do a similar thing
df.mean(level=0, axis=0)

# If I don't pass in any kwargs, `df.my.circ_mean` behaves as expected
# Results approximately match those from `df.mean`
df.my.circ_mean(level=0, axis=0)

# If I pass in a kwarg that is never used, `df.my.circ_mean`
# returns unusual results: the returned values in `col1` are
# identical to those in `col2`, whereas they were
# different before
df.my.circ_mean(level=0, axis=0, dummy_kwarg=0)

# If I call `df.my.numpy_mean`, results are identical
# with or without providing the kwarg
df.my.numpy_mean(level=0, axis=0)
df.my.numpy_mean(level=0, axis=0, dummy_kwarg=0)

Problem description

As described in the code comments above, my circ_mean function behaves differently depending on whether a dummy (unused) keyword argument is supplied. Uncommenting the print call in circ_mean shows that df.agg passes different objects to the function depending on whether or not this keyword is provided.
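A minimal sketch of a way to check this without reading printed output: record the type of object the aggregation function receives on each call. The `make_probe` helper and the two log lists are hypothetical names introduced only for this illustration, and the `axis` argument is left out because `level=0` alone is enough to form the groups. What gets recorded may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col1": [10, 11, 12, 13],
        "col2": [20, 21, 22, 23],
    },
    index=[1, 2, 3, 4],
)


def make_probe(log):
    """Build an aggregation function that logs the type of each input.

    Hypothetical helper for illustration; not part of pandas.
    """
    def probe(data, dummy_kwarg=0):
        # Record whether pandas handed us a Series or a DataFrame
        log.append(type(data).__name__)
        # .mean() is defined for both Series and DataFrame inputs
        return data.mean()
    return probe


without_kwarg = []
with_kwarg = []

# Same aggregation twice; the only difference is the unused kwarg
df.groupby(level=0).agg(make_probe(without_kwarg))
df.groupby(level=0).agg(make_probe(with_kwarg), dummy_kwarg=0)

print("without kwarg:", sorted(set(without_kwarg)))
print("with kwarg:   ", sorted(set(with_kwarg)))
```

If the two printed lists differ, the function is being called on different kinds of objects in the two cases, which would explain why a reduction that flattens its input gives the same value for every column.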

I would expect no difference in behavior, since this keyword has no effect. Interestingly, the difference disappears if I replace the more complicated circular mean call with a simple np.mean call inside my custom function (compare the circ_mean and numpy_mean functions).
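A possible workaround, sketched under the assumption that the difference is triggered by forwarding keyword arguments through `agg`: bind the keyword inside a lambda so that `agg` itself receives no extra arguments. The `circ_mean_deg` function below is a hypothetical NumPy-only stand-in for the `scipy.stats.circmean` call above, used so the example has no SciPy dependency; like `circmean` with its default axis, it reduces over every value it is given.

```python
import numpy as np
import pandas as pd


def circ_mean_deg(data, dummy_kwarg=0):
    """Circular mean in degrees over all values in `data`.

    Hypothetical stand-in for the scipy.stats.circmean call in the report.
    """
    radians = np.deg2rad(np.asarray(data, dtype=float))
    # Mean direction from the averaged sine and cosine components
    angle = np.arctan2(np.mean(np.sin(radians)), np.mean(np.cos(radians)))
    # Map the result into [0, 360)
    return float(np.rad2deg(angle) % 360.0)


df = pd.DataFrame(
    data={
        "col1": [10, 11, 12, 13],
        "col2": [20, 21, 22, 23],
    },
    index=[1, 2, 3, 4],
)

# The kwarg is bound inside the lambda, so agg is called without kwargs
result = df.groupby(level=0).agg(
    lambda column: circ_mean_deg(column, dummy_kwarg=0)
)
print(result)
```

Each group here holds a single row, so the circular mean of each column should come back approximately equal to the original values, with `col1` and `col2` remaining distinct.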

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 1.2.0
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

Metadata


Labels: Apply (Apply, Aggregate, Transform, Map), Bug, Groupby
