Description
- I have checked that this issue has not already been reported.
  - The issue could potentially be similar to the one reported in "BUG: Fails and or weird aggregation results when using agg with custom functions" (#33517), but I'm not sure.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
import scipy.stats

def circ_mean(data, dummy_kwarg=0):
    # print(data)
    return 180/np.pi*scipy.stats.circmean(data*np.pi/180)

def numpy_mean(data, dummy_kwarg=0):
    return np.mean(data)

@pd.api.extensions.register_dataframe_accessor("my")
class CircstatsAccessor(object):
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def circ_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(circ_mean, **kwargs)

    def numpy_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(numpy_mean, **kwargs)

df = pd.DataFrame(
    data={
        "col1": [10, 11, 12, 13],
        "col2": [20, 21, 22, 23],
    },
    index=[1, 2, 3, 4]
)
# Compute results with the standard `df.mean` call
# I'd like my custom mean function to do a similar thing
df.mean(level=0, axis=0)
# If I don't pass in any kwargs, `df.my.circ_mean` behaves as expected
# Results approximately match those from `df.mean`
df.my.circ_mean(level=0, axis=0)
# If I pass in a kwarg that is not ever used, `df.my.circ_mean`
# returns unusual results - the returned values in `col1` are
# identical to those in `col2`, whereas they were
# different before
df.my.circ_mean(level=0, axis=0, dummy_kwarg=0)
# If I call `df.my.numpy_mean`, results are identical
# with or without providing the kwarg
df.my.numpy_mean(level=0, axis=0)
df.my.numpy_mean(level=0, axis=0, dummy_kwarg=0)
Problem description
As discussed in the code comments above, my `circ_mean` function behaves differently depending on whether a dummy (unused) keyword argument is specified. Uncommenting the print call in the `circ_mean` function indicates that `df.agg` is passing in different things depending on whether or not this keyword is provided.
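
The check described above can also be done without reading printed output. The following is only a minimal sketch and not part of my original code: `make_recorder`, `record_input`, and the `seen_*` lists are hypothetical names, and the function reduces its input to a single float purely so that it stays valid whatever object it receives. Which type names end up recorded may depend on the pandas version.

```python
import numpy as np
import pandas as pd


def make_recorder(seen):
    """Return an aggregation function that logs the type of each input it gets."""
    def record_input(data, dummy_kwarg=0):
        # Remember what kind of object agg handed to us (e.g. Series or DataFrame).
        seen.append(type(data).__name__)
        # Reduce to one scalar so the function works for 1-D and 2-D inputs alike.
        return float(np.mean(np.asarray(data, dtype=float)))
    return record_input


df = pd.DataFrame(
    data={"col1": [10, 11, 12, 13], "col2": [20, 21, 22, 23]},
    index=[1, 2, 3, 4],
)

# Same groupby call, once without and once with the unused keyword argument.
seen_without_kwarg = []
df.groupby(level=0).agg(make_recorder(seen_without_kwarg))

seen_with_kwarg = []
df.groupby(level=0).agg(make_recorder(seen_with_kwarg), dummy_kwarg=0)

print("without kwarg:", sorted(set(seen_without_kwarg)))
print("with kwarg:   ", sorted(set(seen_with_kwarg)))
```

If the two printed lists differ, the custom function is being called on different kinds of objects in the two cases.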
I would expect no difference in behavior, since this keyword has no effect. Interestingly, the difference disappears if I replace the more complicated circular mean call with a simple `np.mean` call inside my custom function (compare the `circ_mean` and `numpy_mean` functions).
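
To illustrate why the shape of the input could matter for a reduction like the circular mean, here is a NumPy-only sketch. `circ_mean_deg` is a hypothetical helper, not the SciPy implementation; it mirrors the fact that `scipy.stats.circmean` defaults to `axis=None`, which reduces over all elements of its input. I have not confirmed that this is what happens inside `agg`; the sketch only shows that a two-column input collapses to a single value unless an axis is given.

```python
import numpy as np


def circ_mean_deg(angles_deg, axis=None):
    """Circular mean of angles given in degrees, returned in [0, 360).

    With axis=None (the default here, as in scipy.stats.circmean) the input
    is reduced over all of its elements at once.
    """
    radians = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then take the direction of the mean vector.
    mean_sin = np.sin(radians).mean(axis=axis)
    mean_cos = np.cos(radians).mean(axis=axis)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0


# Same values as col1 and col2 in the example DataFrame.
values = np.array([[10, 20], [11, 21], [12, 22], [13, 23]])

per_column = circ_mean_deg(values, axis=0)  # one result per column
flattened = circ_mean_deg(values)           # one result for the whole array

# Angles on both sides of 0 degrees: the arithmetic mean would be 185.
wrapped = circ_mean_deg([350, 20])

print(per_column, flattened, wrapped)
```

A function that receives a whole two-column block and reduces it with `axis=None` returns one number for both columns, which would look like the identical `col1` and `col2` values described above.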
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
pandas : 1.2.0
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2