Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

# define Series and DataFrame subclasses which override mean()
class UnitSeries(pd.Series):
    @property
    def _constructor(self):
        return UnitSeries

    @property
    def _constructor_expanddim(self):
        return UnitDataFrame

    def mean(self, *args, **kwargs):
        return 1


class UnitDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return UnitDataFrame

    @property
    def _constructor_sliced(self):
        # a DataFrame slices down to a Series, so the hook is
        # _constructor_sliced (DataFrame has no meaningful expanddim)
        return UnitSeries

    def mean(self, *args, **kwargs):
        return 1


# create example data
params = ['a', 'b']
data = np.random.rand(4, 2)
udf = UnitDataFrame(data, columns=params)
udf['group'] = np.ones(4, dtype=int)
udf.loc[2:, 'group'] = 2

# calculate mean with and without groupby
print(udf.mean())  # prints 1 :)
print(udf.groupby('group').mean())  # not a 1 to be seen :(
print(udf.groupby('group').get_group(1).mean())  # prints 1 :)
Issue Description
Suppose I create subclasses of Series and DataFrame that override a method such as mean(). groupby().mean() doesn't use this method and falls back to the default pandas behaviour. However, .groupby().get_group(name).mean(), which works on a single group, does use the override correctly.
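A minimal sketch of why the per-group path behaves differently, assuming a recent pandas release: each group that groupby yields (through iteration or get_group) is rebuilt via the subclass's _constructor, so it is a UnitDataFrame and calling .mean() on it runs the override. groupby().mean() itself appears to go through pandas' own grouped aggregation code instead of calling DataFrame.mean on each group. The name per_group and the drop(columns="group") step are illustrative choices, not part of the report.

```python
import numpy as np
import pandas as pd


# Minimal subclass for illustration; only the DataFrame override matters here.
class UnitDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return UnitDataFrame

    def mean(self, *args, **kwargs):
        return 1  # constant placeholder override, as in the report


udf = UnitDataFrame(np.random.rand(4, 2), columns=["a", "b"])
udf["group"] = [1, 1, 2, 2]

# Built-in grouped mean: per the report, ordinary numeric means, no override.
print(udf.groupby("group").mean())

# Iterating the groups yields UnitDataFrame objects (hypothetical name
# per_group), so each .mean() call here dispatches to the override.
per_group = {
    key: sub.drop(columns="group").mean()
    for key, sub in udf.groupby("group")
}
print(per_group)
```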
Expected Behavior
SubClassedDataFrame.groupby().mean() should use the mean method of the subclass, so in my example
print(udf.groupby('group').mean())
should print
       a  b
group
1      1  1
2      1  1
instead of the result of the usual mean (the exact values vary between runs because the data is random):
              a         b
group
1      0.688265  0.324780
2      0.178812  0.663476
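Until groupby honours the override, one possible workaround (a sketch, not a fix, and slower because it runs Python code for every column of every group) is to pass a plain callable to agg. As far as I can tell, pandas does not map an arbitrary lambda onto its built-in mean kernel, so it calls the lambda on each column slice of each group; with _constructor_sliced returning UnitSeries, those slices are UnitSeries and their mean() is the override. The name by_override is hypothetical, and passing the string 'mean' instead would route back to the built-in path.

```python
import numpy as np
import pandas as pd


class UnitSeries(pd.Series):
    @property
    def _constructor(self):
        return UnitSeries

    @property
    def _constructor_expanddim(self):
        return UnitDataFrame

    def mean(self, *args, **kwargs):
        return 1  # constant placeholder override, as in the report


class UnitDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return UnitDataFrame

    @property
    def _constructor_sliced(self):
        # column slices of a UnitDataFrame become UnitSeries
        return UnitSeries

    def mean(self, *args, **kwargs):
        return 1


udf = UnitDataFrame(np.random.rand(4, 2), columns=["a", "b"])
udf["group"] = [1, 1, 2, 2]

# A lambda is applied group by group to UnitSeries slices, so s.mean()
# is the subclass method rather than pandas' grouped mean.
by_override = udf.groupby("group").agg(lambda s: s.mean())
print(by_override)
```

With the constant override from the report, this gives a frame of 1s indexed by group with columns a and b, matching the expected output shape.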
Installed Versions
INSTALLED VERSIONS
commit : 7f53afc
python : 3.11.2.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_GB.UTF-8
LANG : None
LOCALE : en_GB.UTF-8
pandas : 0+untagged.31727.g7f53afc
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.0.1
Cython : None
pytest : 7.2.1
hypothesis : 6.68.2
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None