Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

pd.options.mode.dtype_backend = 'pyarrow'

df = pd.DataFrame({
    'tags': pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5], dtype='int64[pyarrow]'),
    'value': pd.Series(np.random.rand(15), dtype='double[pyarrow]'),
})

# transform: lambda vs. string alias
result1 = df.groupby('tags')['value'].transform(lambda x: x.sum())
print(result1.dtype)
result2 = df.groupby('tags')['value'].transform('sum')
print(result2.dtype)

# apply: lambda vs. string alias
result3 = df.groupby('tags')['value'].apply(lambda x: x.sum())
print(result3.dtype)
result4 = df.groupby('tags')['value'].apply('sum')
print(result4.dtype)
Issue Description
I am currently testing the release candidate against my work project. I noticed that when a lambda function is passed to groupby together with apply or transform, the dtype of the result changes from double[pyarrow] back to float64.
Expected Behavior
I'd expect the result to have the same dtype (double[pyarrow]) consistently, whether the aggregation is passed as a lambda or as a string.
Installed Versions
2.0.0rc0