Closed
Description
Code Sample, a copy-pastable example
The problem is best shown in the following example:
import pandas as pd
s = pd.Series(['1 day', '3 days', pd.NaT], dtype='timedelta64[ns]')
print(f'{s.mean()=}\n')
# as expected: result is '2 days'
print(f"{s.apply('mean')=}\n")
# as expected: result is '2 days'
print("transform('mean')")
print(s.groupby([1,1,1]).transform('mean'))
# whoops: (a series of 3x) -35583 days +08:04:14.381741568
# expectation was: (a series of 3x) '2 days'
# Fixing the problem
print("\ntransform(pd.Series.mean)")
print(s.groupby([1,1,1]).transform(pd.Series.mean))
# (a series of 3x) '2 days', sanity restored :-)
Problem description
When a function name is passed to transform as a string, pd.NaT is not handled correctly. This happens not only for 'mean' but also for other functions such as 'sum'.
I don't know how transform looks up the function when a string is passed (it's not really documented), but it certainly doesn't select pd.Series.mean; it seems to pick some other mean function that is not NaT-aware.
This is surprising, since NaT is handled correctly when calling apply('mean').
Tested with pandas 1.3.2.
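To check whether a given pandas version is affected, here is a minimal sketch that compares the string path against a callable for both 'mean' and 'sum'. The names keys and report, and the string_ok / callable_ok flags, are just illustrative and not part of pandas. I'd expect string_ok to be False on 1.3.2 based on the output above; other versions may behave differently.

```python
import pandas as pd

s = pd.Series(['1 day', '3 days', pd.NaT], dtype='timedelta64[ns]')
keys = [1, 1, 1]  # a single group, as in the example above

report = {}
for name in ('mean', 'sum'):
    # Reference value: the plain Series reduction, which skips NaT.
    expected = getattr(s, name)()

    # String path: pandas resolves the name internally.
    via_string = s.groupby(keys).transform(name)

    # Callable path: each group is handed to the lambda as a Series,
    # so the ordinary Series method does the reduction.
    via_callable = s.groupby(keys).transform(
        lambda g, name=name: getattr(g, name)()
    )

    report[name] = {
        'expected': expected,
        'string_ok': bool((via_string == expected).all()),
        'callable_ok': bool((via_callable == expected).all()),
    }

for name, row in report.items():
    print(name, row)
```

If string_ok comes out False while callable_ok is True, passing the callable (or a lambda like the one above) is a workaround until the string path is fixed.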