Closed
Description
There are code paths for arithmetic methods (in particular DataFrame._combine_frame, _combine_match_index) that operate with self.values, other.values, and as a result behave differently from their Series/Index analogues. Example:
>>> import pandas as pd
>>> df = pd.DataFrame([pd.Timedelta(seconds=1), pd.Timedelta(seconds=2)])
>>> df2 = pd.DataFrame([1*10**9, 2*10**9])
>>> df + df2  # <-- should raise TypeError
         0
0 00:00:02
1 00:00:04
>>> df[0] + df2[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1274, in wrapper
    result = dispatch_to_index_op(op, left, right, pd.TimedeltaIndex)
  File "pandas/core/ops.py", line 1331, in dispatch_to_index_op
    'operation [{name}]'.format(name=op.__name__))
TypeError: incompatible type for a datetime/timedelta operation [add]
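For anyone who wants to see the divergence without a pandas checkout, here is a toy model using only the standard library. It is not how pandas is implemented: frames are plain dicts of lists, and raw_values, add_via_raw_values and add_columnwise are hypothetical names made up for this sketch. The first path strips the type information the way operating on raw values does; the second keeps per-element typing, so the mismatch surfaces as a TypeError.

```python
from datetime import timedelta

US = timedelta(microseconds=1)


def raw_values(column):
    """Toy analogue of `.values`: timedeltas become integer nanoseconds."""
    return [v // US * 1000 if isinstance(v, timedelta) else v for v in column]


def add_via_raw_values(left, right):
    """Add on the untyped integers, then re-wrap using the left column's type."""
    out = {}
    for name, column in left.items():
        summed = [a + b for a, b in zip(raw_values(column), raw_values(right[name]))]
        if all(isinstance(v, timedelta) for v in column):
            # Re-wrapping hides the fact that the right operand was plain ints.
            summed = [s // 1000 * US for s in summed]
        out[name] = summed
    return out


def add_columnwise(left, right):
    """Add each column pair with the elements' own typed `+`."""
    return {
        name: [a + b for a, b in zip(column, right[name])]
        for name, column in left.items()
    }


df = {0: [timedelta(seconds=1), timedelta(seconds=2)]}
df2 = {0: [1 * 10**9, 2 * 10**9]}

silently_added = add_via_raw_values(df, df2)  # no error, like df + df2 above

try:
    add_columnwise(df, df2)
    columnwise_raised = False
except TypeError:
    columnwise_raised = True  # like df[0] + df2[0] above
```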
AFAICT there are two options for how to fix these:
- Have all DataFrame arith/comparison ops operate column-wise. Downside is performance hit on currently-OK operations; AFAIK this is part of the motivation behind using Blocks in the first place.
- Make .values point at EAs, implement the arith/comparison methods on EAs.
  - This would require implementing a bunch of not-currently-existing EAs for pretty much all the standard np dtypes (Make EAs for all Block subclasses #22388)
  - This would require supporting 2D EAs.
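
To make the second option a bit more concrete, here is a rough stdlib-only sketch of the idea that the array type itself owns the arithmetic and rejects incompatible operands. TypedArraySketch is a hypothetical class invented for illustration, not a pandas ExtensionArray, and it is 1D only, so it says nothing about the 2D question.

```python
from datetime import timedelta


class TypedArraySketch:
    """Hypothetical 1D typed array that implements its own `+`."""

    def __init__(self, data, kind):
        self.data = list(data)
        self.kind = kind
        if not all(isinstance(v, kind) for v in self.data):
            raise TypeError("all elements must be instances of {}".format(kind.__name__))

    def __add__(self, other):
        # Refuse anything that is not the same kind of typed array;
        # returning NotImplemented makes Python raise TypeError for us.
        if not isinstance(other, TypedArraySketch) or other.kind is not self.kind:
            return NotImplemented
        if len(other.data) != len(self.data):
            raise ValueError("lengths must match")
        return TypedArraySketch([a + b for a, b in zip(self.data, other.data)], self.kind)


td = TypedArraySketch([timedelta(seconds=1), timedelta(seconds=2)], timedelta)
ints = TypedArraySketch([1 * 10**9, 2 * 10**9], int)

same_kind_sum = td + td  # allowed: both operands are timedelta arrays

try:
    td + ints
    mixed_raised = False
except TypeError:
    mixed_raised = True  # the array-level op rejects timedelta + int
```

If frame-level ops delegated to objects like this rather than to raw ndarrays, the frame and column results would agree by construction, since both would go through the same `__add__`.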