
BUG: DataFrame+DataFrame ops #22614

Closed
@jbrockmendel

Description


There are code paths for arithmetic methods (in particular DataFrame._combine_frame and _combine_match_index) that operate directly on self.values and other.values, and as a result behave differently from their Series/Index analogues. Example:

>>> import pandas as pd
>>> df = pd.DataFrame([pd.Timedelta(seconds=1), pd.Timedelta(seconds=2)])
>>> df2 = pd.DataFrame([1*10**9, 2*10**9])

>>> df + df2  # <-- should raise TypeError
         0
0 00:00:02
1 00:00:04

>>> df[0] + df2[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/ops.py", line 1274, in wrapper
    result = dispatch_to_index_op(op, left, right, pd.TimedeltaIndex)
  File "pandas/core/ops.py", line 1331, in dispatch_to_index_op
    'operation [{name}]'.format(name=op.__name__))
TypeError: incompatible type for a datetime/timedelta operation [add]
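The DataFrame result matches what numpy itself produces when the raw arrays are added: numpy's `add` accepts a `timedelta64` array plus an integer array and treats the integers as counts of the timedelta's unit, whereas the Series path dispatches to `TimedeltaIndex`, which rejects integers. A minimal sketch using plain numpy arrays as stand-ins for `df.values` and `df2.values` (an illustrative assumption about what the blocks hold, not a trace of pandas internals):

```python
import numpy as np

# Stand-ins for df.values and df2.values from the example above (illustrative only)
left = np.array([1, 2], dtype="timedelta64[s]").astype("timedelta64[ns]")  # 1s, 2s
right = np.array([1 * 10**9, 2 * 10**9], dtype=np.int64)

# numpy casts the int64 operand to timedelta64[ns] (i.e. a count of nanoseconds),
# so the addition succeeds instead of raising
result = left + right
print(result.dtype)  # timedelta64[ns]
print(result.astype("timedelta64[s]"))  # the elementwise sums, in seconds
```

This is why an op that reaches numpy through `.values` silently succeeds while the same op on the individual columns raises.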

AFAICT there are two options for how to fix these:

  1. Have all DataFrame arith/comparison ops operate column-wise. The downside is a performance hit on operations that currently work correctly; AFAIK avoiding that cost is part of the motivation for using Blocks in the first place.

  2. Make .values point at ExtensionArrays (EAs), and implement the arith/comparison methods on the EAs.
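Option 1 can be sketched with a hypothetical helper, `columnwise_op` (not a pandas function), that pairs up columns and applies the operator to each pair of Series, so every column goes through the same dtype checks as `df[0] + df2[0]`. It assumes both frames share identical, unique column labels and the same index; it illustrates the approach rather than how pandas would implement it, and its Python-level loop over columns is the kind of performance cost mentioned in option 1.

```python
import operator

import pandas as pd


def columnwise_op(left: pd.DataFrame, right: pd.DataFrame, op) -> pd.DataFrame:
    """Hypothetical helper: apply a binary op one column at a time via Series arithmetic.

    Assumes `left` and `right` have the same unique column labels and the same index.
    """
    if not left.columns.equals(right.columns) or not left.columns.is_unique:
        raise ValueError("columnwise_op assumes identical, unique column labels")
    # Each op(left[col], right[col]) is a Series op, so dtype-specific rules
    # (e.g. timedelta + integer raising TypeError) are applied per column.
    data = {col: op(left[col], right[col]) for col in left.columns}
    return pd.DataFrame(data, index=left.index, columns=left.columns)


df = pd.DataFrame([pd.Timedelta(seconds=1), pd.Timedelta(seconds=2)])
df2 = pd.DataFrame([1 * 10**9, 2 * 10**9])

try:
    columnwise_op(df, df2, operator.add)
except TypeError as err:
    print("TypeError:", err)

# Compatible dtypes still combine normally.
print(columnwise_op(df2, df2, operator.add))
```

With this dispatch, the timedelta + integer case raises a TypeError, consistent with the Series traceback above, instead of returning timedeltas.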

Metadata



    Labels: Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, DataFrame (DataFrame data structure)
