-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: operate on arrays instead of Series in DataFrame/DataFrame ops #33561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
1102f0d
cd0218c
44ec4f4
9ff61c4
c4b7823
b8de50e
103db6c
d16ced8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -310,10 +310,6 @@ def dispatch_to_series(left, right, func, str_rep=None, axis=None): | |
------- | ||
DataFrame | ||
""" | ||
# Note: we use iloc to access columns for compat with cases | ||
# with non-unique columns. | ||
import pandas.core.computation.expressions as expressions | ||
|
||
right = lib.item_from_zerodim(right) | ||
if lib.is_scalar(right) or np.ndim(right) == 0: | ||
|
||
|
@@ -322,11 +318,14 @@ def dispatch_to_series(left, right, func, str_rep=None, axis=None): | |
bm = left._mgr.apply(array_op, right=right) | ||
return type(left)(bm) | ||
|
||
elif isinstance(right, ABCDataFrame): | ||
array_op = get_array_op(func, str_rep=str_rep) | ||
|
||
if isinstance(right, ABCDataFrame): | ||
assert right._indexed_same(left) | ||
|
||
def column_op(a, b): | ||
return {i: func(a.iloc[:, i], b.iloc[:, i]) for i in range(len(a.columns))} | ||
arrays = [] | ||
jorisvandenbossche marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for l, r in zip(left._iter_column_arrays(), right._iter_column_arrays()): | ||
arrays.append(array_op(l, r)) | ||
|
||
elif isinstance(right, ABCSeries) and axis == "columns": | ||
# We only get here if called via _combine_series_frame, | ||
|
@@ -338,27 +337,28 @@ def column_op(a, b): | |
# Note: we do not do this unconditionally as it may be lossy or | ||
# expensive for EA dtypes. | ||
right = np.asarray(right) | ||
|
||
def column_op(a, b): | ||
return {i: func(a.iloc[:, i], b[i]) for i in range(len(a.columns))} | ||
|
||
else: | ||
right = right._values | ||
|
||
def column_op(a, b): | ||
return {i: func(a.iloc[:, i], b.iloc[i]) for i in range(len(a.columns))} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. doesnt need to be in this PR, but since you're attention is already focused on this line: this can be incorrect if The main idea I've had here is to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when using something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I dont think there's an issue for this. If I had found one, i would have added the Numeric label |
||
arrays = [] | ||
jorisvandenbossche marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for l, r in zip(left._iter_column_arrays(), right): | ||
arrays.append(array_op(l, r)) | ||
|
||
elif isinstance(right, ABCSeries): | ||
assert right.index.equals(left.index) # Handle other cases later | ||
right = right._values | ||
|
||
def column_op(a, b): | ||
return {i: func(a.iloc[:, i], b) for i in range(len(a.columns))} | ||
arrays = [] | ||
jorisvandenbossche marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for l in left._iter_column_arrays(): | ||
arrays.append(array_op(l, right)) | ||
|
||
else: | ||
# Remaining cases have less-obvious dispatch rules | ||
raise NotImplementedError(right) | ||
|
||
new_data = expressions.evaluate(column_op, str_rep, left, right) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does this have a perf impact? IIRC a while back I tried removing it because I thought it shouldn't, but found that it did There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As far as I understand, the For example, the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That was what i thought too, and I could be remembering incorrectly. But this is worth double-checking. |
||
return new_data | ||
return type(left)._from_arrays( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think you can just return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want to explicitly use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds good |
||
arrays, left.columns, left.index, verify_integrity=False | ||
) | ||
|
||
|
||
# ----------------------------------------------------------------------------- | ||
|
Uh oh!
There was an error while loading. Please reload this page.