-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Raise TypeError when joining with non-DataFrame using 'on=' (GH#61434) #61454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
78fbc2c
to
b604714
Compare
All checks passed except the Pyodide build, which failed due to a rate-limit (HTTP 429). The failure seems unrelated to this PR. A rerun should resolve it |
pandas/core/frame.py
Outdated
@@ -10885,6 +10885,15 @@ def join( | |||
raise ValueError("Other Series must have a name") | |||
other = DataFrame({other.name: other}) | |||
|
|||
if on is not None and not ( | |||
isinstance(other, (DataFrame, Series)) or isinstance(other, list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will raise when other is a tuple
or other list-likes, whereas the current behavior can be successful. I'm -1 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will raise when other is a
tuple
or other list-likes, whereas the current behavior can be successful. I'm -1 here.
Thanks for the feedback! I've updated the type check to avoid overly strict assumptions about other
being a list, and it now handles generic iterables while preserving the original behavior. The tests have also been adjusted to reflect both invalid objects and mixed iterables.
Also, I wanted to check if this approach looks reasonable from a performance/complexity perspective.
The all()
check on other
is now applied only if it's an iterable (excluding strings/bytes), and I tried to keep it aligned with the original logic. Just wanted to confirm that the additional check won’t be a concern in terms of overhead for typical cases.
Open to adjustments if you think there’s a more efficient way to structure it!
b604714
to
71eb2e7
Compare
Closes GH#61434
What does this PR change?
When using
DataFrame.join()
with theon
parameter, passing an invalid object like adict
,int
, or third-party DataFrame previously resulted in unclear internal errors.This PR adds a minimal type check that raises a clear
TypeError
whenother
is not aDataFrame
,Series
, or a list of such objects. Valid list-based joins withouton
remain unaffected.Checklist
pre-commit run --all-files
doc/source/whatsnew/v3.0.0.rst