Skip to content

BUG: Raise TypeError when joining with non-DataFrame using 'on=' (GH#61434) #61454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

iabhi4
Copy link

@iabhi4 iabhi4 commented May 18, 2025

Closes GH#61434

What does this PR change?

When using DataFrame.join() with the on parameter, passing an invalid object like a dict, int, or third-party DataFrame previously resulted in unclear internal errors.

This PR adds a minimal type check that raises a clear TypeError when other is not a DataFrame, Series, or a list of such objects. Valid list-based joins without on remain unaffected.

Checklist

@iabhi4 iabhi4 force-pushed the fix-61434-nonpandas-join-typeerror branch from 78fbc2c to b604714 Compare May 19, 2025 00:26
@iabhi4
Copy link
Author

iabhi4 commented May 19, 2025

All checks passed except the Pyodide build, which failed due to a rate-limit (HTTP 429). The failure seems unrelated to this PR. A rerun should resolve it

@@ -10885,6 +10885,15 @@ def join(
raise ValueError("Other Series must have a name")
other = DataFrame({other.name: other})

if on is not None and not (
isinstance(other, (DataFrame, Series)) or isinstance(other, list)
Copy link
Member

@rhshadrach rhshadrach May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will raise when other is a tuple or other list-likes, whereas the current behavior can be successful. I'm -1 here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will raise when other is a tuple or other list-likes, whereas the current behavior can be successful. I'm -1 here.

Thanks for the feedback! I've updated the type check to avoid overly strict assumptions about other being a list, and it now handles generic iterables while preserving the original behavior. The tests have also been adjusted to reflect both invalid objects and mixed iterables.

Also, I wanted to check if this approach looks reasonable from a performance/complexity perspective.

The all() check on other is now applied only if it's an iterable (excluding strings/bytes), and I tried to keep it aligned with the original logic. Just wanted to confirm that the additional check won’t be a concern in terms of overhead for typical cases.

Open to adjustments if you think there’s a more efficient way to structure it!

@rhshadrach rhshadrach added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas labels May 19, 2025
@iabhi4 iabhi4 force-pushed the fix-61434-nonpandas-join-typeerror branch from b604714 to 71eb2e7 Compare May 19, 2025 05:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Joining Pandas with Polars dataframe produces fuzzy errormessage
2 participants