-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-43241][PS] MultiIndex.append
not checking names for equality
#42787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
internal = InternalFrame( | ||
spark_frame=sdf_appended, | ||
index_spark_columns=[ | ||
scol_for(sdf_appended, col) for col in self._internal.index_spark_column_names | ||
], | ||
index_names=index_names, | ||
index_names=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simply set the index_names
to None
to follow the behavior of Pandas, since Pandas doesn't keep the name of MultiIndex
when computing the append
from Pandas 2.0.0. (See pandas-dev/pandas#48288 more detail)
cc @zhengruifeng @HyukjinKwon as CI passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a bug in Pandas that might be fixed in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's an intentional behavior since they mentioned this in "Bug fixes" section in their release note?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I thought it was a bug in Pandas that is fixed in the Pandas 2.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also mention this in our migration doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I misunderstood the Pandas PR. LTGM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we also mention this in our migration doc?
Hmm.. I didn't mention this as a behavior change since it's a bug fix, but on second thought maybe we'd better to mention in the migration guide anyway.
Let me create a follow-up for updating the migration guide.
merged to master |
### What changes were proposed in this pull request? This follow-ups for #42787 to update the migration guide. ### Why are the changes needed? We should mention all the behavior changes in migration guide. ### Does this PR introduce _any_ user-facing change? No. it's documentation update ### How was this patch tested? The existing CI should pass ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42811 from itholic/43241-migration. Authored-by: Haejoon Lee <[email protected]> Signed-off-by: Ruifeng Zheng <[email protected]>
What changes were proposed in this pull request?
This PR proposes to fix the behavior of
MultiIndex.append
to do not checking names.Why are the changes needed?
To match the behavior with pandas according to pandas-dev/pandas#48288
Does this PR introduce any user-facing change?
Yes, the behavior is changed to match with pandas:
Testing data
Before
After
How was this patch tested?
Fix the existing UTs.
Was this patch authored or co-authored using generative AI tooling?
No.