[SPARK-43241][PS] `MultiIndex.append` not checking names for equality #42787

itholic · 2023-09-04T02:25:58Z

What changes were proposed in this pull request?

This PR proposes to fix the behavior of `MultiIndex.append`, which did not check names for equality.

Why are the changes needed?

To match the behavior of pandas, per pandas-dev/pandas#48288.

Does this PR introduce any user-facing change?

Yes, the behavior is changed to match pandas:

Testing data

>>> psmidx1
MultiIndex([('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3)],
           names=['x', 'y', 'z'])
>>> psmidx2
MultiIndex([('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3)],
           names=['p', 'q', 'r'])

Before

>>> psmidx1.append(psmidx2)
MultiIndex([('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3),
            ('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3)],
           names=['x', 'y', 'z'])

After

>>> psmidx1.append(psmidx2)
MultiIndex([('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3),
            ('a', 'x', 1),
            ('b', 'y', 2),
            ('c', 'z', 3)],
           )
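
For comparison, the same append can be tried in plain pandas. This is a minimal sketch, assuming pandas is installed; the variable names `pmidx1`, `pmidx2`, and `appended` are illustrative and not taken from the patch. On pandas 2.0.0 or later the printed names are expected to be reset, in line with the "After" output above; older versions are expected to keep the names of the left-hand index.

```python
import pandas as pd

# Same data as the testing example above, built with plain pandas.
tuples = [("a", "x", 1), ("b", "y", 2), ("c", "z", 3)]
pmidx1 = pd.MultiIndex.from_tuples(tuples, names=["x", "y", "z"])
pmidx2 = pd.MultiIndex.from_tuples(tuples, names=["p", "q", "r"])

# Append the second index to the first; the level names differ on every level.
appended = pmidx1.append(pmidx2)

# Print the pandas version next to the resulting names, since the
# outcome depends on which pandas release is installed.
print(pd.__version__)
print(list(appended.names))
```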

How was this patch tested?

Fixed the existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

itholic · 2023-09-05T01:00:44Z

python/pyspark/pandas/indexes/base.py

        internal = InternalFrame(
            spark_frame=sdf_appended,
            index_spark_columns=[
                scol_for(sdf_appended, col) for col in self._internal.index_spark_column_names
            ],
-            index_names=index_names,
+            index_names=None,


We can simply set `index_names` to `None` to follow the behavior of Pandas, since from Pandas 2.0.0 onward Pandas no longer keeps the names of a MultiIndex when computing the append. (See pandas-dev/pandas#48288 for more detail.)

cc @zhengruifeng @HyukjinKwon as CI passed.

is it a bug in Pandas that might be fixed in the future?

I believe it's intentional behavior, since they mention it in the "Bug fixes" section of their release notes.

So I understood it as a bug in Pandas that was fixed in Pandas 2.0.0.

shouldn't we also mention this in our migration doc?

ok, I misunderstood the Pandas PR. LGTM

shouldn't we also mention this in our migration doc?

Hmm.. I didn't mention this as a behavior change since it's a bug fix, but on second thought maybe we'd better mention it in the migration guide anyway.

Let me create a follow-up for updating the migration guide.

zhengruifeng · 2023-09-05T03:46:00Z

merged to master

### What changes were proposed in this pull request?

This is a follow-up to #42787 that updates the migration guide.

### Why are the changes needed?

We should mention all behavior changes in the migration guide.

### Does this PR introduce _any_ user-facing change?

No, it's a documentation update.

### How was this patch tested?

The existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42811 from itholic/43241-migration.

Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>

[SPARK-43241][PS] MultiIndex.append not checking names for equality

6308322

github-actions bot added PYTHON PANDAS API ON SPARK labels Sep 4, 2023

HyukjinKwon approved these changes Sep 5, 2023

zhengruifeng closed this in d7e827e Sep 5, 2023

itholic mentioned this pull request Sep 5, 2023

[SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change #42811

Closed

itholic deleted the SPARK-43241 branch November 20, 2023 01:36
