BUG: unstack with missing levels results in incorrect index names #38029

Merged: 19 commits, merged Dec 31, 2020

Conversation

GYHHAHA (Contributor) commented Nov 24, 2020

arw2019 (Member) left a comment


Thanks @GYHHAHA for the PR!

Some comments

@@ -619,6 +619,7 @@ Indexing
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using listlike indexer that contains elements that are in the index's ``categories`` but not in the index itself failing to raise ``KeyError`` (:issue:`37901`)
- Bug in :meth:`DataFrame.iloc` and :meth:`Series.iloc` aligning objects in ``__setitem__`` (:issue:`22046`)
- Bug in :meth:`DataFrame.loc` did not raise ``KeyError`` when missing combination was given with ``slice(None)`` for remaining levels (:issue:`19556`)
- Bug in :meth:`MultiIndex.remove_unused_levels` drops NaN when level contains NaN (:issue:`37510`)
Member

Suggested wording: "... was dropping missing values when levels contain ``NaN``"

But also, do we want to mention something about ``set_levels``, since that was what the OP was about?

Contributor

move to 1.3 (both)
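The whatsnew entry for :meth:`MultiIndex.remove_unused_levels` and the review comments above are about a level entry that is itself a missing value, introduced via ``set_levels``. As a minimal sketch only (pure Python, not the pandas implementation), the helpers ``set_level_values``, ``remove_unused`` and ``decode`` below are hypothetical; they assume one level is modelled as a list of values plus integer codes, with ``-1`` marking a missing label as in pandas. The point it illustrates: whether an entry is "unused" should depend only on whether some code refers to it, so an entry that is ``None`` (standing in for NaN) survives when rows point at it.

```python
from typing import Any, List, Tuple

# Hypothetical model of a single MultiIndex level (NOT the pandas API):
# `levels` holds the distinct values, `codes` holds one integer per row,
# and a code of -1 marks a missing label, as pandas does.
Level = Tuple[List[Any], List[int]]


def set_level_values(levels: List[Any], codes: List[int], new_values: List[Any]) -> Level:
    """Replace level values by position (like set_levels); codes are kept."""
    if any(c >= len(new_values) for c in codes):
        raise ValueError("every code must point inside the new level")
    return list(new_values), list(codes)


def remove_unused(levels: List[Any], codes: List[int]) -> Level:
    """Drop level entries that no code refers to, then remap the codes.

    An entry is kept because some code points at it, regardless of whether
    the entry itself is a missing value such as None.
    """
    used = sorted({c for c in codes if c != -1})
    remap = {old: new for new, old in enumerate(used)}
    new_levels = [levels[i] for i in used]
    new_codes = [remap.get(c, -1) for c in codes]  # -1 stays -1
    return new_levels, new_codes


def decode(levels: List[Any], codes: List[int]) -> List[Any]:
    """Rebuild the row labels; a -1 code decodes to None (missing)."""
    return [None if c == -1 else levels[c] for c in codes]


# Usage: swap in a level that contains a missing value, then compact it.
levels, codes = ["a", "b", "c", "d"], [0, 3, 3, -1]
levels, codes = set_level_values(levels, codes, ["n1", "n2", "n3", None])
before = decode(levels, codes)
levels, codes = remove_unused(levels, codes)
print(levels, codes)  # ['n1', None] [0, 1, 1, -1]
assert decode(levels, codes) == before  # row labels are unchanged
```

Keying the "used" check on code positions rather than on values means the ``None`` entry at position 3 is retained and the labels decoded before and after compaction match.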

df1.index = df1.index.set_levels(levels=new_levels, level="id1")
df1.index = df1.index.set_levels(levels=new_levels, level="id2")

result = df1.unstack("id3")[("x", 1)].sort_index().index
Member

I'd say construct the expected frame and compare using ``assert_frame_equal``.

Alternatively, use ``tm.assert_index_equal``.

# GH 37510
df1 = DataFrame(
{
"id1": [1, 2, 3, 4],
Member

Total nit, but I'd call these `L1`, `L2`, `L3`.

@@ -271,3 +271,24 @@ def test_argsort(idx):
result = idx.argsort()
expected = idx.values.argsort()
tm.assert_numpy_array_equal(result, expected)


def test_not_remove_nan():
Member

I'd name it `test_remove_unused_levels_with_missing`.

GYHHAHA requested a review from arw2019 on November 24, 2020 08:17
jreback added the MultiIndex and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels on Nov 26, 2020

pep8speaks commented Nov 27, 2020

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-31 07:14:20 UTC

GYHHAHA requested a review from jreback on November 27, 2020 05:22
GYHHAHA changed the title from "BUG: MultiIndex.remove_unused_levels drops NaN when level contains NaN" to "BUG: unstack with missing levels results in incorrect index names" on Nov 30, 2020

GYHHAHA (Contributor, Author) commented Dec 2, 2020

Expected frame is now hardcoded, cc @jreback

jreback (Contributor) left a comment

Can you merge master?

@@ -755,6 +755,7 @@ Reshaping
- Bug in :meth:`DataFrame.apply` not setting index of return value when ``func`` return type is ``dict`` (:issue:`37544`)
- Bug in :func:`concat` resulting in a ``ValueError`` when at least one of both inputs had a non-unique index (:issue:`36263`)
- Bug in :meth:`DataFrame.merge` and :meth:`pandas.merge` returning inconsistent ordering in result for ``how=right`` and ``how=left`` (:issue:`35382`)
- Bug in :meth:`DataFrame.unstack` with missing levels led to incorrect index names (:issue:`37510`)
Contributor

move to 1.3

GYHHAHA requested a review from jreback on December 31, 2020 10:11
jreback added this to the 1.3 milestone on Dec 31, 2020
jreback merged commit bdc5a67 into pandas-dev:master on Dec 31, 2020

jreback (Contributor) commented Dec 31, 2020

thanks @GYHHAHA

GYHHAHA deleted the fix-multi branch on January 1, 2021 01:35
Labels: MultiIndex, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Projects: none
Development

Successfully merging this pull request may close these issues.

BUG: index label messed up when reset_level then unstack then sort_index
4 participants