-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Set dtypes of new columns when stacking (#36991) #40127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a whatsnew note. 1.3 bug fixes under reshaping is good.
pandas/core/reshape/reshape.py
Outdated
new_columns = MultiIndex.from_arrays( | ||
[ | ||
Index(new_level, dtype=level.dtype) | ||
if None not in new_level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any tests that hit the None case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, specifically test_stack_nan_in_multiindex_columns
.
None
is a tricky case---it is allowed for some Index types (like CategoricalIndex
) but not others (like Int64Index
). I am just avoiding levels with None
entirely.
I've also looked at the way None
is handled elsewhere, and it's not totally consistent:
>>> MultiIndex.from_arrays([[1, 2, None]]).levels[0]
Int64Index([1, 2], dtype='int64')
But:
>>> Index([1, 2, None])
Index([1, 2, None], dtype='object')
Perhaps in the future it would be better if Index
accepted None
values, even when a dtype is specified. Then Index([1, 2, None], dtype='int64')
would return Int64Index([1, 2], dtype='int64')
. Then we wouldn't need such a condition here. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm interesting, ok can you open an issue specifically showing the MI vs Index cases. I agree we should do something about this. ok for here on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel if you can have a look here
pandas/core/reshape/reshape.py
Outdated
new_columns = MultiIndex.from_arrays( | ||
[ | ||
Index(new_level, dtype=level.dtype) | ||
if None not in new_level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm interesting, ok can you open an issue specifically showing the MI vs Index cases. I agree we should do something about this. ok for here on this PR.
I extracted a function for stacking the column index. Let me know if you prefer it inlined. Otherwise, the same is possible for the index (lines 714--732). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good some comments
thanks @maroth96 very nice! keep em coming! |
Uh oh!
There was an error while loading. Please reload this page.