ENH: MultiIndex.from_frame #23141
Changes from 11 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1189,11 +1189,11 @@ def to_frame(self, index=True, name=None): | |
else: | ||
idx_names = self.names | ||
|
||
result = DataFrame({(name or level): | ||
self._get_level_values(level) | ||
for name, level in | ||
zip(idx_names, range(len(self.levels)))}, | ||
copy=False) | ||
result = DataFrame( | ||
ms7463 marked this conversation as resolved.
|
||
list(self), | ||
columns=[n or i for i, n in enumerate(idx_names)] | ||
) | ||
|
||
if index: | ||
result.index = self | ||
return result | ||
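The column-naming expression in this hunk, `[n or i for i, n in enumerate(idx_names)]`, can be checked in isolation. The sketch below is plain Python and not part of the PR; the helper names `fallback_names` and `fallback_names_strict` are hypothetical. It shows that `n or i` substitutes the level position for any falsy name (an empty string, for instance), not only for `None`. The removed code used `name or level` in the same way, so this behaviour is unchanged by the hunk.

```python
def fallback_names(idx_names):
    # Same expression as in the diff: a falsy name is replaced by its position.
    return [n or i for i, n in enumerate(idx_names)]


def fallback_names_strict(idx_names):
    # Hypothetical stricter variant: only None counts as "unnamed".
    return [i if n is None else n for i, n in enumerate(idx_names)]


print(fallback_names(["number", None]))      # ['number', 1]
print(fallback_names(["", "color"]))         # [0, 'color']
print(fallback_names_strict(["", "color"]))  # ['', 'color']
```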
|
@@ -1294,6 +1294,7 @@ def from_arrays(cls, arrays, sortorder=None, names=None): | |
MultiIndex.from_tuples : Convert list of tuples to MultiIndex | ||
MultiIndex.from_product : Make a MultiIndex from cartesian product | ||
of iterables | ||
MultiIndex.from_frame : Make a MultiIndex from a DataFrame. | ||
""" | ||
if not is_list_like(arrays): | ||
raise TypeError("Input must be a list / sequence of array-likes.") | ||
|
@@ -1343,6 +1344,7 @@ def from_tuples(cls, tuples, sortorder=None, names=None): | |
MultiIndex.from_arrays : Convert list of arrays to MultiIndex | ||
MultiIndex.from_product : Make a MultiIndex from cartesian product | ||
of iterables | ||
MultiIndex.from_frame : Make a MultiIndex from a DataFrame. | ||
""" | ||
if not is_list_like(tuples): | ||
raise TypeError('Input must be a list / sequence of tuple-likes.') | ||
|
@@ -1399,6 +1401,7 @@ def from_product(cls, iterables, sortorder=None, names=None): | |
-------- | ||
MultiIndex.from_arrays : Convert list of arrays to MultiIndex | ||
MultiIndex.from_tuples : Convert list of tuples to MultiIndex | ||
MultiIndex.from_frame : Make a MultiIndex from a DataFrame. | ||
""" | ||
from pandas.core.arrays.categorical import _factorize_from_iterables | ||
from pandas.core.reshape.util import cartesian_product | ||
|
@@ -1412,6 +1415,61 @@ def from_product(cls, iterables, sortorder=None, names=None): | |
labels = cartesian_product(labels) | ||
return MultiIndex(levels, labels, sortorder=sortorder, names=names) | ||
|
||
@classmethod | ||
def from_frame(cls, df, squeeze=True): | ||
""" | ||
Make a MultiIndex from a DataFrame. | ||
|
||
TomAugspurger marked this conversation as resolved.
|
||
Parameters | ||
---------- | ||
df : pd.DataFrame | ||
DataFrame to be converted to MultiIndex. | ||
squeeze : bool, default True | ||
ms7463 marked this conversation as resolved.
I haven't followed the discussion closely, why are we adding a I'd rather the user does
yeah was thinking of that as well. Maybe we should expose
Have removed squeeze parameter (and method in accordance with @toobaz's comment). Can't imagine this would be needed often at all. |
||
If df is a single column, squeeze multiindex to be a regular index. | ||
multiiindex --> MultiIndex |
||
|
||
Returns | ||
------- | ||
MultiIndex or Index | ||
The MultiIndex representation of the given DataFrame. Returns an | ||
Index if the DataFrame is single column and squeeze is True. | ||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame([[0, 'green'], [0, 'purple'], [1, 'green'], | ||
... [1, 'purple'], [2, 'green'], [2, 'purple']], | ||
... columns=['number', 'color']) | ||
>>> df | ||
number color | ||
0 0 green | ||
1 0 purple | ||
2 1 green | ||
3 1 purple | ||
4 2 green | ||
5 2 purple | ||
>>> pd.MultiIndex.from_frame(df) | ||
can you put a blank line between cases
adding a comment on the case if it's not obvious
removed other examples as I have removed the callable names/squeeze parameter
ms7463 marked this conversation as resolved.
|
||
MultiIndex(levels=[[0, 1, 2], ['green', 'purple']], | ||
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]], | ||
names=['number', 'color']) | ||
|
||
See Also | ||
-------- | ||
MultiIndex.from_arrays : Convert list of arrays to MultiIndex | ||
MultiIndex.from_tuples : Convert list of tuples to MultiIndex | ||
MultiIndex.from_product : Make a MultiIndex from cartesian product | ||
of iterables | ||
""" | ||
from pandas import DataFrame | ||
ms7463 marked this conversation as resolved.
|
||
# just let column level names be the tuple of the meta df columns | ||
# since they're not required to be strings | ||
if not isinstance(df, DataFrame): | ||
raise TypeError("Input must be a DataFrame") | ||
columns = list(df) | ||
mi = cls.from_tuples(list(df.values), names=columns) | ||
Again, I would expect that the dtypes are preserved here, but it seems like they aren't.
Will try out something along these lines when I’m at a computer.
|
||
if squeeze: | ||
return mi.squeeze() | ||
else: | ||
return mi | ||
Personal nit: change the four lines of return mi.squeeze() if squeeze else mi |
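The dtype concern raised in the review above can be reproduced outside the PR. `df.values` on a frame with mixed column types is a single object-dtype array, so building the index row by row discards the per-column dtypes. A minimal sketch of one alternative, assuming a column-wise construction through `MultiIndex.from_arrays`, follows; the helper name `from_frame_columnwise` is hypothetical and is not the code under review.

```python
import pandas as pd


def from_frame_columnwise(df):
    """Hypothetical helper: build a MultiIndex one column at a time."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("Input must be a DataFrame")
    names = list(df.columns)
    # Select by position so duplicate column names are handled too.
    arrays = [df.iloc[:, i] for i in range(df.shape[1])]
    return pd.MultiIndex.from_arrays(arrays, names=names)


df = pd.DataFrame({
    "number": pd.Series([0, 0, 1, 1], dtype="int64"),
    "color": pd.Categorical(["green", "purple", "green", "purple"]),
})

# The row-wise view used in the diff collapses everything to object.
print(df.values.dtype)       # object

mi = from_frame_columnwise(df)
print(mi.levels[0].dtype)    # int64
print(mi.levels[1].dtype)    # category
```

Passing whole columns lets each level be factorized from its own array, which is why the integer and categorical dtypes survive here.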
||
|
||
def _sort_levels_monotonic(self): | ||
""" | ||
.. versionadded:: 0.20.0 | ||
|
@@ -1474,6 +1532,27 @@ def _sort_levels_monotonic(self): | |
names=self.names, sortorder=self.sortorder, | ||
verify_integrity=False) | ||
|
||
def squeeze(self): | ||
""" | ||
Squeeze a single level MultiIndex to be a regular Index instane. | ||
instane --> instance |
||
|
||
I don't think we need to make this a public method.
changed squeeze -> _squeeze
I am re-thinking if this should be public, see #22866 |
||
Returns | ||
------- | ||
Index or MultiIndex | ||
Returns Index equivalent of single level MultiIndex. Returns | ||
copy of MultiIndex if multilevel. | ||
|
||
Examples | ||
-------- | ||
>>> mi = pd.MultiIndex.from_tuples([('a',), ('b',), ('c',)]) | ||
>>> mi.squeeze() | ||
ms7463 marked this conversation as resolved.
|
||
Index(['a', 'b', 'c'], dtype='object') | ||
""" | ||
if len(self.levels) == 1: | ||
return self.levels[0][self.labels[0]] | ||
else: | ||
return self.copy() | ||
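For readers who want to try the `squeeze` behaviour without patching pandas, the sketch below expresses the same idea as a free function. It is an illustration under stated assumptions, not the PR's method: the name `squeeze_index` is hypothetical, and it uses `get_level_values(0)` rather than indexing `levels[0]` with `labels[0]`, because the `labels` attribute was later renamed `codes` in pandas.

```python
import pandas as pd


def squeeze_index(index):
    """Hypothetical helper mirroring the squeeze() method in the diff."""
    if isinstance(index, pd.MultiIndex) and index.nlevels == 1:
        # A single-level MultiIndex carries no more information than a flat Index.
        return index.get_level_values(0)
    return index.copy()


single = pd.MultiIndex.from_tuples([("a",), ("b",), ("c",)])
flat = squeeze_index(single)
print(type(flat).__name__, list(flat))   # Index ['a', 'b', 'c']

double = pd.MultiIndex.from_tuples([(0, "green"), (1, "purple")])
print(squeeze_index(double).nlevels)     # 2
```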
|
||
def remove_unused_levels(self): | ||
""" | ||
Create a new MultiIndex from the current that removes | ||
|