DOC: update the DataFrame.stack docstring #20430
Changes from 14 commits
a73bd9a
1771437
f17b52b
c756141
4d09b85
d5a262a
d3ef094
4d60246
16301d6
310511d
7e10273
77c9fac
98a4a93
41ad4cf
99734ac
652f7b2
7f422d6
15902ed
2379886
2e0873b
718f212
747d245
a2c9b1a
d34732d
5bc794c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5145,36 +5145,151 @@ def pivot_table(self, values=None, index=None, columns=None, | |
|
||
def stack(self, level=-1, dropna=True): | ||
""" | ||
Pivot a level of the (possibly hierarchical) column labels, returning a | ||
DataFrame (or Series in the case of an object with a single level of | ||
column labels) having a hierarchical index with a new inner-most level | ||
of row labels. | ||
The level involved will automatically get sorted. | ||
Stack the prescribed level(s) from the column axis onto the index | ||
axis. | ||
|
||
Return a reshaped DataFrame or Series having a multi-level | ||
index with one or more new inner-most levels compared to the current | ||
dataframe. The new inner-most levels are created by pivoting the | ||
columns of the current dataframe: | ||
|
||
- if the columns have a single level, the output is a Series; | ||
- if the columns have multiple levels, the new index | ||
level(s) is (are) taken from the prescribed level(s) and | ||
the output is a DataFrame. | ||
|
||
The new index levels are sorted. | ||
|
||
Parameters | ||
---------- | ||
level : int, string, or list of these, default last level | ||
Level(s) to stack, can pass level name | ||
dropna : boolean, default True | ||
Whether to drop rows in the resulting Frame/Series with no valid | ||
values | ||
level : int, str, list, default -1 | ||
Level(s) to stack from the column axis onto the index | ||
axis, defined as one index or label, or a list of indices | ||
or labels. | ||
dropna : bool, default True | ||
Whether to drop rows in the resulting Frame/Series with | ||
missing values. Stacking a column level onto the index | ||
axis can create combinations of index and column values | ||
that are missing from the original dataframe. See Examples | ||
section. | ||
|
||
Notes | ||
----- | ||
The function is named by analogy with a stack of books | ||
(levels) being re-organised from a horizontal position (column | ||
levels) to a vertical position (index levels). | ||
|
||
Examples | ||
Review comment: Examples section should go at the end, after Returns and See Also.
Reply: Fixed in 99734ac |
||
---------- | ||
>>> s | ||
-------- | ||
>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]], | ||
... index=['one', 'two'], | ||
... columns=['a', 'b']) | ||
>>> multicol1 = pd.MultiIndex.from_tuples([('X', 'a'), ('X', 'b')]) | ||
>>> df_multi_level_cols1 = pd.DataFrame([[0, 1], [2, 3]], | ||
... index=['one', 'two'], | ||
... columns=multicol1) | ||
>>> multicol2 = pd.MultiIndex.from_tuples([('X', 'a'), ('Y', 'b')]) | ||
>>> df_multi_level_cols2 = pd.DataFrame([[0.0, 1.0], [2.0, 3.0]], | ||
... index=['one', 'two'], | ||
... columns=multicol2) | ||
>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]], | ||
... index=['one', 'two'], | ||
... columns=multicol2) | ||
|
||
Stacking a dataframe with a single level column axis returns a Series: | ||
|
||
>>> df_single_level_cols | ||
a b | ||
one 1. 2. | ||
two 3. 4. | ||
one 0 1 | ||
two 2 3 | ||
>>> df_single_level_cols.stack() | ||
one a 0 | ||
b 1 | ||
two a 2 | ||
b 3 | ||
dtype: int64 | ||
Review comment: It's just a personal opinion, but I think defining all the data first with descriptive names makes it a bit more complex to understand. We could have separate sections for each case, with a title in bold (surrounding the text with double stars), followed by the data creation, using simply Also, in this case I think it would make the example easier to understand using more real-world examples. As A minor thing, when creating the data, I think it makes more sense that each row is defined as a tuple, than as a list. For example:
Reply: 👍 split the examples in several sections in 15902ed |
||
|
||
Stacking a dataframe with a multi-level column axis: | ||
|
||
>>> df_multi_level_cols1 | ||
X | ||
a b | ||
one 0 1 | ||
two 2 3 | ||
>>> df_multi_level_cols1.stack() | ||
X | ||
one a 0 | ||
b 1 | ||
two a 2 | ||
b 3 | ||
|
||
It is common to have missing values when stacking a dataframe | ||
with multi-level columns, as the stacked dataframe typically | ||
has more values than the original dataframe. Missing values | ||
are filled with NaNs: | ||
|
||
>>> df_multi_level_cols2 | ||
X Y | ||
a b | ||
one 0.0 1.0 | ||
two 2.0 3.0 | ||
>>> df_multi_level_cols2.stack() | ||
X Y | ||
one a 0.0 NaN | ||
b NaN 1.0 | ||
two a 2.0 NaN | ||
b NaN 3.0 | ||
|
||
The first parameter controls which level or levels are stacked: | ||
|
||
>>> df_multi_level_cols2.stack(0) | ||
a b | ||
one X 0.0 NaN | ||
Y NaN 1.0 | ||
two X 2.0 NaN | ||
Y NaN 3.0 | ||
>>> df_multi_level_cols2.stack([0, 1]) | ||
one X a 0.0 | ||
Y b 1.0 | ||
two X a 2.0 | ||
Y b 3.0 | ||
dtype: float64 | ||
|
||
>>> s.stack() | ||
one a 1 | ||
b 2 | ||
two a 3 | ||
b 4 | ||
Note that rows where all values are missing are dropped by | ||
default but this behaviour can be controlled via the dropna | ||
keyword parameter: | ||
|
||
>>> df_multi_level_cols3 | ||
X Y | ||
a b | ||
one NaN 1.0 | ||
two 2.0 3.0 | ||
>>> df_multi_level_cols3.stack(dropna=False) | ||
X Y | ||
one a NaN NaN | ||
b NaN 1.0 | ||
two a 2.0 NaN | ||
b NaN 3.0 | ||
|
||
>>> df_multi_level_cols3.stack(dropna=True) | ||
X Y | ||
one b NaN 1.0 | ||
two a 2.0 NaN | ||
b NaN 3.0 | ||
|
||
Returns | ||
------- | ||
stacked : DataFrame or Series | ||
DataFrame or Series | ||
Stacked dataframe or series. | ||
|
||
See Also | ||
-------- | ||
DataFrame.unstack: unstack prescribed level(s) from index axis | ||
onto column axis. | ||
DataFrame.pivot: reshape dataframe from long format to wide | ||
format. | ||
DataFrame.pivot_table: create a spreadsheet-style pivot table | ||
Review comment: There should be a space before the colon, and the description should start with a capital letter.
Reply: Fixed in 652f7b2 |
||
as a DataFrame. | ||
""" | ||
from pandas.core.reshape.reshape import stack, stack_multiple | ||
|
||
|
Review comment: This should be a single line. Can you shorten by "column axis" -> "columns" and "index axis" -> "index"?
Reply: Thanks for the review. Fixed in a2c9b1a.
I've also modified the description in the Notes section. It was never completely clear to me why the method was called stack (I think I was imagining the column as a board being moved from a horizontal position to a vertical position, whereas I think the name comes from a collection of items being moved from a side-by-side position to a stack), so I've tried to explain that in the Notes section. Hope it makes sense!
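
The "side by side to stacked" picture from the reply above can be sketched in plain Python. This is only a conceptual model under stated assumptions, not how pandas implements `DataFrame.stack`: the helper name `stack_columns` is hypothetical, a table is modelled as a dict of rows, and `None` stands in for NaN. It covers the single-level column case only, where the docstring says the result is a Series.

```python
def stack_columns(table, dropna=True):
    """Move column labels into an inner index level (conceptual sketch).

    `table` maps row label -> {column label -> value}. The result maps
    (row label, column label) -> value, loosely mirroring the Series that
    the docstring describes for single-level columns. Hypothetical helper,
    not part of pandas.
    """
    # Collect every column label seen in any row; the new level is sorted.
    columns = sorted({col for row in table.values() for col in row})
    stacked = {}
    for row_label, row in table.items():
        for col in columns:
            value = row.get(col)  # None stands in for a missing value (NaN)
            if value is None and dropna:
                continue  # skip entries that have no valid value
            stacked[(row_label, col)] = value
    return stacked


# Same data as the single-level example in the docstring under review.
wide = {"one": {"a": 0, "b": 1}, "two": {"a": 2, "b": 3}}
long = stack_columns(wide)
for key, value in long.items():
    print(key, value)

# A row with a missing entry, kept or dropped depending on dropna.
sparse = {"one": {"a": None, "b": 1.0}}
kept = stack_columns(sparse, dropna=False)
dropped = stack_columns(sparse, dropna=True)
```

Each column label that sat side by side in `wide` ends up as the inner part of a `(row, column)` key in `long`, which is the reorganisation the Notes section tries to convey.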