DOC: update the DataFrame.stack docstring #20430

Merged
Changes from 14 commits
Commits
25 commits
a73bd9a
Fix docstring or pandas.DataFrame.stack.
samuelsinayoko Mar 20, 2018
1771437
Polish the docstring (plural issues and the like).
samuelsinayoko Mar 20, 2018
f17b52b
Add description to example.
samuelsinayoko Mar 20, 2018
c756141
Add an example with multi-level column.
samuelsinayoko Mar 20, 2018
4d09b85
Add more examples.
samuelsinayoko Mar 20, 2018
d5a262a
Fix sphinx docs
samuelsinayoko Mar 20, 2018
d3ef094
Fix parameter types
samuelsinayoko Mar 20, 2018
4d60246
Post review improvements.
samuelsinayoko Mar 20, 2018
16301d6
Start refactoring the examples.
samuelsinayoko Mar 20, 2018
310511d
Refactor examples
samuelsinayoko Mar 20, 2018
7e10273
Polish examples.
samuelsinayoko Mar 20, 2018
77c9fac
Add an example where multiple levels are stacked at once.
samuelsinayoko Mar 20, 2018
98a4a93
Clarify filling behaviour with missing values
samuelsinayoko Mar 20, 2018
41ad4cf
flake8
samuelsinayoko Mar 20, 2018
99734ac
Put Examples section at the end.
samuelsinayoko Mar 21, 2018
652f7b2
Fix 'See Also' section.
samuelsinayoko Mar 21, 2018
7f422d6
Create separate section for single level columns.
samuelsinayoko Mar 21, 2018
15902ed
Split the examples into several sections.
samuelsinayoko Mar 22, 2018
2379886
remove unwanted blank lines
samuelsinayoko Mar 22, 2018
2e0873b
Start using more meaningful index & column names
samuelsinayoko Mar 22, 2018
718f212
Use more meaningful column and index names.
samuelsinayoko Mar 22, 2018
747d245
Shorten overly long lines in examples.
samuelsinayoko Mar 25, 2018
a2c9b1a
Shorter one line description.
samuelsinayoko Mar 25, 2018
d34732d
Better description in the notes section.
samuelsinayoko Mar 25, 2018
5bc794c
Formatting [ci skip]
TomAugspurger Mar 26, 2018
155 changes: 135 additions & 20 deletions pandas/core/frame.py
@@ -5145,36 +5145,151 @@ def pivot_table(self, values=None, index=None, columns=None,

def stack(self, level=-1, dropna=True):
"""
Pivot a level of the (possibly hierarchical) column labels, returning a
DataFrame (or Series in the case of an object with a single level of
column labels) having a hierarchical index with a new inner-most level
of row labels.
The level involved will automatically get sorted.
Stack the prescribed level(s) from the column axis onto the index
Contributor

This should be a single line. Can you shorten by "column axis" -> "columns" and "index axis" -> "index"?

Contributor Author

Thanks for the review. Fixed in a2c9b1a.

I've also modified the description in the Notes section. It was never completely clear to me why the method was called stack (I think I was imagining the column as a board being moved from a horizontal position to a vertical position, whereas I think the name comes from a collection of items being moved from a side-by-side position to a stack), so I've tried to explain that in the Notes section. Hope it makes sense!
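
For readers following the thread, the "side by side to a stack" picture can be illustrated with a minimal sketch. The one-row frame and its labels (`side_by_side`, `row`, `a`, `b`, `c`) are hypothetical and not taken from the PR; the sketch only checks values by label, because ordering and missing-value handling in `stack` can differ between pandas versions.

```python
import pandas as pd

# Hypothetical one-row frame: three values sitting side by side as columns.
side_by_side = pd.DataFrame([(1, 2, 3)],
                            index=["row"],
                            columns=["a", "b", "c"])

# stack() moves the column labels into a new inner-most index level,
# so the same three values end up piled vertically in a Series.
stacked = side_by_side.stack()

print(side_by_side.shape)     # one row, three columns
print(stacked.index.nlevels)  # the index now has two levels
print(stacked)
```

Each value is still reachable by its old row and column label, now combined into one index key such as `stacked.loc[("row", "b")]`.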

axis.

Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
dataframe. The new inner-most levels are created by pivoting the
columns of the current dataframe:

- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.

The new index levels are sorted.

Parameters
----------
level : int, string, or list of these, default last level
Level(s) to stack, can pass level name
dropna : boolean, default True
Whether to drop rows in the resulting Frame/Series with no valid
values
level : int, str, list, default -1
Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropna : bool, default True
Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.

Notes
-----
The function is named by analogy with a stack of books
(levels) being re-organised from a horizontal position (column
levels) to a vertical position (index levels).

Examples
Member

Examples section should go at the end, after Returns and See Also.

Contributor Author

Fixed in 99734ac

----------
>>> s
--------
>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['one', 'two'],
...                                     columns=['a', 'b'])
>>> multicol1 = pd.MultiIndex.from_tuples([('X', 'a'), ('X', 'b')])
>>> df_multi_level_cols1 = pd.DataFrame([[0, 1], [2, 3]],
...                                     index=['one', 'two'],
...                                     columns=multicol1)
>>> multicol2 = pd.MultiIndex.from_tuples([('X', 'a'), ('Y', 'b')])
>>> df_multi_level_cols2 = pd.DataFrame([[0.0, 1.0], [2.0, 3.0]],
...                                     index=['one', 'two'],
...                                     columns=multicol2)
>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
...                                     index=['one', 'two'],
...                                     columns=multicol2)

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
a b
one 1. 2.
two 3. 4.
one 0 1
two 2 3
>>> df_single_level_cols.stack()
one a 0
b 1
two a 2
b 3
dtype: int64
Member

It's just a personal opinion, but I think defining all the data first with descriptive names makes it a bit more complex to understand.

We could have separate sections for each case, with a title in bold (surrounding the text with double stars), followed by the data creation, using simply df in all the cases.

Also, in this case I think using more real-world examples would make the example easier to understand. As a, b... don't have a meaning, it'd be a bit harder to understand what's going on.

A minor thing: when creating the data, I think it makes more sense for each row to be defined as a tuple rather than as a list.

For example:

**Single level**

>>> df = pd.DataFrame([(8, 12), (22, 35)],
...                   index=['cat', 'dog'],
...                   columns=['weight', 'max_speed'])
>>> df

>>> df.stack()
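
The suggested single-level example leaves its outputs blank. As a rough sketch of what it would produce, the same frame can be run as a plain script; the label lookup at the end is an addition for illustration and is not part of the review comment.

```python
import pandas as pd

# Same data as the suggested example: one tuple per row.
df = pd.DataFrame([(8, 12), (22, 35)],
                  index=['cat', 'dog'],
                  columns=['weight', 'max_speed'])

# With single-level columns, stack() returns a Series whose index
# pairs each original row label with each original column label.
stacked = df.stack()

print(type(stacked).__name__)
print(stacked)

# Illustrative lookup by (row label, column label).
print(stacked.loc[('dog', 'max_speed')])
```

Using labels with real-world meaning makes it easier to see that each `(animal, attribute)` pair becomes one entry of the result.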

Contributor Author

@samuelsinayoko samuelsinayoko Mar 22, 2018

👍 Split the examples into several sections in 15902ed


Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
X
a b
one 0 1
two 2 3
>>> df_multi_level_cols1.stack()
X
one a 0
b 1
two a 2
b 3

It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:

>>> df_multi_level_cols2
X Y
a b
one 0.0 1.0
two 2.0 3.0
>>> df_multi_level_cols2.stack()
X Y
one a 0.0 NaN
b NaN 1.0
two a 2.0 NaN
b NaN 3.0

The first parameter controls which level or levels are stacked:

>>> df_multi_level_cols2.stack(0)
a b
one X 0.0 NaN
Y NaN 1.0
two X 2.0 NaN
Y NaN 3.0
>>> df_multi_level_cols2.stack([0, 1])
one X a 0.0
Y b 1.0
two X a 2.0
Y b 3.0
dtype: float64

>>> s.stack()
one a 1
b 2
two a 3
b 4
Note that rows where all values are missing are dropped by
default but this behaviour can be controlled via the dropna
keyword parameter:

>>> df_multi_level_cols3
X Y
a b
one NaN 1.0
two 2.0 3.0
>>> df_multi_level_cols3.stack(dropna=False)
X Y
one a NaN NaN
b NaN 1.0
two a 2.0 NaN
b NaN 3.0

>>> df_multi_level_cols3.stack(dropna=True)
X Y
one b NaN 1.0
two a 2.0 NaN
b NaN 3.0

Returns
-------
stacked : DataFrame or Series
DataFrame or Series
Stacked dataframe or series.

See Also
--------
DataFrame.unstack: unstack prescribed level(s) from index axis
onto column axis.
DataFrame.pivot: reshape dataframe from long format to wide
format.
DataFrame.pivot_table: create a spreadsheet-style pivot table
Member

There should be a space before the colon, and the description should start with a capital letter.

Contributor Author

Fixed in 652f7b2

as a DataFrame.
"""
from pandas.core.reshape.reshape import stack, stack_multiple

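
As a companion to the multi-level examples in the docstring, the sketch below shows how the `level` argument selects which column level moves onto the index, and how `DataFrame.unstack` (listed under See Also) reverses the default stack. It reuses the `('X', 'a'), ('X', 'b')` columns from the diff with tuple rows. The variable names `inner`, `outer` and `restored` are assumptions for illustration, and `dropna` is left out because its handling depends on the pandas version.

```python
import pandas as pd

# Two-level columns, as in the docstring's first multi-level example.
multicol = pd.MultiIndex.from_tuples([('X', 'a'), ('X', 'b')])
df = pd.DataFrame([(0, 1), (2, 3)],
                  index=['one', 'two'],
                  columns=multicol)

# Default level=-1: the inner column level ('a', 'b') moves to the index,
# leaving a DataFrame with the single column 'X'.
inner = df.stack()

# level=0: the outer column level ('X') moves to the index instead,
# leaving columns 'a' and 'b'.
outer = df.stack(0)

# unstack() moves the inner-most index level back onto the columns.
restored = inner.unstack()

print(inner)
print(outer)
print(restored)
```

Both stacked results are DataFrames here because one column level remains after stacking; stacking every level at once would instead give a Series.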