DOC: update the DataFrame.stack docstring #20430

Merged
25 commits
a73bd9a
Fix docstring or pandas.DataFrame.stack.
samuelsinayoko Mar 20, 2018
1771437
Polish the docstring (plural issues and the like).
samuelsinayoko Mar 20, 2018
f17b52b
Add description to example.
samuelsinayoko Mar 20, 2018
c756141
Add an example with multi-level column.
samuelsinayoko Mar 20, 2018
4d09b85
Add more examples.
samuelsinayoko Mar 20, 2018
d5a262a
Fix sphinx docs
samuelsinayoko Mar 20, 2018
d3ef094
Fix parameter types
samuelsinayoko Mar 20, 2018
4d60246
Post review improvements.
samuelsinayoko Mar 20, 2018
16301d6
Start refactoring the examples.
samuelsinayoko Mar 20, 2018
310511d
Refactor examples
samuelsinayoko Mar 20, 2018
7e10273
Polish examples.
samuelsinayoko Mar 20, 2018
77c9fac
Add an example where multiple levels are stacked at once.
samuelsinayoko Mar 20, 2018
98a4a93
Clarify filling behaviour with missing values
samuelsinayoko Mar 20, 2018
41ad4cf
flake8
samuelsinayoko Mar 20, 2018
99734ac
Put Examples section at the end.
samuelsinayoko Mar 21, 2018
652f7b2
Fix 'See Also' section.
samuelsinayoko Mar 21, 2018
7f422d6
Create separate section for single level columns.
samuelsinayoko Mar 21, 2018
15902ed
Split the examples into several sections.
samuelsinayoko Mar 22, 2018
2379886
remove unwanted blank lines
samuelsinayoko Mar 22, 2018
2e0873b
Start using more meaningful index & column names
samuelsinayoko Mar 22, 2018
718f212
Use more meaningful column and index names.
samuelsinayoko Mar 22, 2018
747d245
Shorten overly long lines in examples.
samuelsinayoko Mar 25, 2018
a2c9b1a
Shorter one line description.
samuelsinayoko Mar 25, 2018
d34732d
Better description in the notes section.
samuelsinayoko Mar 25, 2018
5bc794c
Formatting [ci skip]
TomAugspurger Mar 26, 2018
176 changes: 153 additions & 23 deletions pandas/core/frame.py
@@ -5145,36 +5145,166 @@ def pivot_table(self, values=None, index=None, columns=None,

def stack(self, level=-1, dropna=True):
"""
- Pivot a level of the (possibly hierarchical) column labels, returning a
- DataFrame (or Series in the case of an object with a single level of
- column labels) having a hierarchical index with a new inner-most level
- of row labels.
- The level involved will automatically get sorted.
Stack the prescribed level(s) from columns to index.

Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:

- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index
level(s) is (are) taken from the prescribed level(s) and
the output is a DataFrame.

The new index levels are sorted.

Parameters
----------
- level : int, string, or list of these, default last level
- Level(s) to stack, can pass level name
- dropna : boolean, default True
- Whether to drop rows in the resulting Frame/Series with no valid
- values
level : int, str, list, default -1
Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
dropna : bool, default True
Whether to drop rows in the resulting Frame/Series in which
all values are missing. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.

Returns
-------
DataFrame or Series
Stacked dataframe or series.

See Also
--------
DataFrame.unstack : Unstack prescribed level(s) from index axis
onto column axis.
DataFrame.pivot : Reshape dataframe from long format to wide
format.
DataFrame.pivot_table : Create a spreadsheet-style pivot table
as a DataFrame.

Notes
-----
The function is named by analogy with a collection of books
being re-organised from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).

Examples
- ----------
- >>> s
- a b
- one 1. 2.
- two 3. 4.
--------
**Single level columns**

>>> df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
... index=['cat', 'dog'],
... columns=['weight', 'height'])

Stacking a dataframe with a single level column axis returns a Series:

>>> df_single_level_cols
weight height
cat 0 1
dog 2 3
>>> df_single_level_cols.stack()
cat weight 0
height 1
dog weight 2
height 3
dtype: int64

- >>> s.stack()
- one a 1
- b 2
- two a 3
- b 4
**Multi level columns: simple case**

>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('weight', 'pounds')])
>>> df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
... index=['cat', 'dog'],
... columns=multicol1)

Stacking a dataframe with a multi-level column axis:

>>> df_multi_level_cols1
weight
kg pounds
cat 1 2
dog 2 4
>>> df_multi_level_cols1.stack()
weight
cat kg 1
pounds 2
dog kg 2
pounds 4

**Missing values**

>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('height', 'm')])
>>> df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
... index=['cat', 'dog'],
... columns=multicol2)

It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:

>>> df_multi_level_cols2
weight height
kg m
cat 1.0 2.0
dog 3.0 4.0
>>> df_multi_level_cols2.stack()
height weight
cat kg NaN 1.0
m 2.0 NaN
dog kg NaN 3.0
m 4.0 NaN

**Prescribing the level(s) to be stacked**

The first parameter controls which level or levels are stacked:

>>> df_multi_level_cols2.stack(0)
kg m
cat height NaN 2.0
weight 1.0 NaN
dog height NaN 4.0
weight 3.0 NaN
>>> df_multi_level_cols2.stack([0, 1])
cat height m 2.0
weight kg 1.0
dog height m 4.0
weight kg 3.0
dtype: float64

- Returns
- -------
- stacked : DataFrame or Series
**Dropping missing values**

>>> df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
... index=['cat', 'dog'],
... columns=multicol2)

Note that rows where all values are missing are dropped by
default but this behaviour can be controlled via the dropna
keyword parameter:

>>> df_multi_level_cols3
weight height
kg m
cat NaN 1.0
dog 2.0 3.0
>>> df_multi_level_cols3.stack(dropna=False)
height weight
cat kg NaN NaN
m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
>>> df_multi_level_cols3.stack(dropna=True)
height weight
cat m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
"""
from pandas.core.reshape.reshape import stack, stack_multiple
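
The snippet below is a minimal sketch, not part of the PR, that spot-checks two claims in the revised docstring: stacking single-level columns returns a Series (which `DataFrame.unstack`, listed under See Also, can reverse), and `level` accepts a label as well as a position. The variable names and the column level names `quantity` and `unit` are assumptions chosen for illustration. `dropna` is deliberately left out because its handling may differ between pandas versions.

```python
import pandas as pd

# Single-level columns: stack() returns a Series with a two-level index.
df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])
stacked = df.stack()

# unstack() moves the inner-most index level back onto the columns.
restored = stacked.unstack()

# Multi-level columns with hypothetical level names, so the level to
# stack can be chosen by label instead of by position.
columns = pd.MultiIndex.from_tuples(
    [('weight', 'kg'), ('weight', 'pounds')],
    names=['quantity', 'unit'])
df_named = pd.DataFrame([[1, 2], [2, 4]],
                        index=['cat', 'dog'],
                        columns=columns)
by_label = df_named.stack(level='unit')
by_position = df_named.stack(level=-1)
```

Here `by_label` and `by_position` should describe the same reshaped frame, since `unit` is the last column level.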
