Implicit alignment in operations #12

Open
@TomAugspurger

In #2 there seems to be some agreement that row-labels are an important component of a dataframe. Pandas takes this a step further by using them for alignment in many operations involving multiple dataframes.

In [10]: a = pd.DataFrame({"A": [1, 2, 3]}, index=['a', 'b', 'c'])

In [11]: b = pd.DataFrame({"A": [2, 3, 1]}, index=['b', 'c', 'a'])

In [12]: a
Out[12]:
   A
a  1
b  2
c  3

In [13]: b
Out[13]:
   A
b  2
c  3
a  1

In [14]: a + b
Out[14]:
   A
a  2
b  4
c  6

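Behind the scenes there's an implicit a.align(b), which reindexes both dataframes to a common index before the addition is applied. By default the resulting index is the union of the two indices, and a label missing from either side yields NaN in the result.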
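To make the implicit step concrete, here is a minimal sketch that spells out the alignment by hand and then shows what happens when the two indices only partially overlap. The frame `c` and the variable names are made up for illustration; `a` and `b` are the frames from the session above.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
b = pd.DataFrame({"A": [2, 3, 1]}, index=["b", "c", "a"])

# Explicit version of what `a + b` does implicitly: align first, then add.
a_aligned, b_aligned = a.align(b)
explicit_sum = a_aligned + b_aligned

# Hypothetical frame sharing only the label "b" with `a`.
c = pd.DataFrame({"A": [10, 20]}, index=["b", "d"])

# The result covers the union of both indices; labels present on only one
# side have no partner value and come out as NaN.
partial_sum = a + c
print(partial_sum)
```

Note that the partially overlapping case silently changes the result's length and dtype (integers become floats to hold NaN), which is part of what a standard would have to decide on.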

A few other places this occurs:

  • Indexing a DataFrame / Series with an integer or boolean series
  • pd.concat
  • DataFrame constructor
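A rough sketch of label alignment in three of these places (boolean-series indexing, pd.concat along columns, and the DataFrame constructor). The series `s`, `t`, and `mask` are invented for illustration and not taken from the discussion.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
t = pd.Series([30, 10, 40], index=["c", "a", "d"])

# Boolean indexing: the mask is matched to `s` by label, not by position,
# so this keeps the rows labelled "a" and "c".
mask = pd.Series([True, False, True], index=["c", "b", "a"])
selected = s[mask]

# pd.concat along columns: rows are lined up by label over the union of
# both indices, with NaN where a series has no value for a label.
combined = pd.concat([s, t], axis=1)

# DataFrame constructor: a dict of Series is aligned on the index as well.
frame = pd.DataFrame({"x": s, "y": t})
print(frame)
```

In each case the positional order of the inputs is ignored in favor of the labels.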

Do we want to adopt this behavior for the standard?
