API: unify sort API #5190

Closed
@jreback

related is #2094
related is #6847 (fixes kind and some arg ordering)
related is #7121 (make sortlevel a part of sort_index by adding level arg)

The sorting API is currently inconsistent and confusing. Here is what exists:

Series:

  • sort: calls Series.order, in-place, defaults to quicksort
  • order: sorts on the values, returns a new object, defaults to mergesort
  • sort_index: sort by labels, returns new object

Frame:

  • sort: calls sort_index
  • sort_index: sorts by the index with no args, otherwise a nested sort of the passed columns

The semantics differ between Series and DataFrame. For a Series, sort is in-place and order returns a new object; both sort on the values, while sort_index sorts on the index. For a DataFrame, sort and sort_index are the same method and sort on a column or list of columns (or on the index when no columns are passed); inplace is a keyword.
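To make the Series distinctions above concrete, here is a minimal sketch that does not use pandas at all. It assumes a plain list of (label, value) pairs as a stand-in for a Series; the three helper functions are hypothetical and only mirror the behaviour described in this issue.

```python
# Toy stand-in for a Series: a list of (label, value) pairs.
# These helpers are illustrative only; they are not pandas API.

def order(pairs):
    """Mirror of Series.order as described above: sort on values, return a new object."""
    return sorted(pairs, key=lambda p: p[1])


def sort_in_place(pairs):
    """Mirror of Series.sort as described above: sort on values, mutate the caller's object."""
    pairs.sort(key=lambda p: p[1])  # list.sort returns None, like an in-place method


def sort_index(pairs):
    """Mirror of Series.sort_index: sort on labels, return a new object."""
    return sorted(pairs, key=lambda p: p[0])


s = [("c", 1), ("a", 3), ("b", 2)]

by_value = order(s)        # new list, s is unchanged
by_label = sort_index(s)   # new list, s is unchanged
untouched = list(s)        # snapshot taken before the in-place call

result = sort_in_place(s)  # s itself is reordered, nothing is returned
```

The in-place variant is the odd one out: it is the only call whose result cannot be chained or assigned, which is the inconsistency this issue wants to remove.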

Proposed signature of the combined methods. We need to break the Series API here, because Series.sort is an in-place method, which is inconsistent with everything else.

def sort(self, by=None, axis=0, level=None, ascending=True, inplace=False,
         kind='quicksort', na_last=True):

This is what I think we should do:

  • make Series.sort/order be the same.
  • by can take a column/list of columns (as it can now), or an index name / index to provide index sorting (which means sort by the specified axis)
  • default is inplace=False (which is the same as now, except for Series.sort).
  • Series.sort_index does s.sort('index')
  • DataFrame.sort_index does df.sort('index')
  • eventually deprecate Series.order
  • add DataFrame.sort_columns to perform axis=1 sorting
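The delegation in the list above could look roughly like the following. This is a sketch under stated assumptions, not the pandas implementation: ToyFrame is a hypothetical pure-Python table, the axis, level, kind and na_last arguments are left out, and the strings 'index' and 'columns' are treated as reserved values of by (so a real column with either name would clash).

```python
import copy


class ToyFrame:
    """Hypothetical minimal table (row labels plus named columns). Not pandas."""

    def __init__(self, index, columns):
        self.index = list(index)
        self.columns = {name: list(vals) for name, vals in columns.items()}

    def sort(self, by=None, ascending=True, inplace=False):
        # inplace=False is the default: work on a copy and return it.
        target = self if inplace else copy.deepcopy(self)
        if by == "columns":
            # Reorder the columns by their labels.
            names = sorted(target.columns, reverse=not ascending)
            target.columns = {n: target.columns[n] for n in names}
        else:
            # by=None means "sort on the index labels".
            if by is None:
                keys = ["index"]
            elif isinstance(by, str):
                keys = [by]
            else:
                keys = list(by)

            def row_key(i):
                # Nested sort: compare rows on each key in turn.
                return tuple(
                    target.index[i] if k == "index" else target.columns[k][i]
                    for k in keys
                )

            positions = sorted(
                range(len(target.index)), key=row_key, reverse=not ascending
            )
            target.index = [target.index[i] for i in positions]
            target.columns = {
                n: [vals[i] for i in positions]
                for n, vals in target.columns.items()
            }
        return None if inplace else target

    def sort_index(self, ascending=True, inplace=False):
        # Thin wrapper, as proposed: sort_index does sort('index').
        return self.sort("index", ascending=ascending, inplace=inplace)

    def sort_columns(self, ascending=True, inplace=False):
        # Thin wrapper around sort('columns').
        return self.sort("columns", ascending=ascending, inplace=inplace)


df = ToyFrame(index=["y", "x", "z"], columns={"B": [2, 1, 2], "A": [9, 8, 7]})

by_index = df.sort()               # same as df.sort_index()
by_cols = df.sort(["B", "A"])      # nested sort on B, then A
col_sorted = df.sort_columns()     # same as df.sort('columns')
original_index = list(df.index)    # df has not been modified so far

inplace_result = df.sort("A", inplace=True)  # mutates df, returns None
```

With one entry point, sort_index and sort_columns carry no logic of their own, so the defaults (new object unless inplace=True) cannot drift apart between methods.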

This does change the argument order relative to the current sort_index (e.g. axis is currently first), but I think it allows a more natural syntax:

  • df.sort() or df.sort_index() or df.sort_index('index') sort on the index labels
  • df.sort(['A','B'],axis=1) sort on these columns (allow 'index' here as well to sort on the index too)
  • df.sort_columns() or df.sort('columns') sort on the column labels
  • df.sort_columns() defaults to axis=1, so df.sort_columns(['A','B']) is equivalent to df.sort(['A','B'],axis=1)
  • s.sort() sort on the values
  • s.sort('index') or s.sort_index() sort on the series index
