API: unify sort API

related is #2094
related is #6847 (fixes kind and some arg ordering)
related is #7121 (make `sortlevel` a part of `sort_index` by adding level arg)

The sorting API is currently inconsistent and confusing. Here is what exists:

Series:
- `sort`: calls `Series.order`, in-place, defaults to `quicksort`
- `order`: does the sort on values, returns a new object, defaults to `mergesort`
- `sort_index`: sort by labels, returns new object

Frame:
- `sort`: calls `sort_index`
- `sort_index`: sorts by the index with no args, otherwise a nested sort of the passed columns

The semantics are different between `Series` and `DataFrame`. In `Series`, `sort` means in-place, while `order` returns a new object. `sort/order` sort on the values, while `sort_index` sorts on the index. For a `DataFrame`, `sort` and `sort_index` are the same and sort on a column/list of columns; `inplace` is a keyword.
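
As a rough analogy only (plain Python lists, not pandas objects), the in-place vs. new-object split between `Series.sort` and `Series.order` resembles `list.sort()` vs. `sorted()`:

```python
# Analogy only: built-in lists stand in for a Series here.
values = [3, 1, 2]

# sorted() returns a new list and leaves the input alone,
# similar in spirit to `Series.order`.
ordered = sorted(values)
untouched = list(values)  # snapshot to show the input did not change

# list.sort() reorders the list in place and returns None,
# similar in spirit to `Series.sort`.
result = values.sort()
```

The in-place call returning `None` is what makes chaining on it awkward, which is part of why the proposal below defaults to `inplace=False`.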

Proposed signature of the combined methods. We need to break a `Series` API here, because `sort` is an in-place method, which is quite inconsistent with everything else.

```
def sort(self, by=None, axis=0, level=None, ascending=True, inplace=False,
         kind='quicksort', na_last=True):
    ...  # signature only; body omitted
```

This is what I think we should do:
- make `Series.sort/order` be the same.
- `by` can take a column/list of columns (as it can now), or an index name / `index` to provide index sorting (which means sort by the specified axis)
- default is `inplace=False` (which is the same as now, except for `Series.sort`).
- `Series.sort_index` does `s.sort('index')`
- `DataFrame.sort_index` does `df.sort('index')`
- eventually deprecate `Series.order`
- add `DataFrame.sort_columns` to perform axis=1 sorting
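
A minimal sketch of how the `Series` side of this could behave, assuming a made-up `ToySeries` class (plain lists, not pandas) that only models `by`, `ascending` and `inplace`; `axis`, `level`, `kind` and `na_last` are left out:

```python
class ToySeries:
    """Hypothetical stand-in for a Series: parallel lists of values and labels."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)

    def sort(self, by=None, ascending=True, inplace=False):
        # by=None sorts on the values; by='index' sorts on the labels.
        if by not in (None, 'index'):
            raise ValueError("by must be None or 'index'")
        keys = self.index if by == 'index' else self.values
        # Compute the new ordering as a list of positions.
        positions = sorted(range(len(keys)), key=keys.__getitem__,
                           reverse=not ascending)
        values = [self.values[i] for i in positions]
        index = [self.index[i] for i in positions]
        if inplace:
            self.values, self.index = values, index
            return None
        return ToySeries(values, index)

    # Proposed: `order` becomes the same operation as `sort`.
    order = sort

    def sort_index(self, ascending=True, inplace=False):
        # Proposed: sort_index simply delegates to sort('index').
        return self.sort('index', ascending=ascending, inplace=inplace)


s = ToySeries([30, 10, 20], index=['b', 'c', 'a'])
by_value = s.sort()        # new object, sorted on values
by_label = s.sort_index()  # new object, sorted on labels
```

With `inplace=False` as the default, `s` itself is unchanged after both calls; only `s.sort(inplace=True)` would mutate it.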

This does switch the argument order of the current `sort_index` (e.g. `axis` is currently first), but I think it then allows a more natural syntax:
- `df.sort()` or `df.sort_index()` or `df.sort_index('index')` sort on the index labels
- `df.sort(['A','B'],axis=1)` sort on these columns (allow 'index' here as well to sort on the index too)
- `df.sort_columns()` or `df.sort('columns')` sort on the column labels
- `df.sort_columns()` defaults to `axis=1`, so `df.sort_columns(['A','B'])` is the equivalent of `df.sort(['A','B'],axis=1)`
- `s.sort()` sort on the values
- `s.sort('index')` or `s.sort_index()` sort on the series index
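
The dispatch on `by` described in the list above could be sketched roughly like this. `resolve_sort` and `resolve_sort_columns` are hypothetical helpers invented for illustration; they only describe what a call would sort on and do not sort anything:

```python
def resolve_sort(by=None, axis=0):
    """Hypothetical helper: describe what a proposed `sort` call would sort on."""
    if by == 'index':
        return ('labels', 0)
    if by == 'columns':
        return ('labels', 1)
    if by is None:
        # No keys given: sort on the labels of the requested axis.
        return ('labels', axis)
    # Otherwise sort on the named keys; a single name becomes a one-item list.
    keys = [by] if isinstance(by, str) else list(by)
    return ('keys', keys, axis)


def resolve_sort_columns(by=None, axis=1):
    # Proposed `sort_columns`: the same call with axis defaulting to 1.
    return resolve_sort(by, axis=axis)
```

Under these assumptions, `resolve_sort()` and `resolve_sort('index')` both resolve to the index labels, and `resolve_sort_columns(['A', 'B'])` resolves to the same thing as `resolve_sort(['A', 'B'], axis=1)`.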


API: unify sort API #5190
