Description
related is #2094
related is #6847 (fixes kind and some arg ordering)
related is #7121 (make `sortlevel` a part of `sort_index` by adding level arg)
The sorting API is currently inconsistent and confusing. Here is what exists:
Series:
- `sort`: calls `Series.order`, in-place, defaults `quicksort`
- `order`: do the sort on values, return a new object, defaults `mergesort`
- `sort_index`: sort by labels, returns new object

Frame:
- `sort`: calls `sort_index`
- `sort_index`: sorts by the index with no args, otherwise a nested sort of the passed columns
The semantics are different between `Series` and `DataFrame`. In `Series`, `sort` means in-place, while `order` returns a new object. `sort`/`order` sort on the values, while `sort_index` sorts on the index. For a `DataFrame`, `sort` and `sort_index` are the same and sort on a column/list of columns; `inplace` is a keyword.
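As a rough analogy for the in-place versus new-object split described above, plain Python has the same distinction between `list.sort` and `sorted`. This is only an illustration with built-ins; it does not call the pandas methods themselves.

```python
# Plain-Python analogy only; these are built-ins, not the pandas methods.

# list.sort mutates the list (and returns None), the same in-place style
# the text attributes to Series.sort.
values = [3, 1, 2]
in_place_result = values.sort()

# sorted leaves its input untouched and returns a new list, the same
# new-object style the text attributes to Series.order.
original = [3, 1, 2]
new_copy = sorted(original)
```

Having both styles under similar names is what makes it easy to mutate data by accident, which is the motivation for a single method with an `inplace` keyword.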
Proposed signature of combined methods. We need to break a `Series` API here, because `sort` is an in-place method, which is quite inconsistent with everything else.
def sort(self, by=None, axis=0, level=None, ascending=True, inplace=False,
kind='quicksort', na_last=True):
This is what I think we should do:
- make `Series.sort/order` be the same.
- by can take a column/list of columns (as it can now), or an index name / `index` to provide index sorting (which means sort by the specified axis)
- default is `inplace=False` (which is the same as now, except for `Series.sort`).
- `Series.sort_index` does `s.sort('index')`
- `DataFrame.sort_index` does `df.sort('index')`
- eventually deprecate `Series.order`
- add `DataFrame.sort_columns` to perform axis=1 sorting
This does switch the argument order of the current `sort_index` (e.g. axis is currently first), but I think it then allows more natural syntax:
- `df.sort()` or `df.sort_index()` or `df.sort_index('index')` sort on the index labels
- `df.sort(['A','B'],axis=1)` sort on these columns (allow 'index' here as well to sort on the index too)
- `df.sort_columns()` or `df.sort('columns')` sort on the column labels
- `df.sort_columns()` defaults `axis=1`, so `df.sort_columns(['A','B'])` is equiv of `df.sort(['A','B'],axis=1)`
- `s.sort()` sort on the values
- `s.sort('index')` or `s.sort_index()` sort on the series index
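
To make the Series half of the proposal concrete, here is a minimal pure-Python sketch. `ToySeries` is a hypothetical stand-in (two parallel lists), not pandas code, and it only models the `by`, `ascending` and `inplace` arguments; `axis`, `level`, `kind` and `na_last` are left out. Returning `None` when `inplace=True` is an assumption of this sketch.

```python
class ToySeries:
    """Hypothetical stand-in for a Series: parallel lists of labels and values."""

    def __init__(self, values, index=None):
        self.values = list(values)
        self.index = list(index) if index is not None else list(range(len(self.values)))

    def sort(self, by=None, ascending=True, inplace=False):
        # by=None sorts on the values; by='index' sorts on the labels.
        if by is None:
            keys = self.values
        elif by == 'index':
            keys = self.index
        else:
            raise ValueError("by must be None or 'index' in this sketch")
        # Compute the positions that would put the keys in sorted order.
        positions = sorted(range(len(keys)), key=keys.__getitem__,
                           reverse=not ascending)
        new_index = [self.index[i] for i in positions]
        new_values = [self.values[i] for i in positions]
        if inplace:
            # Assumption: the in-place path mutates self and returns None.
            self.index, self.values = new_index, new_values
            return None
        return ToySeries(new_values, new_index)

    def sort_index(self, **kwargs):
        # Proposed: Series.sort_index does s.sort('index').
        return self.sort('index', **kwargs)

    def order(self, **kwargs):
        # Proposed: sort and order behave the same until order is deprecated.
        return self.sort(**kwargs)


s = ToySeries([30, 10, 20], index=['b', 'c', 'a'])
by_value = s.sort()        # sort on the values, new object
by_label = s.sort_index()  # same as s.sort('index')
```

With `inplace=False` as the default, `s` itself is unchanged after both calls; only an explicit `s.sort(inplace=True)` mutates it.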