Description
It would make Pandas easier to teach, easier to learn, and easier to use if the sorting behavior were the same between series and dataframes. But the existing order() and sort() methods are locked into their old behaviors by all of the code that already depends on them.
A new sorted() method, however, could bring symmetry between series and dataframes for code written from now on:
Series.sorted() => same as existing Series.order()
DataFrame.sorted() => same as existing DataFrame.sort()
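To make the shared convention concrete, here is a toy sketch in plain Python. ToySeries and ToyFrame are hypothetical stand-ins invented for this illustration, not Pandas classes, and the by and ascending parameters are assumptions rather than a proposed signature. The only point is that both objects expose a sorted() method that returns a new object and leaves the original untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySeries:
    """Hypothetical stand-in for a series: just a tuple of values."""
    values: tuple

    def sorted(self, ascending=True):
        # Inside the method body, sorted() is the Python built-in.
        # A new ToySeries is returned; self is never modified.
        return ToySeries(tuple(sorted(self.values, reverse=not ascending)))


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a dataframe: a tuple of row dicts."""
    rows: tuple

    def sorted(self, by, ascending=True):
        # Same convention as ToySeries.sorted(): return a new object.
        ordered = sorted(self.rows, key=lambda row: row[by], reverse=not ascending)
        return ToyFrame(tuple(ordered))


series = ToySeries((3, 1, 2))
frame = ToyFrame(({"a": 2}, {"a": 1}))

sorted_series = series.sorted()
sorted_frame = frame.sorted(by="a")
```

With this shape, code that switches between the two kinds of object keeps the same method name and the same "returns a new object" expectation.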
Having this new pair of methods with identical conventions, where possible, would solve several different problems that learners have with Pandas today:
- In Pandas, nearly all methods return a new object by default instead of modifying the object in place, but learners discover that Series.sort() is a special case.
- In Python, a sort() method traditionally returns None and does an in-place sort, but learners have to discover that DataFrame.sort() violates this convention in order to match the behavior of the rest of Pandas.
- The new-object sorter for series objects is Series.order(), which is very difficult to discover, as nothing else in the Python ecosystem is named order(), and since one would normally expect an order() method to tell you the order (ascending? descending? none?) instead of imposing a new order.
- The standard Python name for a sort that returns a new object is sorted(), per the universally loved Python built-in, but learners cannot transfer this knowledge to Pandas, where that concept exists but under the two different names Series.order() and DataFrame.sort().
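For reference, the standard Python convention that the points above rely on can be shown with a plain list. This uses only built-in behavior and no Pandas; the variable names are just for illustration.

```python
original = [3, 1, 2]

# The built-in sorted() returns a new list and leaves its argument unchanged.
copy_sorted = sorted(original)

in_place = [3, 1, 2]

# list.sort() reorders the list in place and returns None.
return_value = in_place.sort()

print(original)      # the input to sorted() is untouched
print(copy_sorted)   # the new, sorted list
print(in_place)      # the list that was sorted in place
print(return_value)  # None
```

A learner who knows this pair would expect a method named sort() to behave like list.sort() and a method named sorted() to behave like the built-in.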
Yes, the ed at the end of sorted() would make it one character longer than order() and two characters longer than the current practice of df.sort(). But, on balance, I think that most programmers would happily cede two characters in order to be able to use the same method name when they are flipping code between handling series and handling dataframes, and would be happy to have the option of using the standard Python name for the concept of a non-in-place sort.
I suspect that deprecating the old names would be overly disruptive at this point, and they could probably live alongside the new sorted() methods without much trouble. New documentation could adopt the new, consistent terminology where possible, if the Pandas developers did not want to disrupt current users of the old inconsistent names.