TST: Clean up tests of DataFrame.sort_{index,values} #13496

IamJeffG · 2016-06-22T07:15:59Z

Before taking a stab at #10806, I looked at the relevant tests and they didn't make much sense to me.

This commit fixes:

Some expected results were obtained by running code identical to what's being tested. (i.e. test could never fail)
Removed duplicate assertions testing the same functionality both in test_sort_index and in test_sort_values.
Switched the names test_sort_index and test_sort_values based on the (remaining) assertions being tested in their bodies.

Still confusing, unfixed:

The DataFrame.sort_index docstring doesn't specify what happens when level is unspecified and sort_remaining=False. However there is a legacy test using this case: it's not clear to me what result or behavior is being tested.

First time pandas contributor, so I'd appreciate a good critical review!

tests pass
passes git diff upstream/master | flake8 --diff
ensure default options in docstrings

codecov-io · 2016-06-22T08:45:30Z

Current coverage is 84.33%

Merging #13496 into master will increase coverage by <.01%

@@             master     #13496   diff @@
==========================================
  Files           138        138          
  Lines         51106      51100     -6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43102      43097     -5   
+ Misses         8004       8003     -1   
  Partials          0          0

Powered by Codecov. Last updated by 1a12ead...a333162

jreback · 2016-06-22T10:10:35Z

ok, pls factor out tests/series/tests_sorting.py while you are at it. (in test_analytics).
Ideally make sure we have test coverage for all of the options for sort_values/index.

I think these tests were this way originally because before 0.17.0 sort did both sort index/values in some cases.

looks good.

jorisvandenbossche · 2016-06-23T09:47:22Z

The DataFrame.sort_index docstring doesn't specify what happens when level is unspecified and sort_remaining=False. However there is a legacy test using this case: it's not clear to me what result or behavior is being tested.

I think should do nothing. When you do not specify level, I think this means that you want to sort by all levels, and in this way sort_remaining (either true or false) has no effect anymore.

I think this is also what the test does? (the same result for both sort_remaining is True or False)

jreback · 2016-06-23T10:06:54Z

yes, the doc-string should have the default for sort_remaining and pls check the other params, a stock string should always list the default (even if None)

IamJeffG · 2016-06-23T10:19:56Z

@jorisvandenbossche +1 to your interpretation. I might clarify the code comment in the test to explain.

@jreback Sure, I'll fix up the docstrings and also create tests/series/tests_sorting.py from test_analytics.

Before this commit, the `Series.sort_values()` tests relied on deprecated `Series.sort()` and `Series.order()` as the source of truth. However they both merely called `Series.sort_values()` under the hood. This commit consolidates the core test logic against `.sort_values()` directly, while `.sort()` and `.order()` merely check for equivalence with `.sort_values()`. Also removes some no-op assertions that had rotted from the old days of `sort()`/`order()`.

IamJeffG · 2016-06-23T18:59:09Z

I could use guidance on a few small details:

Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?
Is a whatsnew entry necessary for this change, which merely refactors tests without touching the core source?
The sort_values docstring is shared (core/generic.py) and therefore includes parameter "by" even for Series where it's not accepted. What's the right way to handle this discrepancy? Is it necessary to "un-share" the docstring?

jreback · 2016-06-24T10:07:30Z

Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?

yes, we do this for most other things, when you are reading a doc-string, its annoying to refer back to the signature. I am not sure we are very consistent in this though. @jorisvandenbossche

Is a whatsnew entry necessary for this change, which merely refactors tests without touching the core source?

In this case no, if its a 'big' test refactor then we usually want to note it

The sort_values docstring is shared (core/generic.py) and therefore includes parameter "by" even for Series where it's not accepted. What's the right way to handle this discrepancy? Is it necessary to "un-share" the docstring?

Right way is to share, but add in by via string replacement specifically for DataFrame (to avoid repeating things more than necessary)

jorisvandenbossche · 2016-06-24T11:41:04Z

Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?

yes, we do this for most other things, when you are reading a doc-string, its annoying to refer back to the signature. I am not sure we are very consistent in this though. @jorisvandenbossche

Well, the question if how you define 'optional'. In principle are almost all keyword arguments optional (in the sense that you do not have to pass them explicitly), but in many cases they have a meaningful default value. And then it is indeed informative to note that as @jreback says.

But I would say, just make a change that you think is sensible, and we can always comment :-)

IamJeffG · 2016-06-28T20:08:04Z

add in by via string replacement specifically for DataFrame

@jreback Do you mean like this? a12cea5

(However now I'm wondering if "by" will after all become used for Series, axis=1, per #10806.)

Besides by, also Series.sort_index does not accept the kwargs kind and na_position. Until they get implemented I feel I should do the same sort of string replacement, although this begins to get unweildy.

jorisvandenbossche · 2016-06-28T22:43:57Z

Do you mean like this?

Yes, that looks good

(However now I'm wondering if "by" will after all become used for Series, axis=1, per #10806.)

No, I think that will also only be for dataframes (Series has no axis=1)

Besides by, also Series.sort_index does not accept the kwargs kind and na_position. Until they get implemented I feel I should do the same sort of string replacement, although this begins to get unweildy.

Indeed, you can change that as well.
But, it would not be too difficult to add those parameters to Series.sort_index (it's just changing the numpy argsort call with the internal wrapper _nargsort which adds those additional keywords. But if you want to do this, maybe keep that for a separate PR.

jreback · 2016-07-05T10:43:20Z

ping when ready with this

Duplicates values in the test DataFrame are necessary to fully test this feature.

IamJeffG · 2016-07-08T19:28:47Z

Series.sort_index does not accept the kwargs kind and na_position

Opened #13589 for these missing features.

make sure we have test coverage for all of the options for sort_values/index

My report on test coverage:

sort_values's tests for multiple by's were naive. I've improved this by adding duplicate values to the DataFrame being tested.
sort_index was missing a test for multi-level Series.sort_index(sort_remaining). I've added it.

I think this is about ready for merge, @jreback

jreback · 2016-07-09T19:58:06Z

pandas/tests/frame/test_sorting.py

-        # API for 9816
-
-        # sort_index
+    def test_sort_index(self):
        frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4],


add this issue number (the PR number) as a comment (so future readers can reference)

jreback · 2016-07-09T20:00:22Z

very minor comments. pls post the new doc-strings, and ping on green.

jreback · 2016-07-10T22:32:46Z

@IamJeffG this looks good, except the doc-string on DataFrame.sort_values is looking a bit funny. The by is not indented properly.

IamJeffG · 2016-07-11T12:12:22Z

@jreback see if my newest commit fixes this. I could verify the indentation problem using string replacement in the REPL, but I was unable to get docs to successfully build, to verify my fix end-to-end.

jorisvandenbossche · 2016-07-11T12:29:43Z

@IamJeffG Looks good now! (BTW, you can also just look at the docstring in the interactive console (pd.DataFrame.sort_values? in ipython) to check that it looks OK)

IamJeffG · 2016-07-11T12:45:25Z

Thanks for the good tip @jorisvandenbossche!

IamJeffG · 2016-07-11T18:18:18Z

@jreback We're green.

jorisvandenbossche · 2016-07-11T18:21:20Z

@IamJeffG Thanks a lot!

jorisvandenbossche · 2016-07-11T18:22:14Z

Interested in taking on #13589 as well?

IamJeffG · 2016-07-11T18:47:08Z

Perhaps. Let me start with #10806, which was my initial goal.

* TST: Clean up tests of DataFrame.sort_{index,values} * Factor out Series sorting tests to own file. * Delegate deprecated sort() and order() to their own tests. Before this commit, the `Series.sort_values()` tests relied on deprecated `Series.sort()` and `Series.order()` as the source of truth. However they both merely called `Series.sort_values()` under the hood. This commit consolidates the core test logic against `.sort_values()` directly, while `.sort()` and `.order()` merely check for equivalence with `.sort_values()`. Also removes some no-op assertions that had rotted from the old days of `sort()`/`order()`. * Remove 'by' docstring from Series.sort_values * Document defaults for optional sorting args * Move more sort_values, sort_index tests to be together. * Add test for Series.sort_index(sort_remaining=True) * Improve `sort_values` tests when multiple `by`s Duplicates values in the test DataFrame are necessary to fully test this feature. * PEP8 cleanup * Annotate tests with GH issue * Fix indentation - docstring string replacement

TST: Clean up tests of DataFrame.sort_{index,values}

8e9fb62

jreback added Testing pandas testing functions or related to the test suite Clean labels Jun 22, 2016

jreback added this to the 0.18.2 milestone Jun 22, 2016

IamJeffG added 2 commits June 23, 2016 20:52

Factor out Series sorting tests to own file.

70e06d1

IamJeffG added 2 commits June 28, 2016 10:40

Remove 'by' docstring from Series.sort_values

a12cea5

Document defaults for optional sorting args

600e2de

IamJeffG added 3 commits July 7, 2016 19:15

Move more sort_values, sort_index tests to be together.

698399d

Add test for Series.sort_index(sort_remaining=True)

f0b6127

Improve sort_values tests when multiple bys

44dd706

Duplicates values in the test DataFrame are necessary to fully test this feature.

PEP8 cleanup

5988277

jreback reviewed Jul 9, 2016
View reviewed changes

Annotate tests with GH issue

f761972

Fix indentation - docstring string replacement

a333162

jorisvandenbossche merged commit 65849d3 into pandas-dev:master Jul 11, 2016

jorisvandenbossche mentioned this pull request Oct 31, 2017

Series.sort_values doesn't support a by keyword argument, but the documentation claims it does #11540

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TST: Clean up tests of DataFrame.sort_{index,values} #13496

TST: Clean up tests of DataFrame.sort_{index,values} #13496

IamJeffG commented Jun 22, 2016 •

edited

Loading

codecov-io commented Jun 22, 2016 •

edited

Loading

jreback commented Jun 22, 2016

jorisvandenbossche commented Jun 23, 2016

jreback commented Jun 23, 2016

IamJeffG commented Jun 23, 2016 •

edited

Loading

IamJeffG commented Jun 23, 2016

jreback commented Jun 24, 2016

jorisvandenbossche commented Jun 24, 2016 •

edited

Loading

IamJeffG commented Jun 28, 2016

jorisvandenbossche commented Jun 28, 2016

jreback commented Jul 5, 2016

IamJeffG commented Jul 8, 2016

jreback Jul 9, 2016 •

edited

Loading

jreback commented Jul 9, 2016

jreback commented Jul 10, 2016

IamJeffG commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

TST: Clean up tests of DataFrame.sort_{index,values} #13496

TST: Clean up tests of DataFrame.sort_{index,values} #13496

Conversation

IamJeffG commented Jun 22, 2016 • edited Loading

codecov-io commented Jun 22, 2016 • edited Loading

Current coverage is 84.33%

jreback commented Jun 22, 2016

jorisvandenbossche commented Jun 23, 2016

jreback commented Jun 23, 2016

IamJeffG commented Jun 23, 2016 • edited Loading

IamJeffG commented Jun 23, 2016

jreback commented Jun 24, 2016

jorisvandenbossche commented Jun 24, 2016 • edited Loading

IamJeffG commented Jun 28, 2016

jorisvandenbossche commented Jun 28, 2016

jreback commented Jul 5, 2016

IamJeffG commented Jul 8, 2016

jreback Jul 9, 2016 • edited Loading

Choose a reason for hiding this comment

jreback commented Jul 9, 2016

jreback commented Jul 10, 2016

IamJeffG commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

jorisvandenbossche commented Jul 11, 2016

IamJeffG commented Jul 11, 2016

IamJeffG commented Jun 22, 2016 •

edited

Loading

codecov-io commented Jun 22, 2016 •

edited

Loading

IamJeffG commented Jun 23, 2016 •

edited

Loading

jorisvandenbossche commented Jun 24, 2016 •

edited

Loading

jreback Jul 9, 2016 •

edited

Loading