Skip to content

TST: Clean up tests of DataFrame.sort_{index,values} #13496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 11, 2016

Conversation

IamJeffG
Copy link
Contributor

@IamJeffG IamJeffG commented Jun 22, 2016

Before taking a stab at #10806, I looked at the relevant tests and they didn't make much sense to me.

This commit fixes:

  • Some expected results were obtained by running code identical to what's being tested. (i.e. test could never fail)
  • Removed duplicate assertions testing the same functionality both in test_sort_index and in test_sort_values.
  • Switched the names test_sort_index and test_sort_values based on the (remaining) assertions being tested in their bodies.

Still confusing, unfixed:

  • The DataFrame.sort_index docstring doesn't specify what happens when level is unspecified and sort_remaining=False. However there is a legacy test using this case: it's not clear to me what result or behavior is being tested.

First time pandas contributor, so I'd appreciate a good critical review!

  • tests pass
  • passes git diff upstream/master | flake8 --diff
  • ensure default options in docstrings

@codecov-io
Copy link

codecov-io commented Jun 22, 2016

Current coverage is 84.33%

Merging #13496 into master will increase coverage by <.01%

@@             master     #13496   diff @@
==========================================
  Files           138        138          
  Lines         51106      51100     -6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43102      43097     -5   
+ Misses         8004       8003     -1   
  Partials          0          0          

Powered by Codecov. Last updated by 1a12ead...a333162

@jreback
Copy link
Contributor

jreback commented Jun 22, 2016

ok, pls factor out tests/series/tests_sorting.py while you are at it. (in test_analytics).
Ideally make sure we have test coverage for all of the options for sort_values/index.

I think these tests were this way originally because before 0.17.0 sort did both sort index/values in some cases.

looks good.

@jreback jreback added Testing pandas testing functions or related to the test suite Clean labels Jun 22, 2016
@jreback jreback added this to the 0.18.2 milestone Jun 22, 2016
@jorisvandenbossche
Copy link
Member

The DataFrame.sort_index docstring doesn't specify what happens when level is unspecified and sort_remaining=False. However there is a legacy test using this case: it's not clear to me what result or behavior is being tested.

I think should do nothing. When you do not specify level, I think this means that you want to sort by all levels, and in this way sort_remaining (either true or false) has no effect anymore.

I think this is also what the test does? (the same result for both sort_remaining is True or False)

@jreback
Copy link
Contributor

jreback commented Jun 23, 2016

yes, the doc-string should have the default for sort_remaining and pls check the other params, a stock string should always list the default (even if None)

@IamJeffG
Copy link
Contributor Author

IamJeffG commented Jun 23, 2016

@jorisvandenbossche +1 to your interpretation. I might clarify the code comment in the test to explain.

@jreback Sure, I'll fix up the docstrings and also create tests/series/tests_sorting.py from test_analytics.

IamJeffG added 2 commits June 23, 2016 20:52
Before this commit, the `Series.sort_values()` tests relied on deprecated
`Series.sort()` and `Series.order()` as the source of truth. However
they both merely called `Series.sort_values()` under the hood.

This commit consolidates the core test logic against `.sort_values()`
directly, while `.sort()` and `.order()` merely check for equivalence
with `.sort_values()`.

Also removes some no-op assertions that had rotted from the old days of
`sort()`/`order()`.
@IamJeffG
Copy link
Contributor Author

I could use guidance on a few small details:

  1. Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?
  2. Is a whatsnew entry necessary for this change, which merely refactors tests without touching the core source?
  3. The sort_values docstring is shared (core/generic.py) and therefore includes parameter "by" even for Series where it's not accepted. What's the right way to handle this discrepancy? Is it necessary to "un-share" the docstring?

@jreback
Copy link
Contributor

jreback commented Jun 24, 2016

Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?

yes, we do this for most other things, when you are reading a doc-string, its annoying to refer back to the signature. I am not sure we are very consistent in this though. @jorisvandenbossche

Is a whatsnew entry necessary for this change, which merely refactors tests without touching the core source?

In this case no, if its a 'big' test refactor then we usually want to note it

The sort_values docstring is shared (core/generic.py) and therefore includes parameter "by" even for Series where it's not accepted. What's the right way to handle this discrepancy? Is it necessary to "un-share" the docstring?

Right way is to share, but add in by via string replacement specifically for DataFrame (to avoid repeating things more than necessary)

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 24, 2016

Numpy Docstring Standard says just to mark optional kwargs as "optional", not to re-state their default value in the docstring. @jreback do you really want the defaults listed?

yes, we do this for most other things, when you are reading a doc-string, its annoying to refer back to the signature. I am not sure we are very consistent in this though. @jorisvandenbossche

Well, the question if how you define 'optional'. In principle are almost all keyword arguments optional (in the sense that you do not have to pass them explicitly), but in many cases they have a meaningful default value. And then it is indeed informative to note that as @jreback says.

But I would say, just make a change that you think is sensible, and we can always comment :-)

@IamJeffG
Copy link
Contributor Author

add in by via string replacement specifically for DataFrame

@jreback Do you mean like this? a12cea5

(However now I'm wondering if "by" will after all become used for Series, axis=1, per #10806.)

Besides by, also Series.sort_index does not accept the kwargs kind and na_position. Until they get implemented I feel I should do the same sort of string replacement, although this begins to get unweildy.

@jorisvandenbossche
Copy link
Member

Do you mean like this?

Yes, that looks good

(However now I'm wondering if "by" will after all become used for Series, axis=1, per #10806.)

No, I think that will also only be for dataframes (Series has no axis=1)

Besides by, also Series.sort_index does not accept the kwargs kind and na_position. Until they get implemented I feel I should do the same sort of string replacement, although this begins to get unweildy.

Indeed, you can change that as well.
But, it would not be too difficult to add those parameters to Series.sort_index (it's just changing the numpy argsort call with the internal wrapper _nargsort which adds those additional keywords. But if you want to do this, maybe keep that for a separate PR.

@jreback
Copy link
Contributor

jreback commented Jul 5, 2016

ping when ready with this

@IamJeffG
Copy link
Contributor Author

IamJeffG commented Jul 8, 2016

Series.sort_index does not accept the kwargs kind and na_position

Opened #13589 for these missing features.

make sure we have test coverage for all of the options for sort_values/index

My report on test coverage:

  • sort_values's tests for multiple by's were naive. I've improved this by adding duplicate values to the DataFrame being tested.
  • sort_index was missing a test for multi-level Series.sort_index(sort_remaining). I've added it.

I think this is about ready for merge, @jreback

# API for 9816

# sort_index
def test_sort_index(self):
frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4],
Copy link
Contributor

@jreback jreback Jul 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this issue number (the PR number) as a comment (so future readers can reference)

@jreback
Copy link
Contributor

jreback commented Jul 9, 2016

very minor comments. pls post the new doc-strings, and ping on green.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2016

@IamJeffG this looks good, except the doc-string on DataFrame.sort_values is looking a bit funny. The by is not indented properly.

@IamJeffG
Copy link
Contributor Author

@jreback see if my newest commit fixes this. I could verify the indentation problem using string replacement in the REPL, but I was unable to get docs to successfully build, to verify my fix end-to-end.

@jorisvandenbossche
Copy link
Member

@IamJeffG Looks good now! (BTW, you can also just look at the docstring in the interactive console (pd.DataFrame.sort_values? in ipython) to check that it looks OK)

@IamJeffG
Copy link
Contributor Author

Thanks for the good tip @jorisvandenbossche!

@IamJeffG
Copy link
Contributor Author

@jreback We're green.

@jorisvandenbossche jorisvandenbossche merged commit 65849d3 into pandas-dev:master Jul 11, 2016
@jorisvandenbossche
Copy link
Member

@IamJeffG Thanks a lot!

@jorisvandenbossche
Copy link
Member

Interested in taking on #13589 as well?

@IamJeffG
Copy link
Contributor Author

Perhaps. Let me start with #10806, which was my initial goal.

nateGeorge pushed a commit to nateGeorge/pandas that referenced this pull request Aug 15, 2016
* TST: Clean up tests of DataFrame.sort_{index,values}

* Factor out Series sorting tests to own file.

* Delegate deprecated sort() and order() to their own tests.

Before this commit, the `Series.sort_values()` tests relied on deprecated
`Series.sort()` and `Series.order()` as the source of truth. However
they both merely called `Series.sort_values()` under the hood.

This commit consolidates the core test logic against `.sort_values()`
directly, while `.sort()` and `.order()` merely check for equivalence
with `.sort_values()`.

Also removes some no-op assertions that had rotted from the old days of
`sort()`/`order()`.

* Remove 'by' docstring from Series.sort_values

* Document defaults for optional sorting args

* Move more sort_values, sort_index tests to be together.

* Add test for Series.sort_index(sort_remaining=True)

* Improve `sort_values` tests when multiple `by`s

Duplicates values in the test DataFrame are necessary
to fully test this feature.

* PEP8 cleanup

* Annotate tests with GH issue

* Fix indentation - docstring string replacement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants