-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TST: Clean up tests of DataFrame.sort_{index,values} #13496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: Clean up tests of DataFrame.sort_{index,values} #13496
Conversation
Current coverage is 84.33%@@ master #13496 diff @@
==========================================
Files 138 138
Lines 51106 51100 -6
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 43102 43097 -5
+ Misses 8004 8003 -1
Partials 0 0
|
ok, pls factor out I think these tests were this way originally because before 0.17.0 sort did both sort index/values in some cases. looks good. |
I think should do nothing. When you do not specify I think this is also what the test does? (the same result for both |
yes, the doc-string should have the default for |
@jorisvandenbossche +1 to your interpretation. I might clarify the code comment in the test to explain. @jreback Sure, I'll fix up the docstrings and also create |
Before this commit, the `Series.sort_values()` tests relied on deprecated `Series.sort()` and `Series.order()` as the source of truth. However they both merely called `Series.sort_values()` under the hood. This commit consolidates the core test logic against `.sort_values()` directly, while `.sort()` and `.order()` merely check for equivalence with `.sort_values()`. Also removes some no-op assertions that had rotted from the old days of `sort()`/`order()`.
I could use guidance on a few small details:
|
yes, we do this for most other things, when you are reading a doc-string, its annoying to refer back to the signature. I am not sure we are very consistent in this though. @jorisvandenbossche
In this case no, if its a 'big' test refactor then we usually want to note it
Right way is to share, but add in |
Well, the question if how you define 'optional'. In principle are almost all keyword arguments optional (in the sense that you do not have to pass them explicitly), but in many cases they have a meaningful default value. And then it is indeed informative to note that as @jreback says. But I would say, just make a change that you think is sensible, and we can always comment :-) |
@jreback Do you mean like this? a12cea5 (However now I'm wondering if "by" will after all become used for Series, Besides |
Yes, that looks good
No, I think that will also only be for dataframes (Series has no axis=1)
Indeed, you can change that as well. |
ping when ready with this |
Duplicates values in the test DataFrame are necessary to fully test this feature.
Opened #13589 for these missing features.
My report on test coverage:
I think this is about ready for merge, @jreback |
# API for 9816 | ||
|
||
# sort_index | ||
def test_sort_index(self): | ||
frame = DataFrame(np.arange(16).reshape(4, 4), index=[1, 2, 3, 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this issue number (the PR number) as a comment (so future readers can reference)
very minor comments. pls post the new doc-strings, and ping on green. |
@IamJeffG this looks good, except the doc-string on DataFrame.sort_values is looking a bit funny. The by is not indented properly. |
@jreback see if my newest commit fixes this. I could verify the indentation problem using string replacement in the REPL, but I was unable to get docs to successfully build, to verify my fix end-to-end. |
@IamJeffG Looks good now! (BTW, you can also just look at the docstring in the interactive console ( |
Thanks for the good tip @jorisvandenbossche! |
@jreback We're green. |
@IamJeffG Thanks a lot! |
Interested in taking on #13589 as well? |
Perhaps. Let me start with #10806, which was my initial goal. |
* TST: Clean up tests of DataFrame.sort_{index,values} * Factor out Series sorting tests to own file. * Delegate deprecated sort() and order() to their own tests. Before this commit, the `Series.sort_values()` tests relied on deprecated `Series.sort()` and `Series.order()` as the source of truth. However they both merely called `Series.sort_values()` under the hood. This commit consolidates the core test logic against `.sort_values()` directly, while `.sort()` and `.order()` merely check for equivalence with `.sort_values()`. Also removes some no-op assertions that had rotted from the old days of `sort()`/`order()`. * Remove 'by' docstring from Series.sort_values * Document defaults for optional sorting args * Move more sort_values, sort_index tests to be together. * Add test for Series.sort_index(sort_remaining=True) * Improve `sort_values` tests when multiple `by`s Duplicates values in the test DataFrame are necessary to fully test this feature. * PEP8 cleanup * Annotate tests with GH issue * Fix indentation - docstring string replacement
Before taking a stab at #10806, I looked at the relevant tests and they didn't make much sense to me.
This commit fixes:
test_sort_index
and intest_sort_values
.test_sort_index
andtest_sort_values
based on the (remaining) assertions being tested in their bodies.Still confusing, unfixed:
DataFrame.sort_index
docstring doesn't specify what happens whenlevel
is unspecified andsort_remaining=False
. However there is a legacy test using this case: it's not clear to me what result or behavior is being tested.First time pandas contributor, so I'd appreciate a good critical review!
git diff upstream/master | flake8 --diff