Fix order of diamond color factor. #3146
The factor order of the diamonds dataset is currently `D < ... < J`, but since "D" is the best grade, it should be `J < ... < D`. https://en.wikipedia.org/wiki/Diamond_color#Grading_fancy_color_diamonds
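To illustrate what reversing the order involves, here is a minimal base R sketch. It does not touch the packaged dataset: `color` is a hypothetical stand-in vector built with the same seven levels as `diamonds$color`, and `grades` and `color_rev` are illustrative names only.

```r
# Hypothetical stand-in for diamonds$color: an ordered factor with levels D < E < ... < J.
grades <- c("D", "E", "F", "G", "H", "I", "J")
color <- factor(c("D", "G", "J", "E"), levels = grades, ordered = TRUE)

# Rebuild the factor with the levels reversed, so that J < I < ... < D.
# The values themselves are unchanged; only the ordering of the levels differs.
color_rev <- factor(color, levels = rev(levels(color)), ordered = TRUE)

levels(color)      # current order, best grade first
levels(color_rev)  # reversed order, best grade last
max(color_rev)     # under the reversed order, the maximum is the best grade
```

The same transformation can be applied locally by anyone who prefers a worst-to-best ordering, without changing the dataset itself.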
Since the levels of colors were in the incorrect order (see tidyverse/ggplot2#3146), the interpretation of the relationship between color and price was backwards. Fixes #199
This PR has the potential to mess up a lot of existing examples and tutorials, so we should only merge if we're absolutely certain that this change is required. And I'm not entirely convinced. For rankings of quality, you can usually rank from worst to best or from best to worst. For example, if you submit a grant proposal to the National Institutes of Health, it will be ranked on a scale from 1 to 9, with 1 being exceptional and 9 terrible. The current

```r
library(tidyverse)
diamonds %>%
  group_by(color) %>%
  tally() %>%
  arrange(color)
#> # A tibble: 7 x 2
#>   color     n
#>   <ord> <int>
#> 1 D      6775
#> 2 E      9797
#> 3 F      9542
#> 4 G     11292
#> 5 H      8304
#> 6 I      5422
#> 7 J      2808
```

Created on 2019-02-16 by the reprex package (v0.2.1)
Agreed. As I commented on #2962, it seems almost impossible to safely deprecate and update a dataset. But, from this comment on the original issue (jrnold/r4ds-exercise-solutions#199), I feel this is just a documentation problem.
Should line this be Line 11 in 03bd946
I think that the argument about breaking existing examples is key. So I'm going to close it. Changing that line in the documentation may help. Yes, quality can be ordered from best to worst or worst to best. But the ordering of
@jrnold Would you be willing to submit a PR that improves the documentation?
Make the order in which the levels of `diamonds$color` are mentioned in the docs match their order in the variable. See tidyverse#3146 (comment)
Removed the commit with changes to the
Thanks, could you update the docs by running
No worries, I'll do this.
Thanks!
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
The current `color` variable in the `diamonds` dataset is an ordered factor, but the levels are in decreasing order of color quality rather than increasing order.
Created on 2019-02-15 by the reprex package (v0.2.1)
The factor order of `diamonds$color` is currently `D < ... < J`. However, since "D" is the best grade and "J" the worst, the levels should be `J < ... < D`. See https://en.wikipedia.org/wiki/Diamond_color#Grading_fancy_color_diamonds.
This pull request puts the factor levels in the correct order.