-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Allow keep="all" in duplicated method (#23251) #23252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @svenharris! Thanks for submitting the PR.
|
duplicated
method (#23251)
duplicated
method (#23251)@@ -145,8 +145,8 @@ def duplicated_{{dtype}}({{scalar}}[:] values, object keep='first'): | |||
|
|||
kh_resize_{{ttype}}(table, min(n, _SIZE_HINT_LIMIT)) | |||
|
|||
if keep not in ('last', 'first', False): | |||
raise ValueError('keep must be either "first", "last" or False') | |||
if keep not in ('last', 'first', 'all', False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the RHS of this statement to be a set instead of a tuple? Also can you check if we are testing this raises somewhere already?
@@ -1252,6 +1252,9 @@ def drop_duplicates(self, keep='first', inplace=False): | |||
if self.is_unique: | |||
return self._shallow_copy() | |||
|
|||
if keep not in ['first', 'last', False]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your comment in the OP but having a hard time understanding here - what's the reason you need to add these checks and cannot include all
as part of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is breaking the symmetry between drop_duplicates
and duplicated
. Since False
has the opposite effect in drop_duplicates
(drop all dups) and duplicated
(keep all dups). So the 'all'
keyword has been blocked from the drop_duplicates
because it wouldn't do what you expect. Lower down in the hashtable helpers the functions can then keep their symmetry if all is blocked at the top level, which is what this piece of code is supposed to do.
To differentiate, could raise a specific error relating to keep='all'
. That probably be clearer as to why the statement was there.
closing as stale. if you'd like to continue, pls ping. |
Will come back to this if I (or anyone else) thinks of a good idea for the design but there wasn't consensus in the discussion what the change should look like |
git diff upstream/master -u -- "*.py" | flake8 --diff
Unfortunately this didn't end up being as clean as I had hoped (due to preventing keep="all" being allowed when calling implicitly through drop_duplicates. So put in a bit of duplication of error handling. Happy to take suggestions if there's a better approach.