Skip to content

ENH: Allow keep="all" in duplicated method (#23251) #23252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

svenharris
Copy link
Contributor

Unfortunately this didn't end up being as clean as I had hoped (due to preventing keep="all" being allowed when calling implicitly through drop_duplicates. So put in a bit of duplication of error handling. Happy to take suggestions if there's a better approach.

@pep8speaks
Copy link

Hello @svenharris! Thanks for submitting the PR.

@svenharris svenharris changed the title Duplicated keep all ENH: Allow keep="all" in duplicated method (#23251) Oct 20, 2018
@svenharris svenharris changed the title ENH: Allow keep="all" in duplicated method (#23251) ENH: Allow keep="all" in duplicated method (#23251) Oct 20, 2018
@@ -145,8 +145,8 @@ def duplicated_{{dtype}}({{scalar}}[:] values, object keep='first'):

kh_resize_{{ttype}}(table, min(n, _SIZE_HINT_LIMIT))

if keep not in ('last', 'first', False):
raise ValueError('keep must be either "first", "last" or False')
if keep not in ('last', 'first', 'all', False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the RHS of this statement to be a set instead of a tuple? Also can you check if we are testing this raises somewhere already?

@@ -1252,6 +1252,9 @@ def drop_duplicates(self, keep='first', inplace=False):
if self.is_unique:
return self._shallow_copy()

if keep not in ['first', 'last', False]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw your comment in the OP but having a hard time understanding here - what's the reason you need to add these checks and cannot include all as part of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is breaking the symmetry between drop_duplicates and duplicated. Since False has the opposite effect in drop_duplicates (drop all dups) and duplicated (keep all dups). So the 'all' keyword has been blocked from the drop_duplicates because it wouldn't do what you expect. Lower down in the hashtable helpers the functions can then keep their symmetry if all is blocked at the top level, which is what this piece of code is supposed to do.

To differentiate, could raise a specific error relating to keep='all'. That probably be clearer as to why the statement was there.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

closing as stale. if you'd like to continue, pls ping.

@jreback jreback closed this Nov 23, 2018
@svenharris
Copy link
Contributor Author

Will come back to this if I (or anyone else) thinks of a good idea for the design but there wasn't consensus in the discussion what the change should look like

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Allow use of keep="all" in duplicated
4 participants