ENH: Add duplicate="ignore" to pd.cut #46657

Open
@exandi

Description

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/71671922/pandas-qcut-function-duplicates-parameter

Question about pandas

I am new to this, so I will copy and paste from Stack Overflow. I could possibly also provide a pull request, but maybe I overlooked something?

Maybe I am missing the point, but why doesn't the pandas qcut function accept "ignore" as a value for the duplicates argument?

Small datasets with duplicate values raise the error "Bin edges must be unique", along with the advice to use the "drop" option. But if you want a fixed number of bins, is there no way to get it?
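To illustrate where the error comes from, here is a minimal sketch that computes the quantile edges ten equal-frequency bins would need for the small series used below. It assumes qcut derives its edges from evenly spaced quantiles of the data; the variable names `edges` and `unique_edges` are only illustrative and are not taken from the pandas source.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 1, 2, 3])

# Ten bins need eleven edges, taken here at evenly spaced quantiles.
# Assumption: this mirrors how qcut picks its edges.
edges = data.quantile(np.linspace(0, 1, 11)).to_numpy()

# The repeated value 1 makes several of the low quantiles identical,
# so fewer distinct edges remain than the ten bins require.
unique_edges = np.unique(edges)

print("requested edges:", len(edges))
print("distinct edges: ", len(unique_edges))
```

With fewer distinct edges than requested, "drop" can only return fewer bins, which is the behaviour described above.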

A small code example that does not work:

import pandas as pd
import numpy as np

data=pd.Series([1,1,2,3])
pd.qcut(data,10,labels=np.arange(0,10),duplicates="raise")

A small example that runs, but does not return the requested number of bins:

import pandas as pd
import numpy as np

data=pd.Series([1,1,2,3])
pd.qcut(data,4,labels=np.arange(0,3),duplicates="drop")
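A commonly used workaround that keeps the requested number of bins is to bin the ranks of the values instead of the values themselves. This is only a sketch and not part of the original question: it assumes that it is acceptable for tied values to land in different bins, because `rank(method="first")` breaks ties by order of appearance.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 1, 2, 3])

# Ranking with method="first" gives every row a distinct rank,
# so the quantile edges computed from the ranks are unique.
ranks = data.rank(method="first")

# All ten labels are kept as categories, even though only four
# rows exist to fill them. Note: the two 1s get different bins.
binned = pd.qcut(ranks, 10, labels=np.arange(0, 10))

print(binned)
```

Whether splitting ties like this is appropriate depends on the use case; it trades the "equal values share a bin" property for a fixed set of bin labels.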

A possible solution:

if duplicates == "raise":
   raise ValueError(
       f"Bin edges must be unique: {repr(bins)}.\n"
       f"You can drop duplicate edges by setting the 'duplicates' kwarg"
   )
elif duplicates == "drop":
   bins = unique_bins

Labels: Enhancement, Needs Discussion, cut