Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/71671922/pandas-qcut-function-duplicates-parameter
Question about pandas
I am new to this, so I will copy and paste from StackOverflow. I could perhaps also provide a pull request, but maybe I overlooked something?
Maybe I am missing the point, but why doesn't the pandas qcut function accept "ignore" as a value for the duplicates argument?
As it stands, small datasets with duplicate values raise the error "Bin edges must be unique", together with the advice to use the "drop" option. But if you want a fixed number of bins, there seems to be no way to get it?
Small code example that does not work:
import pandas as pd
import numpy as np
data=pd.Series([1,1,2,3])
pd.qcut(data,10,labels=np.arange(0,10),duplicates="raise")
Small code example that works, but does not produce the requested number of bins:
import pandas as pd
import numpy as np
data=pd.Series([1,1,2,3])
pd.qcut(data,4,labels=np.arange(0,3),duplicates="drop")
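One workaround that is often suggested for keeping a fixed number of bins is to bin the ranks instead of the raw values. The following is only a minimal sketch of that idea, not something taken from the pandas source; the names `n_bins`, `ranks` and `codes` are made up for illustration. Note that `rank(method="first")` breaks ties by order of appearance, so equal values can end up in different bins.

```python
import pandas as pd

data = pd.Series([1, 1, 2, 3])
n_bins = 10  # illustrative: the fixed number of bins we want to keep

# Ranking with method="first" gives every observation a distinct rank,
# so the quantile edges computed from the ranks contain no duplicates.
ranks = data.rank(method="first")

# labels=False returns integer bin codes in the range 0..n_bins-1.
codes = pd.qcut(ranks, n_bins, labels=False)
print(codes.tolist())
```

This keeps all 10 bins defined, although with only four observations most of them stay empty.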
What could be a possible solution:
- Insert a third option "ignore" to
pandas/pandas/core/reshape/tile.py
Line 405 in 06d2301
- Change the if else block in
pandas/pandas/core/reshape/tile.py
Lines 418 to 424 in 06d2301
if duplicates == "raise":
    raise ValueError(
        f"Bin edges must be unique: {repr(bins)}.\n"
        f"You can drop duplicate edges by setting the 'duplicates' kwarg"
    )
elif duplicates == "drop":
    bins = unique_bins
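To make the proposal more concrete, here is a rough sketch of what an "ignore" behaviour could look like outside of pandas. This is an assumption about the intended semantics, not existing pandas behaviour: the helper `qcut_keep_duplicate_edges` is hypothetical, it keeps the duplicate quantile edges, and it assumes the input contains no missing values.

```python
import numpy as np
import pandas as pd


def qcut_keep_duplicate_edges(values: pd.Series, q: int) -> pd.Series:
    """Hypothetical helper: quantile-bin `values` into q bins, keeping duplicate edges."""
    arr = values.to_numpy(dtype=float)
    # q + 1 quantile edges; duplicates are deliberately kept.
    edges = np.quantile(arr, np.linspace(0, 1, q + 1))
    # Right-closed bins (edges[i], edges[i + 1]]: find the first edge >= value.
    codes = np.searchsorted(edges, arr, side="left") - 1
    # Values equal to the lowest edge would get -1; put them in the first bin.
    codes = np.clip(codes, 0, q - 1)
    return pd.Series(codes, index=values.index)


data = pd.Series([1, 1, 2, 3])
result = qcut_keep_duplicate_edges(data, 4)
print(result.tolist())
```

With this approach the bin numbering always runs from 0 to q - 1, and a duplicated edge simply leaves the corresponding bin empty.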