Description
Say we have an array with multiple NaN values:
```python
>>> x = xp.full(5, xp.nan)
>>> x
Array([nan, nan, nan, nan, nan])
```
Should NaNs be treated as unique to one another?
```python
>>> xp.unique(x)
Array([nan, nan, nan, nan, nan])
```
Or should they be treated as the same?
```python
>>> xp.unique(x)
Array([nan])
```
I have no dog in this race, but you might want to refer to the discussion at the bottom of numpy/numpy#19301 and on the NumPy mailing list, which relates to a recent accidental "regression" in how `np.unique()` deals with NaNs.
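For illustration, the ambiguity can be observed directly in NumPy, where the result depends on the installed version (a sketch; older releases returned each NaN as distinct, while newer ones collapse them):

```python
import numpy as np

x = np.full(5, np.nan)
u = np.unique(x)
# Depending on the NumPy version, u is either Array-like [nan]
# (NaNs collapsed into one) or [nan, nan, nan, nan, nan]
# (each NaN treated as distinct), so u.size is 1 or 5.
print(u.size)
```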
In either case, simply specifying the behaviour would remove the need for wrapper methods that standardise it. For example, I created this (admittedly scrappy) utility method for HypothesisWorks/hypothesis#3065 to work in both scenarios, which @asmeurer noted was not ideal and probably a failure of the Array API spec:
```python
def count_unique(array):
    n_unique = 0
    nan_index = xp.isnan(array)
    # If NaNs are present, assume each one counts as unique
    for isnan, count in zip(*xp.unique(nan_index, return_counts=True)):
        if isnan:
            n_unique += count
            break
    # Count the remaining (non-NaN) unique values
    filtered_array = array[~nan_index]
    unique_array = xp.unique(filtered_array)
    n_unique += unique_array.size
    return n_unique
```
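For comparison, the opposite convention (all NaNs treated as equal) can also be made deterministic by stripping NaNs before calling `unique`. This is a sketch using NumPy directly; `count_unique_nans_collapsed` is a hypothetical helper name, not part of any spec:

```python
import numpy as np

def count_unique_nans_collapsed(array):
    # Count distinct values, treating all NaNs as a single value,
    # independently of how np.unique handles NaN on this version.
    nan_mask = np.isnan(array)
    n_unique = np.unique(array[~nan_mask]).size
    # Add one for the NaN "value" if any NaNs were present.
    return n_unique + (1 if nan_mask.any() else 0)

count_unique_nans_collapsed(np.array([np.nan, np.nan, 1.0, 2.0]))  # -> 3
```

Either convention is workable; the point is that portable code currently has to pick one and enforce it manually.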
And if deterministic NaN behaviour cannot be part of the API, a note should be added to say this behaviour is out of scope.