
Specify NaN behaviour in unique() #249

Closed
@honno

Say we have an array with multiple NaN values:

>>> x = xp.full(5, xp.nan)
>>> x
Array([nan, nan, nan, nan, nan])

Should NaNs be treated as unique to one another?

>>> xp.unique(x)
Array([nan, nan, nan, nan, nan])

Or should they be treated as the same?

>>> xp.unique(x)
Array([nan])
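
For context, the ambiguity follows from IEEE 754 comparison semantics: NaN never compares equal to itself. Deduplication based purely on `==` therefore keeps every NaN, while deduplication that special-cases NaN collapses them into one. A minimal sketch in plain Python, with no array library assumed (`dedupe_by_equality` and `dedupe_nan_as_one` are hypothetical helper names used only for illustration):

```python
import math


def dedupe_by_equality(values):
    """Keep a value only if it is not == to any value already kept."""
    kept = []
    for v in values:
        # Explicit == on purpose: `v in kept` would short-circuit on object
        # identity and could hide the NaN != NaN behaviour.
        if not any(v == k for k in kept):
            kept.append(v)
    return kept


def dedupe_nan_as_one(values):
    """Like dedupe_by_equality, but treat all NaNs as one value."""
    kept = []
    seen_nan = False
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            if not seen_nan:
                kept.append(v)
                seen_nan = True
        elif not any(v == k for k in kept):
            kept.append(v)
    return kept


values = [float("nan")] * 5
print(len(dedupe_by_equality(values)))  # 5: every NaN is kept
print(len(dedupe_nan_as_one(values)))   # 1: NaNs collapse to one
```

Both results are defensible, which is why the spec needs to pick one or explicitly leave it unspecified.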

I have no dog in this race, but you might want to refer to the discussion at the bottom of numpy/numpy#19301 and on the NumPy mailing list, which relates to a recent accidental "regression" in how np.unique() deals with NaNs.

In either case, simply specifying the behaviour would remove the need for wrapper methods that standardise it. For example, I created this (admittedly scrappy) utility method for HypothesisWorks/hypothesis#3065 to work in both scenarios, which @asmeurer noted was not ideal and was probably a failure of the Array API spec.

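def count_unique(array):
    # Counts unique values, treating every NaN as distinct from every other NaN.
    n_unique = 0
    nan_index = xp.isnan(array)
    # The boolean mask contains no NaNs, so unique() on it is unambiguous.
    for isnan, count in zip(*xp.unique(nan_index, return_counts=True)):
        if isnan:
            # Each NaN counts as its own unique value.
            n_unique += count
            break
    # Deduplicate only the non-NaN values, where unique() is well defined.
    filtered_array = array[~nan_index]
    unique_array = xp.unique(filtered_array)
    n_unique += unique_array.size
    return n_unique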
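
The same idea can be written without iterating over the output of unique(): count the NaNs directly from the mask, then deduplicate only the non-NaN values, where implementations agree. A rough sketch using NumPy as a stand-in for `xp` (the name `count_unique_nan_distinct` is hypothetical, and it assumes, as the utility above does, that each NaN counts as its own unique value):

```python
import numpy as np


def count_unique_nan_distinct(array):
    """Count unique values, treating every NaN as distinct.

    The result does not depend on how np.unique() itself handles NaNs,
    because NaNs are removed before np.unique() is called.
    """
    nan_mask = np.isnan(array)
    n_nans = int(np.count_nonzero(nan_mask))
    n_non_nan = int(np.unique(array[~nan_mask]).size)
    return n_nans + n_non_nan


x = np.array([1.0, 1.0, np.nan, 2.0, np.nan])
print(count_unique_nan_distinct(x))  # 4: two NaNs plus the values 1.0 and 2.0
```

This still works around the ambiguity rather than resolving it, so it does not replace specifying the behaviour.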

And if deterministic NaN behaviour cannot be part of the API, a note should be added to say this behaviour is out of scope.

Labels: API change (changes to existing functions or objects in the API)
