Address polymorphic return value for `unique`?

`unique` was added very early on in gh-25. The review discussions have focused on

- whether return values should be sorted or not (see gh-40), and
- (b) the issue with value-dependent output shape (see gh-84).

There is another issue that was not discussed: `unique` is the only function whose return type is polymorphic. It can return an array, or a length-2, 3 or 4 tuple of arrays:

```python
unique(x, /, *, return_counts=False, return_index=False, return_inverse=False)
```
API doc: https://data-apis.org/array-api/latest/API_specification/set_functions.html#unique-x-return-counts-false-return-index-false-return-inverse-false

This issue is painful for, e.g., libraries that have to support it in their JIT compiler. For `svd`, which previously was polymorphic too, we decided that it was better to introduce `svdvals` to get rid of the polymorphism. There's even a design principle that says as much in https://data-apis.org/array-api/latest/extensions/linear_algebra_functions.html: _"In general, interfaces should avoid polymorphic return values (e.g., returning an array or a namedtuple, dependent on, e.g., an optional keyword argument)."_

Given that there are 3 boolean keywords, it's not feasible to split `unique` into separate functions (there'd be 8 of them). An alternative may be:
- return a custom object with four named fields: (`unique`, `indices`, `inverse`, `counts`).
- no tuple unpacking
- if data is returned, its type is array
- for fields where the boolean keyword was `False`, accessing the field is undefined behavior


Another question that may be relevant is: how often are the boolean keywords each used? The data at https://raw.githubusercontent.com/data-apis/python-record-api/master/data/typing/numpy.py may help there:

<details>

```python
@overload
def unique(ar: numpy.ndarray, return_inverse: bool):
    """
    usage.scipy: 14
    usage.skimage: 5
    usage.sklearn: 125
    usage.statsmodels: 18
    usage.xarray: 3
    """
    ...


@overload
def unique(ar: numpy.ndarray):
    """
    usage.dask: 7
    usage.matplotlib: 6
    usage.orange3: 16
    usage.scipy: 39
    usage.seaborn: 5
    usage.skimage: 48
    usage.sklearn: 385
    usage.statsmodels: 45
    usage.xarray: 10
    """
    ...


@overload
def unique(ar: numpy.ndarray, return_inverse: bool, return_counts: bool):
    """
    usage.skimage: 3
    usage.sklearn: 2
    usage.statsmodels: 1
    """
    ...


@overload
def unique(ar: numpy.ndarray, return_counts: bool):
    """
    usage.orange3: 3
    usage.scipy: 5
    usage.skimage: 8
    usage.sklearn: 1
    usage.statsmodels: 1
    """
    ...


@overload
def unique(ar: numpy.ndarray, return_index: bool):
    """
    usage.scipy: 5
    usage.skimage: 4
    usage.sklearn: 1
    usage.statsmodels: 1
    """
    ...


@overload
def unique(ar: List[int], return_counts: bool):
    """
    usage.orange3: 1
    """
    ...


@overload
def unique(ar: xarray.core.dataarray.DataArray):
    """
    usage.xarray: 2
    """
    ...


@overload
def unique(ar: List[numpy.int32]):
    """
    usage.statsmodels: 1
    """
    ...


@overload
def unique(ar: List[Literal["b", "a"]]):
    """
    usage.sklearn: 2
    usage.statsmodels: 1
    """
    ...


@overload
def unique(ar: numpy.ndarray, return_index: bool, return_inverse: bool):
    """
    usage.matplotlib: 1
    usage.sklearn: 6
    usage.statsmodels: 3
    """
    ...


@overload
def unique(ar: List[Literal["4"]], return_inverse: bool):
    """
    usage.statsmodels: 1
    """
    ...


@overload
def unique(
    ar: Union[
        pandas.core.series.Series,
        numpy.ndarray,
        pandas.core.arrays.categorical.Categorical,
        List[str],
    ],
    return_inverse: bool = ...,
    return_index: bool = ...,
):
    """
    usage.pandas: 22
    """
    ...


@overload
def unique(ar: List[float]):
    """
    usage.scipy: 2
    usage.sklearn: 3
    """
    ...


@overload
def unique(ar: numpy.ndarray, axis: int):
    """
    usage.scipy: 1
    """
    ...


@overload
def unique(ar: pandas.core.series.Series):
    """
    usage.dask: 3
    usage.geopandas: 1
    usage.prophet: 4
    usage.seaborn: 4
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: numpy.ma.core.MaskedArray):
    """
    usage.matplotlib: 1
    """
    ...


@overload
def unique(ar: List[int]):
    """
    usage.seaborn: 2
    usage.sklearn: 26
    """
    ...


@overload
def unique(ar: List[numpy.int64]):
    """
    usage.dask: 4
    """
    ...


@overload
def unique(ar: List[numpy.float64]):
    """
    usage.dask: 5
    usage.sklearn: 2
    """
    ...


@overload
def unique(ar: List[numpy.complex128]):
    """
    usage.dask: 3
    """
    ...


@overload
def unique(ar: List[numpy.bool_]):
    """
    usage.dask: 3
    """
    ...


@overload
def unique(ar: List[numpy.float32]):
    """
    usage.dask: 2
    """
    ...


@overload
def unique(
    ar: numpy.ndarray, return_index: bool, return_inverse: bool, return_counts: bool
):
    """
    usage.dask: 2
    """
    ...


@overload
def unique(ar: numpy.memmap):
    """
    usage.sklearn: 2
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "beer", "pizza", "the"]]):
    """
    usage.sklearn: 2
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "beer", "burger", "pizza", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "beer", "burger", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "coke", "burger", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["burger", "coke", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "celeri", "salad", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "water", "sparkling", "salad", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "celeri", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["water", "salad", "tomato", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["copyright", "water", "salad", "tomato", "the"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["three", "two", "one"]]):
    """
    usage.sklearn: 3
    """
    ...


@overload
def unique(ar: List[Union[float, int]], return_inverse: bool):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["b", "c", "a"]]):
    """
    usage.sklearn: 1
    """
    ...


@overload
def unique(ar: List[Literal["3", "2", "1"]]):
    """
    usage.sklearn: 2
    """
    ...


def unique(
    ar: object,
    return_index: bool = ...,
    return_inverse: bool = ...,
    return_counts: bool = ...,
):
    """
    usage.dask: 29
    usage.geopandas: 1
    usage.matplotlib: 8
    usage.orange3: 20
    usage.pandas: 22
    usage.prophet: 4
    usage.scipy: 66
    usage.seaborn: 11
    usage.skimage: 68
    usage.sklearn: 574
    usage.statsmodels: 72
    usage.xarray: 15
```

</details>

Conclusion: a large majority of usage is `unique(x)`, without any keywords. `return_inverse` is fairly often used, `return_index` almost never (checked by searching Pandas, sklearn et al.).

So another alternative count be to include two separate functions:
1. `unique(x, /)` - returns array of unique values
2. `unique_all(x, /)` - returns tuple of 4 arrays (may need a better name)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Address polymorphic return value for `unique`? #212

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Address polymorphic return value for unique? #212

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Address polymorphic return value for `unique`? #212