
Copy-view behaviour and mutating arrays #24

Closed

@rgommers

Context:

That issue and PR were about unrelated topics, so I'll try to summarize the copy-view and mutation topic here and we can continue the discussion.

Note that the two topics are fairly tightly coupled, because copy/view differences only matter for semantics (as opposed to performance) when they are mixed with mutation.

Mutating arrays

There are a number of features that rely on mutation:

  • In-place operators like +=, *=
  • The out= keyword argument
  • Element and slice assignment with __setitem__
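
For concreteness, a minimal NumPy sketch of all three mechanisms:

```python
import numpy as np

x = np.arange(4, dtype=np.float64)   # [0., 1., 2., 3.]

x += 1                      # in-place operator: no new array is allocated
np.multiply(x, 2, out=x)    # out= keyword: result is written into an existing array
x[1:3] = 0                  # slice assignment via __setitem__

print(x)                    # [2. 0. 0. 8.]
```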

@shoyer's summary of the issue with mutation was: Mutation can be challenging to support in some execution models (at least without another layer of indirection), which is why several projects currently don't support it (TensorFlow and JAX) or only support it half-heartedly (e.g., Dask). The commonality between these libraries is that they build up abstract computations, which are then transformed (e.g., for autodiff) and/or executed in parallel. Even NumPy has "read only" arrays. I'm particularly concerned about new projects that implement this API, which might find the need to support mutation burdensome.

@alextp said: TensorFlow was planning to add mutability and didn't see a real issue with supporting out=.

@shoyer said: It's definitely always possible to support mutation at the Python level via some sort of wrapper layer.

dask.array is perhaps a good example of this. It supports mutating operations and out in some cases, but its support for mutation is still rather limited. For example, it doesn't support assignment like x[:2, :] = some_other_array.

Working around limitations of no support for mutation can usually be done by one of:

  1. Use where for selection, e.g., where(arange(4) == 2, 1, 0)
  2. Calculate the "inverse" of the assignment operator in terms of indexing, e.g., y = array([0, 1]); x = y[[0, 0, 1, 0]] in this case

Some version of (2) always works, though it can be tricky to work out (especially with current APIs). The duality between indexing and assignment is the difference between specifying where elements come from or where they end up.
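
Both workarounds, sketched in NumPy:

```python
import numpy as np

# Mutating version (unavailable without mutation support):
#   x = np.zeros(4, dtype=int); x[2] = 1

# Workaround 1: select with where()
x1 = np.where(np.arange(4) == 2, 1, 0)

# Workaround 2: the "inverse" of the assignment, expressed as a gather
y = np.array([0, 1])
x2 = y[[0, 0, 1, 0]]

assert (x1 == x2).all()    # both produce [0, 0, 1, 0]
```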

The JAX syntax for slice assignment is x.at[idx].set(y), as the non-mutating counterpart of x[idx] = y.

One advantage of the non-mutating version is that JAX can offer reliable arithmetic on array slices with x.at[idx].add(y) (x[idx] += y doesn't work if x[idx] returns a copy).
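
A minimal sketch of that API with jax.numpy (assuming JAX is installed):

```python
import jax.numpy as jnp

x = jnp.zeros(4)

# Functional "assignment": returns a new array, x itself is unchanged
y = x.at[1].set(5.0)       # [0., 5., 0., 0.]

# Reliable slice accumulation; the mutating spelling x[1:3] += 1.0
# doesn't work when indexing returns a copy
z = x.at[1:3].add(1.0)     # [0., 1., 1., 0.]
```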

A disadvantage is that doing this sort of thing inside a loop is almost always a bad idea unless you have a JIT compiler, because every indexed assignment makes a full copy. So the naive translation of an efficient Python loop that fills in an array row by row would now make a copy at each step. Instead, you'd have to rewrite that loop to use something like concatenate (which in my experience is already about as efficient as indexed assignment).
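
A sketch of that rewrite; the function names and the per-row computation are illustrative:

```python
import numpy as np

def fill_rows_mutating(n, m):
    # Efficient with real in-place assignment, but O(n) full copies
    # if every indexed assignment copies the array
    out = np.empty((n, m))
    for i in range(n):
        out[i, :] = np.sin(np.arange(m) + i)   # toy per-row computation
    return out

def fill_rows_functional(n, m):
    # Non-mutating rewrite: build the rows, then concatenate once
    rows = [np.sin(np.arange(m) + i)[None, :] for i in range(n)]
    return np.concatenate(rows, axis=0)
```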

Copy-view behaviour

Libraries like NumPy and PyTorch return views where possible from function calls. It's sometimes hard to predict whether a view or a copy will be returned: it depends not only on the function in question, but also on whether the input array is contiguous, and sometimes even on the input dtype.
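
For example, in NumPy the same reshape call can return either a view or a copy depending on contiguity, which np.shares_memory makes visible:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)                  # C-contiguous

print(np.shares_memory(x, x.T))                 # True: transpose is a view
print(np.shares_memory(x, x.reshape(-1)))       # True: reshape of contiguous data
print(np.shares_memory(x.T, x.T.reshape(-1)))   # False: same call, now a copy
```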

This is one place where it's hard to avoid implementation choices leaking into the API:

  • Static graph based implementations like TensorFlow and MXNet, or a functional implementation like JAX with immutable arrays, will return a copy for a function like transpose().
  • Implementations which support strides and/or use a dynamic graph are able to, and therefore often will, return a view when they can (which is the case for transpose()).

The above copy vs. view difference starts leaking into the API - i.e., the same code starts giving different results for different implementations - when it is combined with an operation that performs in-place mutation of an array (either the base array or the view on it). In the absence of that combination, views are simply a performance optimization that's invisible to the user.
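
A sketch of how the difference becomes observable once mutation is added:

```python
import numpy as np

x = np.arange(4.0)
t = x[:2]      # a view in NumPy; a copy under copy-only semantics
t += 100       # in-place mutation of the view

# NumPy (view semantics) prints [100. 101. 2. 3.]; an implementation
# returning a copy from x[:2] would leave x as [0. 1. 2. 3.]
print(x)
```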

The question is whether copy-view differences should be allowed, and if so how to deal with the semantics that vary between libraries.

To answer whether it should be allowed, let's first ask how often the combination of views and mutation is used. A few observations:

  1. It is normally considered a bug if a library function (e.g. a SciPy or scikit-learn one) mutates any of its input arguments - unless the function is explicitly documented as doing so, which is rare. So the main concern is mutation inside functions, applied to arrays that are either created inside the function or are a copy of an input array.
  2. A search for patterns like *=, += and ] = in SciPy and scikit-learn .py files shows that in-place mutation inside functions is heavily used.
  3. There's a significant difference between mutating a complete array (e.g. with += 1) and mutating part of an array (e.g. with x[:, :2] = y). The former is a lot easier to support for array libraries employing static graphs or a JIT than the latter. See the discussion at Proposal to standardize element-wise elementary mathematical functions #8 (comment) for details.
  4. It's harder to figure out how often the combination of mutating part of an array and that mutation affecting a view occurs. This could be tested, though, with a patched NumPy that raises an exception on mutations affecting a view, by running the test suites of downstream libraries (a rough sketch of the idea follows this list).
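
A rough sketch of how that experiment might be prototyped. The NoViewMutation class is hypothetical; __array_finalize__ and the writeable flag are existing NumPy machinery:

```python
import numpy as np

class NoViewMutation(np.ndarray):
    """Hypothetical sketch of the experiment in point 4: every array
    created as a view of another array is marked read-only, so any
    mutation affecting a view raises. Not an existing NumPy feature."""

    def __array_finalize__(self, obj):
        # Called whenever a new array of this class is created; base is
        # set when this array shares memory with another array.
        if obj is not None and self.base is not None:
            self.flags.writeable = False

x = np.zeros((2, 3)).view(NoViewMutation)
x.flags.writeable = True   # the root array itself may still be mutated

x += 1        # fine: whole-array mutation of the root
v = x[:1]     # any slice/view is created read-only by __array_finalize__
# v += 1      # would raise "ValueError: output array is read-only"
```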

Options for how to standardize

In #8 @shoyer listed the following options for how to deal with mutability:

  1. Require support for in-place operations. Libraries that don't support mutation fully will need to write a wrapper layer, even if it would be inefficient.
  2. Make support for in-place operations optional. Arrays can indicate whether they support mutation via some standard API, e.g., like NumPy's ndarray.flags.writeable. (From later discussion, see Proposal to standardize element-wise elementary mathematical functions #8 (comment) for the implications of that for users of the API; see also the sketch after this list.)
  3. Don't include support for in-place operations in the spec. This is a conservative choice, one which might have negative performance consequences (but it's a little hard to say without looking carefully). At the very least, it might require a library like SciPy to retain a special path for numpy.ndarray objects.
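
To illustrate what option 2 would mean for consumers like SciPy, a sketch under the assumption that the standard exposes something like NumPy's flags.writeable (the helper scale is hypothetical):

```python
import numpy as np

def scale(x, factor):
    # Feature-test mutability instead of assuming it; flags.writeable
    # is the NumPy spelling, a standardized name is not yet settled.
    writeable = getattr(getattr(x, "flags", None), "writeable", False)
    if writeable:
        x *= factor      # fast path: mutate in place
        return x
    return x * factor    # fallback: out-of-place computation
```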

To that I'd like to add a more granular option:

  4. Require support for in-place operations that are unambiguous, and require raising an exception in case a view is mutated.

    Rationale:

    (a) This would require libraries that don't support mutation to write a wrapper layer, but the behaviour would be unambiguous and in most cases the wrapper would not be inefficient.
    (b) In case inefficient mutation is detected (e.g. mutating a large array row by row in a loop), a warning may be emitted.

A variant of this option would be:

  5. Require support for in-place operations that are unambiguous and mutate the whole array at once (i.e. += and out= must be supported, element/slice assignment must raise an exception), and require raising an exception in case a view is mutated.

    The trade-off here is ease of implementation for libraries like Dask and JAX versus putting a rewrite burden on SciPy et al. and a usability burden on end users (the alternative to element/slice assignment is unintuitive).
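
A sketch of what this variant would and would not allow; the exception on partial assignment is hypothetical (NumPy itself accepts the commented-out line):

```python
import numpy as np

x = np.ones((4, 4))

# Allowed under the variant: whole-array mutation
x += 1
np.multiply(x, 2, out=x)

# Disallowed under the variant (would raise in a conforming library):
# x[:, :2] = 0

# The portable, non-mutating rewrite of that assignment:
x = np.concatenate([np.zeros((4, 2)), x[:, 2:]], axis=1)
```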
