Description
metadata-aware IO (I made this term up, please suggest a better name) is the use of our store API to do IO that depends on zarr semantics, like reading / writing array and group metadata, for each zarr version. E.g., in zarr v2, a function that reads array metadata has to make 2 requests: one for `.zarray` and another for `.zattrs`. A function that reads array metadata for zarr v3 has a different implementation -- it makes just 1 request, for a different key (`zarr.json`).
We don't have a single place in our codebase for these operations. In fact, there's some worrying code duplication -- we have a function called `get_array_metadata` defined in `core/array.py` that overlaps with `_read_metadata_v2` and `_read_metadata_v3`, which are both defined in `core/group.py`.
I think we should put these routines in one place. Eventually, that module would contain functions for:
- reading array metadata
- reading group metadata
- reading array or group metadata (for zarr v2 this case requires its own implementation for performance reasons)
- checking if an array exists
- checking if a group exists
- writing array metadata
- writing group metadata
None of these functions would return an array or group. They would just return array / group metadata, which could be used to create an array or group as needed. For this reason, I don't think these functions belong in `core/array.py` or `core/group.py`, since those modules are concerned with the `Array` and `Group` classes. The metadata-aware IO layer, however, cuts across the array / group distinction (e.g. with functions that can return either array or group metadata).
Eventually we may want to formalize the set of all these operations as a protocol.
I'm not sure how chunk IO fits in here.
I reached this conclusion while working on #3012