Description
metadata-aware IO (I made this term up, please suggest a better name) is the use of our store API to do IO that depends on zarr semantics, like reading / writing array and group metadata, for each zarr version. E.g., in zarr v2, a function that reads array metadata has to make 2 requests: one for `.zarray` and another for `.zattrs`. A function that reads array metadata for zarr v3 has a different implementation -- it makes just 1 request, for a different key (`zarr.json`).
We don't have a single place in our codebase for these operations. In fact, there's some worrying code duplication -- we have a function called `get_array_metadata` defined in `core/array.py` that overlaps with `_read_metadata_v2` and `_read_metadata_v3`, which are both defined in `core/group.py`.
I think we should put these routines in one place. Eventually, that module would contain functions for:
- reading array metadata
- reading group metadata
- reading array or group metadata (for zarr v2 this case requires its own implementation for performance reasons)
- checking if an array exists
- checking if a group exists
- writing array metadata
- writing group metadata
None of these functions would return an array or group. They would just return array / group metadata, which could be used to create an array or group as needed. For this reason, I don't think these functions belong in `core/array.py` or `core/group.py`, since those modules are concerned with the `Array` and `Group` classes. The metadata-aware IO layer, however, cuts across the array / group distinction (e.g. with functions that can return either array or group metadata).
Eventually we may want to formalize the set of all these operations as a protocol.
I'm not sure how chunk IO fits in here.
I reached this conclusion while working on #3012