module for metadata-aware IO #3017

Open
@d-v-b

Metadata-aware IO (I made this term up, please suggest a better name) is the use of our store API to do IO that depends on zarr semantics, such as reading and writing array and group metadata for each zarr version. For example, in zarr v2 a function that reads array metadata has to make 2 requests, one for .zarray and another for .zattrs. The equivalent function for zarr v3 has a different implementation: it makes just 1 request, for a different key (zarr.json).
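To make the v2 / v3 difference concrete, here is a minimal sketch. It assumes the store is a plain mapping from string keys to bytes, which is a stand-in for our real store API (which is async and buffer-based), and the function names read_array_metadata_v2 and read_array_metadata_v3 are hypothetical, not existing functions in the codebase.

```python
import json
from collections.abc import Mapping
from typing import Any


def _prefix(path: str) -> str:
    # The root node has an empty path, so its keys carry no prefix.
    return f"{path}/" if path else ""


def read_array_metadata_v2(store: Mapping[str, bytes], path: str) -> dict[str, Any]:
    """Zarr v2: two requests, one for .zarray and one for .zattrs."""
    prefix = _prefix(path)
    # Request 1: the required array metadata document.
    zarray = json.loads(store[f"{prefix}.zarray"])
    # Request 2: the attributes document, which may be absent in v2.
    raw_attrs = store.get(f"{prefix}.zattrs")
    attrs = json.loads(raw_attrs) if raw_attrs is not None else {}
    return {**zarray, "attributes": attrs}


def read_array_metadata_v3(store: Mapping[str, bytes], path: str) -> dict[str, Any]:
    """Zarr v3: a single request for zarr.json, which also holds the attributes."""
    doc = json.loads(store[f"{_prefix(path)}zarr.json"])
    if doc.get("node_type") != "array":
        raise ValueError(f"node at {path!r} is not an array")
    return doc
```

Both functions return a plain dict here only to keep the sketch short; the real routines would presumably return our metadata classes.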

We don't have a single place in our codebase for these operations. In fact, there's some worrying code duplication: the get_array_metadata function defined in core/array.py overlaps with _read_metadata_v2 and _read_metadata_v3, which are both defined in core/group.py.

I think we should put these routines in one place. Eventually, that module would contain functions for:

  • reading array metadata
  • reading group metadata
  • reading array or group metadata (for zarr v2 this case requires its own implementation for performance reasons)
  • checking if an array exists
  • checking if a group exists
  • writing array metadata
  • writing group metadata
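As an illustration of the "array or group" case for zarr v2, here is a rough sketch. My reading of the performance concern (an assumption, not something measured here) is that trying "read array" and then falling back to "read group" costs sequential round trips, whereas a dedicated routine can request all candidate keys concurrently. MemoryStore and read_node_metadata_v2 are hypothetical names, and MemoryStore is only a stand-in for an async store.

```python
import asyncio
import json
from typing import Any, Optional


class MemoryStore:
    """Hypothetical async key-value store used only for this sketch."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def get(self, key: str) -> Optional[bytes]:
        # Returns None for a missing key instead of raising.
        return self._data.get(key)


async def read_node_metadata_v2(
    store: MemoryStore, path: str
) -> tuple[str, dict[str, Any]]:
    """Return ("array" | "group", metadata) for a zarr v2 node."""
    prefix = f"{path}/" if path else ""
    # Issue all three requests at once rather than one after another.
    zarray, zgroup, zattrs = await asyncio.gather(
        store.get(f"{prefix}.zarray"),
        store.get(f"{prefix}.zgroup"),
        store.get(f"{prefix}.zattrs"),
    )
    attrs = json.loads(zattrs) if zattrs is not None else {}
    if zarray is not None:
        return "array", {**json.loads(zarray), "attributes": attrs}
    if zgroup is not None:
        return "group", {**json.loads(zgroup), "attributes": attrs}
    raise FileNotFoundError(f"no v2 array or group metadata at {path!r}")
```

Note that the function returns a tag plus metadata, not an Array or Group, so the caller decides what to construct from it.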

None of these functions would return an array or group. They would just return array / group metadata, which could be used to create an array or group as needed. For this reason, I don't think these functions belong in core/array.py or core/group.py, since those modules are concerned with the Array and Group classes. The metadata-aware IO layer, however, cuts across the array / group distinction (e.g. with functions that can return either array or group metadata).

Eventually we may want to formalize the set of all these operations as a protocol.
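One possible shape for such a protocol, sketched with typing.Protocol. Everything here is hypothetical: the names MetadataIO and V3MetadataIO, the synchronous signatures, and the use of a mutable mapping of bytes as the store are all simplifications for illustration, not a proposal for the final API.

```python
import json
from collections.abc import MutableMapping
from typing import Any, Protocol, runtime_checkable

Store = MutableMapping[str, bytes]  # stand-in for the real store API
Meta = dict[str, Any]  # stand-in for the real metadata classes


@runtime_checkable
class MetadataIO(Protocol):
    """Hypothetical protocol: one implementation per zarr format."""

    def read_array_metadata(self, store: Store, path: str) -> Meta: ...
    def read_group_metadata(self, store: Store, path: str) -> Meta: ...
    def array_exists(self, store: Store, path: str) -> bool: ...
    def group_exists(self, store: Store, path: str) -> bool: ...
    def write_array_metadata(self, store: Store, path: str, meta: Meta) -> None: ...
    def write_group_metadata(self, store: Store, path: str, meta: Meta) -> None: ...


class V3MetadataIO:
    """Illustrative zarr v3 implementation: one zarr.json document per node."""

    @staticmethod
    def _key(path: str) -> str:
        return f"{path}/zarr.json" if path else "zarr.json"

    def _read(self, store: Store, path: str, node_type: str) -> Meta:
        doc = json.loads(store[self._key(path)])
        if doc.get("node_type") != node_type:
            raise ValueError(f"node at {path!r} is not a {node_type}")
        return doc

    def _exists(self, store: Store, path: str, node_type: str) -> bool:
        raw = store.get(self._key(path))
        return raw is not None and json.loads(raw).get("node_type") == node_type

    def _write(self, store: Store, path: str, meta: Meta, node_type: str) -> None:
        if meta.get("node_type") != node_type:
            raise ValueError(f"expected {node_type} metadata")
        store[self._key(path)] = json.dumps(meta).encode()

    def read_array_metadata(self, store: Store, path: str) -> Meta:
        return self._read(store, path, "array")

    def read_group_metadata(self, store: Store, path: str) -> Meta:
        return self._read(store, path, "group")

    def array_exists(self, store: Store, path: str) -> bool:
        return self._exists(store, path, "array")

    def group_exists(self, store: Store, path: str) -> bool:
        return self._exists(store, path, "group")

    def write_array_metadata(self, store: Store, path: str, meta: Meta) -> None:
        self._write(store, path, meta, "array")

    def write_group_metadata(self, store: Store, path: str, meta: Meta) -> None:
        self._write(store, path, meta, "group")
```

A v2 implementation would satisfy the same protocol while reading and writing .zarray, .zgroup, and .zattrs instead, so callers would not need to branch on the zarr format.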

I'm not sure how chunk IO fits in here.

I reached this conclusion while working on #3012.
