
Allow raising an error on missing zarr chunks with open_dataset/open_zarr #5197

Closed as not planned
@bolliger32

Description


Is your feature request related to a problem? Please describe.
Currently, if a zarr store is missing a chunk, that chunk is read as if every value in it were missing (NaN in the example below). This is upstream zarr behavior, but zarr may soon add a kwarg that raises an error in this case instead (zarr-developers/zarr-python#489). That would be valuable when you need to distinguish intentional NaN data from I/O errors that caused some chunks not to be written. Here's an example of a problematic case (courtesy of @delgadom):

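import numpy as np
import xarray as xr

# Lines starting with "!" are IPython/Jupyter shell escapes.
# Write a 2x2 array with one intentional NaN, stored as four 1x1 chunks;
# mode='w' lets the cell be re-run over an existing store.
xr.Dataset({'myarr': (('x', 'y'), [[0., np.nan], [2., 3.]]), 'x': [0, 1], 'y': [0, 1]}).chunk({'x': 1, 'y': 1}).to_zarr('myzarr.zarr', mode='w');
print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)
print('\n\nstructure of zarr store\n' + '-'*30)
! ls -R myzarr.zarr
print('\n\nremove a chunk\n' + '-'*30 + '\nrm myzarr.zarr/myarr/1.0')
# Delete the chunk holding myarr[x=1, y=0] to simulate a failed write
! rm myzarr.zarr/myarr/1.0
print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)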

This prints:

data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [ 2.,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1

structure of zarr store
------------------------------
myzarr.zarr:
myarr  x  y

myzarr.zarr/myarr:
0.0  0.1  1.0  1.1

myzarr.zarr/x:
0

myzarr.zarr/y:
0

remove a chunk
------------------------------
rm myzarr.zarr/myarr/1.0

data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [nan,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
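Until zarr offers such an option, one rough workaround for a store laid out like the one above is to check for absent chunk files yourself. The sketch below assumes the zarr v2 directory layout (a `.zarray` JSON file with `shape`, `chunks` and an optional `dimension_separator`, plus one file per chunk named by its grid indices). `find_missing_chunks` is a hypothetical helper, not part of xarray or zarr.

```python
import itertools
import json
import math
import os


def find_missing_chunks(array_dir):
    """Hypothetical helper: list chunk keys a zarr v2 directory-store array
    should contain but that are absent on disk.

    Assumes the zarr v2 layout: a `.zarray` JSON file with "shape" and "chunks",
    and one file per chunk named by its grid indices joined by the array's
    dimension separator ("." unless `.zarray` says otherwise).
    """
    with open(os.path.join(array_dir, ".zarray")) as f:
        meta = json.load(f)
    # "dimension_separator" may be absent or null; zarr v2 then uses ".".
    sep = meta.get("dimension_separator") or "."
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(meta["shape"], meta["chunks"])]
    if not grid:
        # A 0-d array stores its single chunk under the key "0".
        expected = ["0"]
    else:
        expected = [
            sep.join(map(str, idx))
            for idx in itertools.product(*(range(n) for n in grid))
        ]
    # With "/" as the separator, keys are nested paths, so split before joining.
    return [
        key for key in expected
        if not os.path.exists(os.path.join(array_dir, *key.split("/")))
    ]
```

After the `rm` in the example, `find_missing_chunks('myzarr.zarr/myarr')` should report `['1.0']`, and a caller could raise if the list is non-empty before calling `xr.open_zarr`. Note that this only detects absent files: a writer configured to skip chunks consisting entirely of the fill value would also leave files missing, so the check cannot tell those apart from failed writes.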

Describe the solution you'd like
Once zarr-developers/zarr-python#489 is merged, zarr's Array `__init__` may accept a kwarg for this, but I'm not sure where such a kwarg would be passed through within open_zarr or open_dataset. I'm opening this issue to ask whether anyone can point me in the right direction, and to be ready for when that zarr feature exists. I couldn't figure it out from some initial browsing, but I'm happy to file a PR once I know where to look.
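A different angle, sketched here only as an untested idea, is to handle this at the store level rather than through a zarr Array kwarg. It assumes that zarr v2 reads a chunk by indexing the store and treats a `KeyError` as "chunk missing, use the fill value". Under that assumption, a store wrapper that raises some other exception for missing chunk keys would make the read fail loudly. `StrictChunkStore` and `MissingChunkError` are hypothetical names, not zarr or xarray APIs.

```python
from collections.abc import MutableMapping


class MissingChunkError(IOError):
    """Hypothetical error, deliberately not a KeyError, so that zarr cannot
    mistake it for an absent chunk."""


class StrictChunkStore(MutableMapping):
    """Hypothetical wrapper around a zarr v2 store mapping: missing metadata
    keys raise KeyError as usual, while missing chunk keys raise
    MissingChunkError."""

    def __init__(self, store):
        self._store = store

    @staticmethod
    def _is_metadata(key):
        # zarr v2 metadata files (.zarray, .zattrs, .zgroup, .zmetadata)
        # start with "." in the last path component.
        return key.rsplit("/", 1)[-1].startswith(".")

    def __getitem__(self, key):
        try:
            return self._store[key]
        except KeyError:
            if self._is_metadata(key):
                raise
            raise MissingChunkError(f"chunk {key!r} is missing from the store") from None

    def __contains__(self, key):
        # Delegate membership tests so `key in store` keeps plain mapping semantics.
        return key in self._store

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```

In principle, one would pass something like `StrictChunkStore(zarr.storage.DirectoryStore('myzarr.zarr'))` to `xr.open_zarr`. Whether the error actually propagates depends on zarr's chunk-reading code path. Some zarr versions check `key in store` before fetching, and in that case they would never ask the wrapper for the missing key. This is why a proper zarr-level option would still be preferable.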

Labels: plan to close (May be closeable, needs more eyeballs), topic-zarr (Related to zarr storage library)
