Optimisation for the `__contains__` method of `storage.LRUStoreCache` #295

I have a dataset a few TB in size, with 11 parameters and about 100000 chunks. I store it in Azure Blob Storage using the `ABSStore` mutable mapping. I call `zarr.open_group(store=store, mode='r')` with `store = zarr.LRUStoreCache(max_size=2**33, store=zarr.storage.ABSStore('testcontainer', 'mydataset', BLOB_ACCOUNT_NAME, BLOB_ACCOUNT_KEY))`, and opening the group takes about 45 seconds. Without the `LRUStoreCache` wrapper, `open_group` is instantaneous. I traced the problem to the `__contains__` method of the `LRUStoreCache` mutable mapping wrapper. `open_group` calls `contains_array`, which performs a membership test on the store. The `__contains__` method of `LRUStoreCache` [(here)](https://github.com/zarr-developers/zarr/blob/master/zarr/storage.py#L1771) is implemented by listing all the keys of the underlying store. As a result, all 100000 chunk keys are listed before the existence check runs. On cloud storage, listing keys means one or more remote requests, so this adds significant overhead.

This is the current implementation in `LRUStoreCache`:
```python
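    def __contains__(self, key):
        with self._mutex:
            if self._contains_cache is None:
                # On first use, build a set of every key in the underlying
                # store; self._keys() lists the whole store.
                self._contains_cache = set(self._keys())
            return key in self._contains_cache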
```
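To illustrate the cost, here is a minimal, self-contained sketch that does not use zarr. `CountingStore` and `EagerContainsCache` are hypothetical stand-ins, not zarr classes. The store counts how many keys are enumerated, as a proxy for remote list requests. The wrapper reproduces the lazy key-set logic above.

```python
import threading
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical in-memory stand-in for a remote store such as ABSStore.

    Counts how many keys are enumerated, mimicking the cost of listing
    every blob in a container.
    """

    def __init__(self, data):
        self._data = dict(data)
        self.keys_listed = 0

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        for key in self._data:
            self.keys_listed += 1  # one "remote" key listed
            yield key

    def __len__(self):
        return len(self._data)


class EagerContainsCache:
    """Hypothetical wrapper mirroring the current LRUStoreCache.__contains__."""

    def __init__(self, store):
        self._store = store
        self._mutex = threading.Lock()
        self._contains_cache = None

    def __contains__(self, key):
        with self._mutex:
            if self._contains_cache is None:
                # The first membership test enumerates every key in the store.
                self._contains_cache = set(self._store.keys())
            return key in self._contains_cache


# 100000 chunk keys plus one metadata key, loosely modelled on the dataset above.
data = {f"param/{i}": b"" for i in range(100000)}
data[".zgroup"] = b"{}"
store = CountingStore(data)
cache = EagerContainsCache(store)
print(".zgroup" in cache)  # True
print(store.keys_listed)   # 100001
```

A single membership test forces the whole store to be listed. Later tests are answered from memory.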

When I changed it to this:
```python
    def __contains__(self, key):
        # Delegate to the underlying store's own membership test.
        return key in self._store
```
`open_group` becomes almost instantaneous. The `__contains__` method of the underlying `ABSStore` class uses Azure Blob's `exists` check, so it doesn't have to list all keys.
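The effect of the proposed change can be sketched without zarr as well. `ExistsStore` and `DelegatingContainsCache` below are hypothetical stand-ins, not zarr classes. The store counts per-key existence checks (analogous to a blob `exists` request) separately from full listings.

```python
from collections.abc import MutableMapping


class ExistsStore(MutableMapping):
    """Hypothetical stand-in for ABSStore: membership tests use a cheap
    per-key existence check instead of listing all keys."""

    def __init__(self, data):
        self._data = dict(data)
        self.exists_calls = 0
        self.list_calls = 0

    def __contains__(self, key):
        self.exists_calls += 1  # analogous to one blob "exists" request
        return key in self._data

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        self.list_calls += 1  # analogous to a full "list blobs" request
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class DelegatingContainsCache:
    """Hypothetical wrapper using the proposed __contains__."""

    def __init__(self, store):
        self._store = store

    def __contains__(self, key):
        # Defer to the underlying store, which can answer for a single key.
        return key in self._store


data = {f"param/{i}": b"" for i in range(100000)}
data[".zgroup"] = b"{}"
store = ExistsStore(data)
cache = DelegatingContainsCache(store)
print(".zgroup" in cache, ".zarray" in cache)  # True False
print(store.exists_calls, store.list_calls)    # 2 0
```

The trade-off is that every membership test now costs one request to the store. The original design instead answers all tests from memory once the initial listing is done. For `open_group`, which performs only a handful of checks, delegation is the cheaper side of that trade.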
