Description
I've only recently started using zarr, but I'm impressed. Well done.
I want to share an experience and a possible enhancement.
In one of my use cases I use vindex heavily across the whole array. I know this is likely a worst-case scenario for zarr, since it has to read many chunks to return only a small amount of data from each one.
I was previously using numpy memmap arrays for a similar purpose and they were much faster, so I wondered whether an uncompressed DirectoryStore would read chunks as a memmap. No luck; it still reads full chunks.
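For context, this is roughly the kind of setup and access pattern involved (a minimal sketch; the shape, chunks, dtype, and paths are made up for illustration):

import numpy as np
import zarr

# Illustrative setup: an uncompressed array in a DirectoryStore.
store = zarr.DirectoryStore('example.zarr')
z = zarr.open(store, mode='w', shape=(4000, 4000),
              chunks=(500, 500), dtype='f8', compressor=None)
z[:] = np.random.random((4000, 4000))

# The access pattern in question: vindex pulling scattered points
# from across the whole array, so nearly every chunk gets read.
rows = np.random.randint(0, 4000, size=5000)
cols = np.random.randint(0, 4000, size=5000)
values = z.vindex[rows, cols]

So I had a go at subclassing DirectoryStore to do this: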
import os

import numpy as np
import zarr


class MemMapReadStore(zarr.DirectoryStore):
    """Directory store that memory-maps chunk files for reading."""

    def __getitem__(self, key):
        filepath = os.path.join(self.path, key)
        if os.path.isfile(filepath):
            # Metadata files (.zarray, .zattrs, .zgroup, .zmetadata) must be
            # returned as plain bytes; everything else is chunk data.
            if key.split('/')[-1].startswith('.'):
                with open(filepath, 'rb') as f:
                    return f.read()
            # Memory-map the raw chunk bytes rather than reading the file.
            return np.memmap(filepath, mode='r')
        raise KeyError(key)
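It's a drop-in replacement for DirectoryStore when opening read-only (continuing the sketch above; the path is illustrative):

store = MemMapReadStore('example.zarr')
z = zarr.open(store, mode='r')
# Chunk files are now memory-mapped, so only the touched pages
# need to be paged in from disk.
values = z.vindex[rows, cols]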
It's working well for me, but I don't really know the inner workings of zarr, so who knows what I might have broken or which other features it won't play well with. I thought the idea might be a basis for an enhancement, though. Worth sharing, at least.
The speed-up depends on access pattern, compression, etc., but for the example I'm testing I'm seeing a 22x speed-up versus a compressed zarr array of the same dimensions and chunking.
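Your numbers will vary, but a rough comparison could look something like this (sketch only; the store paths and the timing helper are illustrative):

import time

def time_vindex(z, rows, cols, repeats=5):
    # Average wall-clock time of a repeated vindex read.
    start = time.perf_counter()
    for _ in range(repeats):
        z.vindex[rows, cols]
    return (time.perf_counter() - start) / repeats

z_compressed = zarr.open(zarr.DirectoryStore('compressed.zarr'), mode='r')
z_memmap = zarr.open(MemMapReadStore('example.zarr'), mode='r')
print(time_vindex(z_compressed, rows, cols))
print(time_vindex(z_memmap, rows, cols))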
It only works for reads, as that was all I needed, and I can see that writes replace the whole chunk, so memmap writes are not doable.
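If it helps, one way to make that limitation explicit would be to refuse writes in the store itself, by adding this method to MemMapReadStore (a suggested addition, not something I've tested):

    def __setitem__(self, key, value):
        # zarr writes whole chunks at once, and there is no sensible way
        # to do that through a memmap here, so refuse writes outright.
        raise NotImplementedError('MemMapReadStore is read-only')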