memmap reads from directory store #265

Closed
@artttt


I've only recently started using zarr, but I'm impressed. Well done.

I want to share an experience and a possible enhancement.
In one of my use cases I use vindex heavily across the whole array. I know this is probably a worst-case scenario for zarr, since it ends up reading many, many chunks to get a small amount of data from each one.
I was previously using numpy memmap arrays for a similar purpose and it was much faster, so I wondered whether an uncompressed DirectoryStore would read chunks as a memmap. No luck: it still reads full chunks. So I had a go at subclassing DirectoryStore to do this.


import os

import numpy as np
import zarr


class MemMapReadStore(zarr.DirectoryStore):
    """Directory store that memory-maps chunks for reading instead of
    loading each chunk file fully into memory.
    """
    def __getitem__(self, key):
        filepath = os.path.join(self.path, key)
        if os.path.isfile(filepath):
            # Are there only 2 types of files, .zarray and the chunks?
            if key == '.zarray':
                with open(filepath, 'rb') as f:
                    return f.read()
            else:
                # Return the raw chunk bytes as a read-only memmap.
                return np.memmap(filepath, mode='r')
        else:
            raise KeyError(key)
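
In case it helps anyone try it, here is roughly how I'm driving it. The store path, shape, and chunking below are made up for illustration; the key point is that the array has to be written uncompressed (compressor=None) so the chunk files hold raw bytes that a memmap can expose directly:

# Write an uncompressed array through a plain DirectoryStore, then
# reopen the same directory through the memmap store for reading.
store = zarr.DirectoryStore('example.zarr')
z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='f8',
               store=store, compressor=None, overwrite=True)
z[:] = 42.0

ro = zarr.open_array(store=MemMapReadStore('example.zarr'), mode='r')
print(ro[0, 0])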

It's working well for me, but I don't really know the inner workings of zarr, so who knows what I might have broken or which other features it won't play well with. I thought the idea might be a basis for an enhancement, though. Worth sharing, at least.

The speed-up depends on access pattern, compression, etc., but for the example I'm testing I'm seeing a 22x speed-up versus a compressed zarr array of the same dimensions and chunking.
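
For concreteness, the test is along these lines: scattered point reads via vindex that touch many chunks for only a few values each. This is just a sketch with random coordinates, reusing the ro array opened above:

# Scattered coordinate reads across the whole array via vindex.
n = 100000
rows = np.random.randint(0, ro.shape[0], size=n)
cols = np.random.randint(0, ro.shape[1], size=n)
values = ro.vindex[rows, cols]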

It only works for reads, as that was all I needed; as far as I can see, writes replace the whole chunk, so memmap writes are not doable.
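
Since it's read-only anyway, it might be worth failing loudly on writes rather than letting a stray write fall through to the inherited DirectoryStore path. A minimal sketch (the NotImplementedError is just my choice, not anything zarr requires):

class MemMapReadStore(zarr.DirectoryStore):
    # ... __getitem__ as above ...

    def __setitem__(self, key, value):
        # Reject writes instead of inheriting DirectoryStore's behaviour,
        # since memmap-backed reads assume chunk files never change.
        raise NotImplementedError('MemMapReadStore is read-only')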
