
Layered data caching #382

Open
@jakirkham

Have generally been musing about how to cut down latency when accessing data from disk with Zarr (and this discussion certainly reflects that). It seems that there are a few levels in the process where caching may be appropriate.

  1. Caching fully decoded results (LRU cache for decoded chunks #306)
  2. Caching compressed data (LRU store cache #223)
  3. Leveraging memory-mapping (memmap reads from directory store #265) (RFC: Optionally support memory-mapping DirectoryStore values #377)
  4. Caching file-like objects (File object cache for DirectoryStore #381)
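
To make the first two levels concrete, here is a rough sketch of a decoded-chunk cache sitting in front of a compressed-bytes cache, which in turn sits in front of a store. This is only an illustration and not Zarr's API: `LRUCache` and `LayeredChunkReader` are hypothetical names, the store is assumed to be any mapping of keys to compressed bytes, and `zlib` stands in for whatever codec is actually in use.

```python
import zlib
from collections import OrderedDict


class LRUCache:
    """Tiny LRU mapping bounded by entry count (hypothetical helper)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used
        return value


class LayeredChunkReader:
    """Decoded cache -> compressed cache -> store (hypothetical sketch)."""

    def __init__(self, store, max_decoded, max_compressed):
        self.store = store  # assumed: mapping of key -> compressed bytes
        self.store_reads = 0
        self.compressed = LRUCache(max_compressed)
        self.decoded = LRUCache(max_decoded)

    def _read_store(self, key):
        self.store_reads += 1  # stands in for the expensive disk read
        return self.store[key]

    def _decode(self, key):
        # A decoded-cache miss falls back to the compressed cache first.
        raw = self.compressed.get_or_load(key, self._read_store)
        return zlib.decompress(raw)

    def __getitem__(self, key):
        return self.decoded.get_or_load(key, self._decode)


# Four small chunks; the decoded cache holds one, the compressed cache all four.
store = {f"0.{i}": zlib.compress(bytes([i]) * 1024) for i in range(4)}
reader = LayeredChunkReader(store, max_decoded=1, max_compressed=4)

reader["0.0"]  # misses both caches, reads the store
reader["0.1"]  # misses both caches, evicts "0.0" from the decoded cache
reader["0.0"]  # decoded miss, but served from the compressed cache
```

In this sketch the third access pays for decompression again but not for a store read, which is the trade-off between levels 1 and 2. Bounding by entry count is a simplification; a real cache would more likely bound by bytes.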

IMHO we really want all of these. We want to cache fully decoded results, as that cuts down on reads and decompression time, neither of which is necessarily cheap, though it does mean more memory is used. So we also want the option to cache compressed, encoded data, as we can fit more of it in memory in exchange for some CPU time and thus avoid expensive reads.

However, cache misses for both of these are still expensive, and we can potentially wind up paying for the same cached compressed data more than once across processes. Thus we want the OS to load data into memory for us beforehand, and we want that data to be easily accessible across processes. We really can't benefit much from this, though, without ensuring file-like objects are kept around for more than one read operation. Therefore we want a cache of file-like objects to read from as needed, but we want to be smart about this and cap it at some reasonable user-defined limit.

By doing all of this we effectively protect ourselves from falling all the way back to a read until it is absolutely necessary. Even when that read occurs, we try to minimize its cost as much as possible, potentially by having the OS prep that read beforehand.
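
For levels 3 and 4, a bounded cache of memory-mapped file objects might look roughly like the following. Again, this is a sketch under assumptions rather than a proposal for the actual implementation: `MmapFileCache` and `max_open` are hypothetical names, files are assumed to be non-empty (an empty file cannot be mapped), and `max_open` is assumed to be at least 1.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapFileCache:
    """Keep at most `max_open` memory-mapped files open (hypothetical sketch)."""

    def __init__(self, max_open):
        self.max_open = max_open  # user-defined cap on open file objects
        self._open = OrderedDict()  # path -> (file object, mmap)
        self.opens = 0

    def _get(self, path):
        if path in self._open:
            self._open.move_to_end(path)  # mark as most recently used
            return self._open[path][1]
        f = open(path, "rb")
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self.opens += 1
        self._open[path] = (f, mm)
        if len(self._open) > self.max_open:
            # Close the least recently used mapping and its file object.
            _, (old_f, old_mm) = self._open.popitem(last=False)
            old_mm.close()
            old_f.close()
        return mm

    def read(self, path):
        # Slicing copies the bytes out, so the result outlives the mapping.
        return self._get(path)[:]

    def close(self):
        while self._open:
            _, (f, mm) = self._open.popitem()
            mm.close()
            f.close()


with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for i in range(3):
        p = os.path.join(tmpdir, f"chunk{i}")
        with open(p, "wb") as fh:
            fh.write(bytes([i]) * 16)
        paths.append(p)

    cache = MmapFileCache(max_open=2)
    first = cache.read(paths[0])  # opens and maps the file
    cache.read(paths[0])          # reuses the open mapping
    cache.read(paths[1])
    cache.read(paths[2])          # evicts the mapping for paths[0]
    cache.close()
```

Because the mappings are backed by the OS page cache, several processes mapping the same file would share the same physical pages, which is the cross-process benefit described above. The cap matters because open file descriptors are a limited per-process resource.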

Labels: performance (Potential issues with Zarr performance: I/O, memory, etc.)
