Description
We discussed memory usage on Friday's community call. https://github.com/TomAugspurger/zarr-python-memory-benchmark has some initial benchmarks looking into it.
https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-uncompressed.html has the memray flamegraph for reading an uncompressed array (400 MB total, split into 10 chunks of 40 MB each). I think the optimal memory usage here is about 400 MB. Our peak memory is about 2x that.
https://rawcdn.githack.com/TomAugspurger/zarr-python-memory-benchmark/refs/heads/main/reports/memray-flamegraph-read-compressed.html has the flamegraph for the zstd-compressed version. Peak memory is about 1.1 GiB.
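For reference, the read being measured is roughly this (a sketch, not the actual benchmark script from the repo; it assumes zarr-python 3's `zarr.create_array` API and that `compressors=None` disables compression):

```python
# Rough sketch of the benchmark setup (the real scripts are in the
# repo linked above).
import numpy as np
import zarr

# 400 MB of float64, split into 10 chunks of 40 MB each.
arr = zarr.create_array(
    store="data.zarr",
    shape=(50_000_000,),
    chunks=(5_000_000,),
    dtype="float64",
    compressors=None,  # drop this for the zstd-compressed variant
)
arr[:] = np.random.default_rng(0).random(50_000_000)

# The read under measurement; run under memray, e.g.
#   memray run read.py && memray flamegraph memray-read.py.*.bin
out = arr[:]
```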
I haven't looked too closely at the code, but I wonder if we could be smarter about a few things in certain cases:
- For the uncompressed case, we might be able to do a `readinto` directly into (an appropriate slice of) the `out` array. We might need to expand the Store API to add some kind of `readinto` method, where the caller provides the buffer to read into rather than the store allocating new memory (see the sketch after this list).
- For the compressed case, we might be able to improve things once we know the size of the output buffers. I see that numcodecs' `zstd.decode` takes an output buffer that we could maybe use (also sketched below). And past that point, maybe all the codecs could reuse one or two buffers, rather than allocating a new buffer for each stage of the codec pipeline (one buffer if a stage can run in place, two buffers if it can't)?
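For the first point, here's a minimal sketch of what a `readinto`-style read could look like. Everything in it is hypothetical: the function name, the file layout, and the chunk path are made up for illustration, and this is not the current Store API.

```python
# Hypothetical readinto sketch: the caller owns the output buffer and
# the store writes chunk bytes straight into a view of it, instead of
# returning a freshly allocated bytes object per chunk.
from pathlib import Path

import numpy as np


def store_readinto(path: Path, out: memoryview) -> int:
    """Read the object at `path` directly into a caller-provided buffer."""
    with open(path, "rb") as f:
        return f.readinto(out)


# Read one 40 MB uncompressed chunk into its slice of the final array,
# with no intermediate allocation. "data.zarr/c/0" is a made-up chunk path.
out = np.empty(50_000_000, dtype="float64")
chunk_view = memoryview(out[:5_000_000]).cast("B")
n_read = store_readinto(Path("data.zarr/c/0"), chunk_view)
```

A remote store could do the same thing by streaming the response body into the provided buffer rather than materializing it as `bytes` first.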
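For the second point, numcodecs codecs already accept an `out=` argument on `decode`, so decoding straight into a preallocated buffer works today at the single-codec level. A sketch (the sizes just mirror the benchmark; whether the codec pipeline can thread the buffer through is the open question):

```python
# Decode a zstd-compressed chunk into a caller-provided buffer using
# numcodecs' existing `out=` argument, avoiding a fresh allocation
# per decode call.
import numpy as np
from numcodecs import Zstd

codec = Zstd(level=1)
chunk = np.random.default_rng(0).random(5_000_000)  # one 40 MB chunk
compressed = codec.encode(chunk)

# Decode directly into the right slice of the final 400 MB array.
out = np.empty(50_000_000, dtype="float64")
codec.decode(compressed, out=out[:5_000_000])
```

Past that, a pipeline-level version might keep two scratch buffers and ping-pong between them for stages that can't run in place.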
I'm not too familiar with the codec pipeline stuff, but I'll look into this as I have time. Others should feel free to take this if it scratches an itch, though. There's some work to be done :)