Description
Minimal, reproducible code sample, a copy-pastable example if possible
It may not be necessary, but I'll provide one if this turns out to really be a bug.
Problem description
We're working with many (100k) small zarrs (all with the same arrays inside) and we need to merge them into a single big one. The array's append() method seems like the right choice here, however:
Performance seems to decrease linearly over time (from 50 it/s at the beginning to 17 it/s at around 40% progress).
As an alternative approach, I've tried to:
- first initialise the final zarr's arrays with the desired shapes (computed by iterating over all zarrs);
- use indexing assignment instead of append() on the final zarr while iterating over the input zarrs.
This approach turns out to be faster from the start (70 it/s) and doesn't suffer from any significant performance drop.
Is this expected? Assignment requires us to write much more code, so it would be very convenient to have an append function with the same performance.
Version and installation information
Please provide the following:
- zarr 2.4.0
- numcodecs 0.6.4
- python 3.6.8
- Linux
- pip install
TL;DR: append() gets slower over time, while index assignment doesn't.