Zarr Array append method performance #583

Open
@lucabergamini

Description

Minimal, reproducible code sample, a copy-pastable example if possible

This may not be necessary, but I'll provide one if this turns out to really be a bug.

Problem description

We're working with many (100k) small zarrs, all containing the same arrays, and we need to merge them into a single large one. The array's append() method seems like the right choice here. However:

Performance seems to decrease linearly over time (from 50 it/s at the beginning to 17 it/s at around 40% progress).

As an alternative approach, I've tried to:

  • first initialise the final zarr's arrays with the desired shapes (computed by iterating over all input zarrs);
  • use indexing assignment instead of append() on the final zarr while iterating over the input zarrs.

This approach turns out to be faster from the start (70 it/s) and doesn't suffer from any significant performance drop.

Is this expected? Assignment requires us to write a lot more code, so it would be very convenient to have an append() method with the same performance.

Version and installation information

Please provide the following:

  • zarr 2.4.0
  • numcodecs 0.6.4
  • python 3.6.8
  • Linux
  • pip install

TL;DR: append() gets slower over time, while indexing assignment doesn't.

Labels

performance: Potential issues with Zarr performance (I/O, memory, etc.)