Description
I have a zarr file on S3 where I am storing data every ten minutes. I'm using zarr version 2.3.1 and s3fs to connect to the AWS bucket. The zarr file has the following structure:
```
/
└── yyyy
    └── mm
        └── dd
            └── HHMM
                ├── variable
                │   └── array (1, 1440, 1440) int32
                └── variable
                    └── array (1, 1440, 1440) int32
```
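To illustrate where the LIST calls come from: S3 has no real directories, so a hierarchy like the one above is stored as flat object keys, and s3fs discovers "directories" by issuing a prefix/delimiter LIST per path level. A minimal sketch (the keys are hypothetical, modeled on the paths in my logs):

```python
# Hypothetical flat S3 object keys for the hierarchy above. S3 has no real
# directories, so group discovery means LIST requests with a prefix and
# delimiter, one per level of the path.
keys = [
    "file.zarr/.zgroup",
    "file.zarr/2019/.zgroup",
    "file.zarr/2019/06/.zgroup",
    "file.zarr/2019/06/12/.zgroup",
    "file.zarr/2019/06/12/1010/.zgroup",
    "file.zarr/2019/06/12/1010/cth/.zgroup",
    "file.zarr/2019/06/12/1010/cth/array/.zarray",
]

def list_prefix(keys, prefix, delimiter="/"):
    """Emulate one S3 LIST (list-type=2) call: return the immediate
    children under *prefix*, the way s3fs walks the hierarchy."""
    children = set()
    for key in keys:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            children.add(rest.split(delimiter, 1)[0])
    return sorted(children)

# Checking that a group exists at 2019/06/12 costs one LIST per level:
print(list_prefix(keys, "file.zarr/2019/06/12/"))  # ['.zgroup', '1010']
```

Each of those emulated calls corresponds to one billed LIST request in the logs below, so the cost scales with both the depth of the path and the number of groups already in the store.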
As my zarr file grows, I'm noticing an increase in costs due to LIST operations. Digging into the log files, I noticed that creating a zarr array with zarr.create() on S3 involves listing all the groups in the zarr file. LIST operations on S3 are expensive, and the number of requests grows with the number of groups in the zarr file, so I'm in an unsustainable situation in terms of costs related to these (unnecessary?) LIST operations. See an excerpt of the logs:
```
urllib3.util.retry DEBUG Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool DEBUG https://amazonaws.com:443 "GET /?list-type=2&prefix=file.zarr%2F2019%2F06%2F12%2F&delimiter=%2F&encoding-type=url HTTP/1.1" 200 None
urllib3.util.retry DEBUG Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool DEBUG https://amazonaws.com:443 "GET /?list-type=2&prefix=file.zarr%2F2019%2F06%2F12%2F1010%2F&delimiter=%2F&encoding-type=url HTTP/1.1" 200 None
urllib3.util.retry DEBUG Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
urllib3.connectionpool DEBUG https://amazonaws.com:443 "GET /?list-type=2&prefix=file.zarr%2F2019%2F06%2F12%2F1010%2Fcth%2F&delimiter=%2F&encoding-type=url HTTP/1.1" 200 None
```
Is there a workaround for this that doesn't require listing all the groups when pushing an array to a new group? Or is there another way of saving an array to zarr that doesn't first check whether the array already exists?
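To illustrate what I mean by listing-free writes: the zarr v2 on-disk layout is deterministic, so in principle the array metadata and chunks could be PUT directly as objects without any existence checks. A minimal sketch, with an in-memory dict standing in for S3 PUTs (the zlib codec, chunking, and key names here are my own assumptions, not what zarr.create() does today):

```python
import json
import zlib

import numpy as np

store = {}  # stands in for S3 put_object calls; no LISTs involved

path = "2019/06/12/1010/cth/array"   # path taken from the logs above
data = np.zeros((1, 1440, 1440), dtype="<i4")

# zarr v2 array metadata, written as a single PUT
store[f"{path}/.zarray"] = json.dumps({
    "zarr_format": 2,
    "shape": list(data.shape),
    "chunks": list(data.shape),      # one chunk covering the whole array
    "dtype": "<i4",
    "compressor": {"id": "zlib", "level": 1},
    "fill_value": 0,
    "filters": None,
    "order": "C",
}).encode()

# the single chunk, compressed with the codec named in the metadata
store[f"{path}/0.0.0"] = zlib.compress(data.tobytes(), 1)

# round-trip check: decode the chunk the way a zarr reader would
meta = json.loads(store[f"{path}/.zarray"])
chunk = np.frombuffer(zlib.decompress(store[f"{path}/0.0.0"]),
                      dtype=meta["dtype"]).reshape(meta["shape"])
assert (chunk == data).all()
```

For zarr itself to open this, the intermediate `.zgroup` objects at each parent level would still be needed; the question is really whether zarr can be told to write them (and the array) without first listing what already exists.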
Thanks
Cedric