
performance optimization in delitems #1336

Open
@d-v-b

Description


In `FSStore.delitems`, only keys that exist in the store are deleted. This requires checking each key with `FSStore.__contains__` in a loop, which adds one request per key on remote filesystems like S3. I suspect these sequential existence checks erode much of the benefit of using a bulk delete.
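To make the overhead concrete, here is a toy cost model. It is a minimal sketch, not zarr's actual `FSStore.delitems`: it assumes every existence check and every bulk delete costs one round trip, and `RoundTripCountingStore`, `exists`, `bulk_delete`, and `delitems_check_first` are hypothetical names. Under that assumption, checking each key first costs N + 1 round trips for N requested keys.

```python
from typing import Iterable, Set


class RoundTripCountingStore:
    """Hypothetical stand-in for a remote object store (not zarr or fsspec code).

    Each method call models one network round trip, so `round_trips`
    approximates the latency cost of an access pattern.
    """

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys: Set[str] = set(keys)
        self.round_trips = 0

    def exists(self, key: str) -> bool:
        self.round_trips += 1  # one existence check (HEAD-style request) per key
        return key in self.keys

    def bulk_delete(self, keys: Iterable[str]) -> None:
        self.round_trips += 1  # one batched request for all keys
        keys = list(keys)
        # Strict behavior: refuse the whole batch if any key is missing.
        missing = [k for k in keys if k not in self.keys]
        if missing:
            raise FileNotFoundError(missing[0])
        self.keys.difference_update(keys)


def delitems_check_first(store: RoundTripCountingStore, keys: Iterable[str]) -> None:
    """Check-first pattern: one existence check per key, then one bulk delete."""
    present = [k for k in keys if store.exists(k)]
    store.bulk_delete(present)


store = RoundTripCountingStore(f"c/{i}" for i in range(100))
delitems_check_first(store, [f"c/{i}" for i in range(200)])
print(store.round_trips)
```

With 100 stored keys and 200 requested deletions, this makes 200 existence checks plus one bulk delete, so `store.round_trips` ends at 201; only the final call benefits from batching.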

From the zarr side, the solution would be to simply request deletion of all keys, regardless of whether they exist in the store. However, fsspec raises an exception if you try to `rm` a file that doesn't exist. So perhaps we need to add an error-handling kwarg to `rm`, like the `on_error` kwarg of `FSMap.getitems` (which can be set to `"omit"`). This would be more "S3-ish" anyway, since the boto3 `delete_objects` method silently treats missing keys as deleted.
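A sketch of the proposed behavior, assuming a hypothetical `on_error` kwarg on `rm` whose `"raise"` and `"omit"` values are borrowed from `FSMap.getitems` (this is the proposal in this issue, not an existing fsspec parameter; `ToyObjectStore` and `delitems` are illustrative stand-ins, not zarr or fsspec code). With `"omit"`, missing keys are treated as already deleted, mirroring S3's `DeleteObjects`, so the zarr side can skip the per-key existence checks entirely.

```python
from typing import Iterable, Literal, Set


class ToyObjectStore:
    """Hypothetical in-memory store; each method call models one round trip."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys: Set[str] = set(keys)
        self.round_trips = 0

    def rm(self, paths: Iterable[str], on_error: Literal["raise", "omit"] = "raise") -> None:
        """Bulk delete in a single round trip.

        on_error="raise": strict behavior, fail if any path is missing
        (nothing is deleted in that case).
        on_error="omit": S3 DeleteObjects-style behavior, missing paths
        are silently treated as deleted.
        NOTE: `on_error` on rm is a hypothetical, proposed kwarg.
        """
        if on_error not in ("raise", "omit"):
            raise ValueError(f"unknown on_error value: {on_error!r}")
        self.round_trips += 1
        paths = list(paths)
        missing = [p for p in paths if p not in self.keys]
        if missing and on_error == "raise":
            raise FileNotFoundError(missing[0])
        self.keys.difference_update(paths)


def delitems(store: ToyObjectStore, keys: Iterable[str]) -> None:
    """Proposed zarr-side pattern: no existence checks, one bulk request."""
    store.rm(keys, on_error="omit")


store = ToyObjectStore(f"c/{i}" for i in range(100))
delitems(store, [f"c/{i}" for i in range(200)])
print(store.round_trips, len(store.keys))
```

Deleting 200 requested keys from a store that holds 100 of them then takes a single round trip, while the default `on_error="raise"` keeps the strict behavior for callers that rely on it.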

Thoughts on this, @martindurant? I'm happy to open a PR on the fsspec side.

Labels: performance (Potential issues with Zarr performance: I/O, memory, etc.)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions