Directories and delimiter handling #562

Open
@hayesgb

Opened based on #554.

Azure Storage consists of a 2-level hierarchy, where the top level is a container and all blobs underneath a container have a flat structure. Delimiters ("/") that denote folders in traditional filesystems have no inherent meaning. Azure implements a BlobPrefix object for convenience, but a BlobPrefix cannot be created directly, and it is removed as soon as all blobs underneath the prefix are deleted.
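To make the flat structure concrete, here is a minimal sketch in plain Python. It does not call the Azure SDK or adlfs; `list_level` is a hypothetical helper that only imitates how a delimiter listing derives "folder" prefixes from blob names, and the blob names are made up for illustration.

```python
def list_level(blob_names, prefix="", delimiter="/"):
    """Emulate a delimiter listing over a flat set of blob names.

    Returns (blobs, prefixes): names that sit directly under `prefix`,
    and the virtual "folder" prefixes derived from deeper names.
    This is an illustrative model, not the Azure implementation.
    """
    blobs, prefixes = [], set()
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the name rolls up into a virtual prefix.
            prefixes.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return blobs, sorted(prefixes)


# Hypothetical container contents: just a flat set of names.
container = {"data/part-0.parquet", "data/part-1.parquet", "readme.txt"}
top_blobs, top_prefixes = list_level(container)

# Deleting every blob under "data/" makes the prefix vanish on its own.
container -= {"data/part-0.parquet", "data/part-1.parquet"}
after_blobs, after_prefixes = list_level(container)
```

In this model the "data/" prefix is never stored anywhere. It is recomputed from the names on every listing, which is why it cannot be created by itself and disappears with its last blob.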

This creates challenges for filesystem operations like mkdir(), because an empty directory can only be made by creating an empty blob. The result is that:

fs.mkdir("container/blob_folder")
fs.mkdir("container/blob_folder/")

both appear as size=0 blobs, and the two are distinct. Choosing the former convention creates issues with partitioned parquet files, as described here, while the latter runs counter to the convention of removing a trailing "/" when listing directories and/or storing them in dircache, as evidenced by #554.
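The conflict can be sketched with a plain dict standing in for a container. This is an assumption-laden toy model, not adlfs code: `mkdir_as_marker` is a hypothetical helper, and the "strip the trailing slash" step only imitates the listing/dircache convention mentioned above.

```python
# Hypothetical in-memory stand-in for a container: blob name -> size in bytes.
container = {}


def mkdir_as_marker(store, path):
    """Emulate mkdir by writing a zero-byte marker blob named `path`."""
    store[path] = 0


mkdir_as_marker(container, "blob_folder")
mkdir_as_marker(container, "blob_folder/")

# The store now holds two separate zero-byte blobs.
marker_names = sorted(container)

# A cache key that strips the trailing delimiter maps both names to one entry,
# so the cache can no longer tell which marker blob it refers to.
cache_keys = {name.rstrip("/") for name in container}
```

Here `marker_names` has two entries while `cache_keys` has one, which is the mismatch between what is stored and what a slash-stripping cache can represent.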

It's my understanding this is an issue with s3fs and gcsfs. Do these filesystems exhibit similar challenges, and if so, how are they handled there? I'm hoping to align on an approach that is consistent for users and with the use of dircache.

Thanks,
Greg
