Description
Opened based on #554.
Azure Storage consists of a 2-level hierarchy, where the top level is a container, and all blobs underneath a container have a flat structure. Delimiters ("/") that denote folders in traditional filesystems have no meaning. Azure implements a BlobPrefix object for convenience, but a BlobPrefix cannot be created directly, and is immediately removed when all blobs underneath the prefix are deleted.
This creates challenges for filesystem operations like mkdir(), because an empty directory can only be made by creating an empty blob. The result is that:
fs.mkdir("container/blob_folder")
fs.mkdir("container/blob_folder/")
both appear as size=0 blobs, and each is a distinct blob. Choosing the former convention creates issues as described here with partitioned parquet files, while the latter approach runs counter to the convention of removing a trailing "/" when listing directories and/or storing them in dircache, as evidenced by #554.
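To make the behaviour concrete, here is a rough sketch using a toy in-memory model of a flat blob namespace. It is not the Azure SDK or adlfs; the `FlatContainer` class and its `put`, `delete`, and `list` methods are hypothetical names invented for illustration, and the delimiter listing only approximates how BlobPrefix entries are reported.

```python
# Toy in-memory model of a flat blob namespace (hypothetical, for illustration;
# this is not the Azure SDK or adlfs).
class FlatContainer:
    def __init__(self):
        self.blobs = {}  # blob name -> bytes

    def put(self, name, data=b""):
        self.blobs[name] = data

    def delete(self, name):
        del self.blobs[name]

    def list(self, prefix="", delimiter="/"):
        """Return (blob_names, virtual_prefixes) found directly under `prefix`."""
        names, prefixes = set(), set()
        for name in self.blobs:
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            head, sep, _ = rest.partition(delimiter)
            if sep:
                # Anything containing the delimiter is reported as a virtual prefix.
                prefixes.add(prefix + head + sep)
            else:
                names.add(name)
        return sorted(names), sorted(prefixes)


container = FlatContainer()
container.put("blob_folder")   # stands in for fs.mkdir("container/blob_folder")
container.put("blob_folder/")  # stands in for fs.mkdir("container/blob_folder/")

# Two distinct zero-length blobs exist; one lists as a blob, the other as a prefix.
root_names, root_prefixes = container.list()

# Stripping the trailing "/" (the listing/dircache convention) collapses both names.
normalized = {name.rstrip("/") for name in container.blobs}

# A virtual prefix exists only while some blob lives underneath it.
container.put("data/part-0.parquet", b"x")
prefixes_before = container.list()[1]
container.delete("data/part-0.parquet")
prefixes_after = container.list()[1]
```

In this model the two mkdir-style calls leave two separate keys behind, yet they map to the same path once the trailing "/" is stripped, which is the conflict with dircache described above.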
It's my understanding that this is also an issue with s3fs and gcsfs. Do these filesystems exhibit similar challenges, and if so, how are they handled there? I'm hoping to align on an approach that provides consistency for users and with the use of dircache.
Thanks,
Greg