
Listing Index in Tarfiles #1808

Open
@pgierz

Description


Hi there,

I'm trying to write some examples using fsspec that our users can incorporate into their own scripts. Only some of our users are proficient in Python, so these examples need to be very verbose and simple. Our use case is climate model simulation and archiving on a tape system. The longer-term goal is to integrate this into some form of cataloguing, but that's for later.

I'm trying to build an index (a listing of the contents) of a tar file and have it saved and reused on subsequent runs, but that doesn't seem to work:

import fsspec


def simple_targz_example():
    base_path = "/albedo/work/user/pgierz/SciComp/Tutorials/AWIESM_Basics/experiments/basic-001.tar.gz"
    index_path = "basic-001.tar.gz.index"  # Cache information about the contents to disk at this location
    # fo is the file object (this is the tar file to look inside of)
    basic_001_fs = fsspec.filesystem(
        "tar",
        fo=base_path,
        index_store=index_path,
    )
    # Find all NetCDF files in "outdata".
    # NOTE(PG): Since we have a tar filesystem with the ``fo`` argument, we DO NOT need
    #           to include the base_path in the glob pattern here. However, the tar file
    #           contains the base folder, so you need to include that:
    outdata_netcdf_files = basic_001_fs.glob("basic-001/outdata/**/*.nc")
    for nc_file in outdata_netcdf_files:
        print(nc_file)


if __name__ == "__main__":
    simple_targz_example()

Looking at the code, it seems this is not implemented yet (both steps are still TODOs in `_index`). I had a try at it in #1807.

def _index(self):
    # TODO: load and set saved index, if exists
    out = {}
    for ti in self.tar:
        info = ti.get_info()
        info["type"] = typemap.get(info["type"], "file")
        name = ti.get_info()["name"].rstrip("/")
        out[name] = (info, ti.offset_data)
    self.index = out
    # TODO: save index to self.index_store here, if set
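For discussion, here is a minimal standalone sketch of what the two TODOs might do, written against the standard-library `tarfile` and `json` modules rather than fsspec internals so it runs on its own. Like `_index`, it maps each member name to `(info, offset_data)`, but it assumes a reduced info dict (name, size, type) so the index can be stored as JSON. `build_index`, `save_index` and `load_or_build_index` are hypothetical names, not existing fsspec functions.

```python
import json
import os
import tarfile

# Assumption: only directories need special treatment; everything else is a "file",
# similar in spirit to the typemap lookup in _index.
TYPEMAP = {tarfile.DIRTYPE: "directory"}


def build_index(tar_path):
    """Hypothetical helper: scan the whole archive once, name -> (info, data offset)."""
    index = {}
    # tarfile.open with the default mode handles .tar and .tar.gz transparently.
    with tarfile.open(tar_path) as tar:
        for ti in tar:
            # Assumption: a reduced, JSON-serialisable info dict is enough.
            info = {
                "name": ti.name.rstrip("/"),
                "size": ti.size,
                "type": TYPEMAP.get(ti.type, "file"),
            }
            # offset_data is where the member's bytes start in the (decompressed) stream.
            index[info["name"]] = (info, ti.offset_data)
    return index


def save_index(index, index_path):
    """Hypothetical helper: write the index to disk as JSON (tuples become lists)."""
    with open(index_path, "w") as f:
        json.dump(index, f)


def load_or_build_index(tar_path, index_path):
    """Hypothetical helper: reuse a saved index if present, else scan and save it."""
    if os.path.exists(index_path):
        with open(index_path) as f:
            # JSON stores the (info, offset) tuples as lists; convert them back.
            return {name: tuple(entry) for name, entry in json.load(f).items()}
    index = build_index(tar_path)
    save_index(index, index_path)
    return index
```

The first call scans the archive and writes the index file; later calls only read the JSON. The sketch does not handle invalidation: if the archive is rewritten, a stale index would be reused silently, so storing the tar file's size or mtime alongside the index and comparing them on load would be one way to guard against that.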

I might need to ask some additional questions; I hope I'm not spamming you with too many silly things, sorry about that.

Cheers,
Paul
