
ENH: Support for loading pickles from OS X/macOS zip files containing extraneous "__MACOSX" folder and ".DS_STORE" file #37098

Closed
@ml-evs

Description


Is your feature request related to a problem?

I am trying to load some zipped data from a collaborator into pandas, but the zip was created under a version of OS X that adds an extraneous __MACOSX folder to the archive. Because pandas expects exactly one file inside a zip, it raises a ValueError when loading the file.

The zip file itself can easily be fixed with, e.g., zip -d filename.zip __MACOSX/\*, but this may cause headaches for less experienced users. There is a similar problem involving the hidden .DS_Store file on macOS (see Wikipedia for more on this), which stores Finder metadata, such as icon positions and view settings, for the folder it sits in.
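For users who would rather clean the archive in Python than with the zip command-line tool, a minimal sketch follows. The standard-library zipfile module cannot delete entries in place, so this copies every entry except the macOS metadata into a new archive. The helper names is_macos_metadata and strip_macos_metadata are hypothetical and not part of pandas.

```python
import zipfile


def is_macos_metadata(name: str) -> bool:
    """Return True for zip entries that macOS adds when compressing in Finder."""
    # Resource-fork copies live under a top-level __MACOSX/ folder,
    # while .DS_Store files can appear at any depth in the archive.
    return name.startswith("__MACOSX/") or name.rsplit("/", 1)[-1] == ".DS_Store"


def strip_macos_metadata(src, dst):
    """Copy zip src to dst (paths or binary file objects), skipping macOS metadata.

    Returns the list of entry names that were kept.
    """
    kept = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            if is_macos_metadata(info.filename):
                continue
            # Reusing the original ZipInfo preserves the name, timestamp
            # and compression method of each kept entry.
            zout.writestr(info, zin.read(info.filename))
            kept.append(info.filename)
    return kept
```

Unlike zip -d, this leaves the original file untouched and writes a cleaned copy, at the cost of the extra disk space the issue mentions.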

Describe the solution you'd like

I have made the following one-line change locally:

zip_names = zf.namelist()

becomes

zip_names = [name for name in zf.namelist() if not (name.startswith("__MACOSX/") or name.rsplit("/", 1)[-1] == ".DS_Store")]

Unfortunately, this approach will not work for read_csv(...), as compression there is handled in the C code.
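The filter above can be combined with the single-file check that currently makes pandas raise a ValueError. The following is a standalone approximation under that assumption, not pandas' actual code; select_data_member is a hypothetical helper, and its error messages are illustrative rather than copied from pandas.

```python
import zipfile

# Entries that macOS Finder adds when compressing a folder or file.
MACOS_METADATA_DIR = "__MACOSX/"
MACOS_METADATA_FILE = ".DS_Store"


def select_data_member(zf: zipfile.ZipFile) -> str:
    """Return the name of the single data file in zf, ignoring macOS metadata."""
    zip_names = [
        name
        for name in zf.namelist()
        if not (
            name.startswith(MACOS_METADATA_DIR)
            or name.rsplit("/", 1)[-1] == MACOS_METADATA_FILE
        )
    ]
    if len(zip_names) == 1:
        return zip_names[0]
    if not zip_names:
        raise ValueError("Zero files found in ZIP file")
    # Genuinely ambiguous archives still fail, exactly as before.
    raise ValueError(f"Multiple files found in ZIP file: {zip_names}")
```

Archives with more than one real data file still raise, so the only behavioural change is for zips whose extra entries are macOS metadata.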

I can make a PR following the submission of this issue.

API breaking implications

None that I can foresee: the only code path this affects changes from raising a ValueError to correctly loading the zip file.

Describe alternatives you've considered

As mentioned, this could be fixed before invoking pandas, but I think my suggestion is more convenient and does not require any temporary files or extra disk space.
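Until such a change lands, one user-side workaround is to read the real member straight out of the archive. A minimal sketch using only the standard library, assuming the archive holds exactly one pickle alongside the macOS metadata; load_pickle_from_mac_zip is a hypothetical name.

```python
import pickle
import zipfile


def load_pickle_from_mac_zip(path):
    """Unpickle the single data file in a zip, skipping macOS metadata entries."""
    with zipfile.ZipFile(path) as zf:
        names = [
            name
            for name in zf.namelist()
            if not (
                name.startswith("__MACOSX/")
                or name.rsplit("/", 1)[-1] == ".DS_Store"
            )
        ]
        if len(names) != 1:
            raise ValueError(f"Expected one data file in ZIP, found: {names}")
        # zf.open streams the member, so nothing is extracted to disk.
        with zf.open(names[0]) as handle:
            return pickle.load(handle)
```

The same open handle could instead be passed to pd.read_csv(handle), which sidesteps the read_csv limitation mentioned above. As with any pickle, only load archives from trusted sources.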

Additional context

A lot of scientific data on, e.g., figshare suffers from this issue. Other operating systems may add hidden files that could be treated in a similar way.
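If hidden files from other operating systems were handled the same way, a configurable pattern list is one option. The sketch below matches each entry's base name with fnmatch, case-insensitively since these files come from case-insensitive filesystems. The default lists, including the Windows files Thumbs.db and desktop.ini and the macOS AppleDouble pattern ._*, are illustrative assumptions, not something pandas ignores today; filter_hidden_entries is a hypothetical helper.

```python
import fnmatch

# Hypothetical defaults: macOS metadata plus common Windows folder files.
IGNORED_DIRS = ("__MACOSX/",)
IGNORED_BASENAMES = (".DS_Store", "._*", "Thumbs.db", "desktop.ini")


def filter_hidden_entries(
    names, ignored_dirs=IGNORED_DIRS, ignored_basenames=IGNORED_BASENAMES
):
    """Drop zip entries that are OS-generated metadata rather than data.

    ignored_dirs must be a tuple, since it is passed to str.startswith.
    """
    kept = []
    for name in names:
        if name.startswith(ignored_dirs):
            continue
        basename = name.rsplit("/", 1)[-1].lower()
        # Lower-case both sides so matching does not depend on the host OS.
        if any(fnmatch.fnmatchcase(basename, p.lower()) for p in ignored_basenames):
            continue
        kept.append(name)
    return kept
```

Keeping the lists as parameters would let users opt out, or add patterns, if a default ever hides a file they actually need.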

Metadata

Assignees

No one assigned

    Labels

    Enhancement, Needs Triage (Issue that has not been reviewed by a pandas team member)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
