Description
Is your feature request related to a problem?
I am trying to load some zipped data from my collaborator into pandas, but the zip was created under a version of OS X that adds an extraneous __MACOSX folder to the zipfile, which causes pandas to raise an error when loading the file.
The zipfile itself can easily be fixed with e.g. zip -d filename.zip __MACOSX/\*, but this may cause a headache for a less experienced user. There is a similar problem involving the hidden file .DS_Store on macOS (see Wikipedia for more on this), which stores metadata on the user's icon preferences for the contents of the zip.
Describe the solution you'd like
I have made the following one-line change locally:
Line 567 in 9cb3723
becomes
zip_names = [name for name in zf.namelist() if not (name.startswith("__MACOSX/") or name.startswith(".DS_Store"))]
Unfortunately, this approach will not work for read_csv(...), as the compression is handled in the C code.
I can make a PR following the submission of this issue.
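To illustrate the filtering idea outside of pandas, here is a minimal sketch using only the standard library. The names `HIDDEN_PREFIXES` and `visible_zip_names` are hypothetical and are not part of pandas; the in-memory archive only imitates what a macOS-created zip might contain.

```python
import io
import zipfile

# Hypothetical constant: prefixes of the two macOS artifacts named in this issue.
HIDDEN_PREFIXES = ("__MACOSX/", ".DS_Store")


def visible_zip_names(zf):
    """Return archive member names, skipping macOS metadata entries."""
    # str.startswith accepts a tuple, so one call checks every prefix.
    return [name for name in zf.namelist() if not name.startswith(HIDDEN_PREFIXES)]


# Build a small in-memory archive that imitates a zip created on macOS.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
    zf.writestr("__MACOSX/._data.csv", "resource fork")
    zf.writestr(".DS_Store", "finder metadata")

with zipfile.ZipFile(buffer) as zf:
    names = visible_zip_names(zf)
```

Note that a prefix check like this only matches a top-level .DS_Store; a copy nested inside a subfolder of the archive would need a different test, such as checking the end of the name.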
API breaking implications
None that I can foresee. The only code path this affects changes from raising a ValueError to correctly loading the zip file.
Describe alternatives you've considered
As mentioned, this could be fixed directly in code before invoking pandas, but I think my suggestion is more convenient and does not require any temporary files or extra disk space.
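For comparison, a rough sketch of what such a user-side workaround might look like, assuming the archive holds exactly one real data file. The helper `read_single_data_member` is hypothetical; it reads the member into memory rather than writing a temporary file.

```python
import io
import zipfile


def read_single_data_member(path_or_buffer):
    """Return the only non-metadata member of a zip archive as an in-memory buffer."""
    with zipfile.ZipFile(path_or_buffer) as zf:
        # Assumption: skip __MACOSX entries and any .DS_Store, including nested ones.
        names = [
            name
            for name in zf.namelist()
            if not name.startswith("__MACOSX/") and not name.endswith(".DS_Store")
        ]
        if len(names) != 1:
            raise ValueError(f"Expected exactly one data file, found: {names}")
        return io.BytesIO(zf.read(names[0]))


# Demonstration with an in-memory archive that imitates a macOS-created zip.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n")
    zf.writestr("__MACOSX/._data.csv", "resource fork")

content = read_single_data_member(archive).read()
```

The returned buffer is a regular file-like object, so it could be handed to a pandas reader that accepts one, such as read_csv. The cost is that every user has to know about the problem and write this boilerplate themselves.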
Additional context
A lot of scientific data on e.g. figshare suffers from this issue. There may be other hidden files added by other operating systems that could be treated in a similar way.