ENH: add gzip/bz2 compression to read_pickle() (and perhaps other read_*() methods) #11666

Closed
@gfairchild

Right now, read_csv() has a compression option, which lets the user pass a gzip- or bz2-compressed CSV file directly to Pandas. It would be great if read_pickle() supported the same option. Pickles compress surprisingly well: I have a 567M Pandas pickle (produced by DataFrame.to_pickle()) that packs down to 45M with pigz --best. That is more than an order of magnitude smaller, which makes storing static pickles long-term as gzipped archives a very attractive option. My workflow would be easier if Pandas could natively handle my dataframe.pickle.gz files the same way it handles compressed CSV files.
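Until something like this exists, a workaround along the following lines should be possible using only the standard library. This is a minimal sketch, not a pandas API: `dump_gz_pickle` and `load_gz_pickle` are hypothetical helper names, and the sketch assumes that the file holds an ordinary pickle stream wrapped in gzip (which is what gzipping the output of DataFrame.to_pickle() should give you).

```python
import gzip
import pickle


def dump_gz_pickle(obj, path):
    """Pickle obj and write it gzip-compressed to path (hypothetical helper)."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gz_pickle(path):
    """Load an object from a gzip-compressed pickle file (hypothetical helper)."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

With these, `dump_gz_pickle(df, "dataframe.pickle.gz")` followed by `load_gz_pickle("dataframe.pickle.gz")` would round-trip a DataFrame, provided pandas is importable when the file is loaded. It works, but it is exactly the boilerplate a compression option on read_pickle() would remove.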

More generally, most read_* methods should probably accept a compression option, since many of them read formats that compress very well.
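One way a shared option could behave is to pick the decompressor from the file extension before handing the stream to the format-specific reader. The sketch below only illustrates that idea under my own assumptions; `open_maybe_compressed` and the `_OPENERS` table are hypothetical names, not anything in pandas.

```python
import bz2
import gzip

# Map a file suffix to the function that opens that kind of file.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_maybe_compressed(path, mode="rb"):
    """Open path in binary mode, decompressing if the suffix is known."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, mode)
    # Unknown suffix: treat the file as uncompressed.
    return open(path, mode)
```

Any reader that accepts a binary file object could then be given the result of `open_maybe_compressed(path)` regardless of whether the file is plain, gzipped, or bz2-compressed.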

Labels: Compat, Enhancement, IO Data
