
Defaulting to_csv to infer compression #22004

Closed
@dhimmel

Description


This issue follows up on #17900, thanks to @Dobatymo and @gfyoung, with review from @jreback. #17900 added an 'infer' option to the compression argument of _get_handle. The main user-facing benefit is that df.to_csv will be able to infer compression just like pandas.read_csv. However, unlike read_csv, the default value of compression in to_csv is None rather than 'infer'.
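To make the 'infer' behavior concrete, here is a minimal sketch of extension-based inference. The helper name infer_compression and the extension table are hypothetical. They are modeled on the formats pandas handles (gzip, bz2, zip, xz) but are not pandas' internal code. Matching the extension case-insensitively is a choice made for this sketch only.

```python
import os
from typing import Optional

# Hypothetical extension table, modeled on the formats pandas
# recognizes for compression='infer'; not pandas' internal mapping.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression(path: str, compression: Optional[str] = "infer") -> Optional[str]:
    """Resolve compression='infer' from the file extension.

    Any other value (including None) is passed through unchanged.
    """
    if compression != "infer":
        return compression
    # Look only at the final extension, e.g. '.gz' in 'path.csv.gz'.
    _, ext = os.path.splitext(path)
    return _EXTENSION_TO_COMPRESSION.get(ext.lower())
```

With 'infer' as the default, infer_compression("path.csv.gz") resolves to "gzip" and infer_compression("path.csv") resolves to None (no compression). Passing compression=None explicitly still disables compression.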

Unfortunately, much of the convenience of compression='infer' is lost if you have to specify it explicitly. In short, I think it would be a major convenience for the following command to work and automatically perform gzip compression:

df.to_csv('path.csv.gz')
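The behavior requested above can be sketched with the standard library alone. The function write_csv below is a hypothetical stand-in for df.to_csv, not pandas code. It assumes 'infer' is the default and maps .gz, .bz2 and .xz to the matching stdlib openers. Zip is left out because a zip archive also needs a member name, which this sketch does not model.

```python
import bz2
import csv
import gzip
import lzma
import os
from typing import Iterable, Optional, Sequence

# Hypothetical extension -> compression table (zip deliberately omitted).
_EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}

# Stdlib openers that accept text mode ("wt") and a newline argument.
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def write_csv(
    path: str,
    rows: Iterable[Sequence[object]],
    compression: Optional[str] = "infer",  # defaults to inference, as proposed
) -> None:
    """Write rows as CSV, compressing according to `compression`."""
    if compression == "infer":
        compression = _EXTENSIONS.get(os.path.splitext(path)[1].lower())
    if compression is None:
        handle = open(path, "w", newline="")
    elif compression in _OPENERS:
        handle = _OPENERS[compression](path, "wt", newline="")
    else:
        raise ValueError(f"unsupported compression: {compression!r}")
    # newline="" lets the csv module control line endings itself.
    with handle:
        csv.writer(handle).writerows(rows)
```

Calling write_csv("path.csv.gz", rows) then produces a gzip file without any extra argument, while write_csv("path.csv.gz", rows, compression=None) keeps the old behavior of writing plain text to that path.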

Compatibility assessment

Defaulting to 'infer' would only affect users who currently write to paths with compression extensions (such as .gz) without actually compressing the output. That's pretty bad practice IMO. Hence, I'm in favor of breaking backwards compatibility and changing the default for compression to 'infer'. It looks like this would go into the 0.24 major release?
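One way to gauge who would be affected is to look for existing files whose extension promises compression but whose contents are plain. The helper extension_mismatch below is hypothetical (not part of pandas). It is a sketch that assumes checking each format's leading signature bytes is enough, which holds for typical gzip, bz2, xz and non-empty zip files.

```python
import os

# Leading signature ("magic") bytes for each compressed format.
_MAGIC = {
    ".gz": b"\x1f\x8b",
    ".bz2": b"BZh",
    ".xz": b"\xfd7zXZ\x00",
    ".zip": b"PK\x03\x04",
}


def extension_mismatch(path: str) -> bool:
    """Return True if `path` has a compression extension but lacks that
    format's signature, i.e. it was written uncompressed."""
    ext = os.path.splitext(path)[1].lower()
    magic = _MAGIC.get(ext)
    if magic is None:
        return False  # no compression extension, so nothing to check
    with open(path, "rb") as f:
        return f.read(len(magic)) != magic
```

A file flagged by this check is exactly the case that would change under the new default: rewriting it with to_csv would produce compressed output instead of plain text.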
