Uniform file IO API and consolidated codebase #15008

Open
@dhimmel

Many of the IO methods must deal with at least three things: reading from a URL, reading from or writing to a compressed format, and handling different text encodings. It would be great if all IO functions where these factors are relevant could use the same code (consolidated codebase) and expose the same options (uniform API).

In #14576, we consolidated the codebase, but more consolidation is possible. In pandas/io/common.py, there are three functions that must be called sequentially to get a file-like object: get_filepath_or_buffer, _infer_compression, and _get_handle. These should be consolidated into a single function, which can then delegate to subfunctions.
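To make the proposal concrete, here is a minimal sketch of what a single entry point could look like. It uses only the Python standard library and is not the pandas implementation: the names `open_handle`, `infer_compression`, and `is_url`, the extension table, and the `compression="infer"` default are all assumptions made for illustration. It also assumes that any non-text buffer passed in is a binary file-like object.

```python
import bz2
import gzip
import io
import lzma
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical tables for this sketch (not taken from pandas).
_EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def infer_compression(path, compression="infer"):
    """Resolve "infer" to a codec name (or None) from the file extension."""
    if compression != "infer":
        return compression
    if not isinstance(path, str):
        return None  # nothing to infer from a buffer
    lowered = path.lower()
    for extension, codec in _EXTENSION_TO_COMPRESSION.items():
        if lowered.endswith(extension):
            return codec
    return None


def is_url(path):
    """Return True for string paths with a remote scheme."""
    return isinstance(path, str) and urlparse(path).scheme in {"http", "https", "ftp"}


def open_handle(path_or_buffer, mode="r", encoding=None, compression="infer"):
    """Single entry point: URL fetch, (de)compression, and text encoding."""
    binary = "b" in mode
    base = mode.replace("b", "").replace("t", "")  # "r", "w", or "a"
    codec = infer_compression(path_or_buffer, compression)
    if codec is not None and codec not in _OPENERS:
        raise ValueError(f"unsupported compression: {codec!r}")

    if is_url(path_or_buffer):
        if base != "r":
            raise ValueError("URLs can only be opened for reading")
        # Download the whole payload into memory; fine for a sketch.
        with urlopen(path_or_buffer) as response:
            path_or_buffer = io.BytesIO(response.read())

    if codec is not None:
        opener = _OPENERS[codec]
        if binary:
            return opener(path_or_buffer, base + "b")
        return opener(path_or_buffer, base + "t", encoding=encoding)

    if isinstance(path_or_buffer, str):
        if binary:
            return open(path_or_buffer, base + "b")
        return open(path_or_buffer, base, encoding=encoding)

    # Uncompressed buffer: pass through, decoding bytes if text was requested.
    if binary or isinstance(path_or_buffer, io.TextIOBase):
        return path_or_buffer
    return io.TextIOWrapper(path_or_buffer, encoding=encoding)
```

With a helper along these lines, each reader or writer would only need to forward `encoding` and `compression`, for example `open_handle("data.csv.gz", encoding="utf-8")`, instead of chaining three calls itself.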

Currently, pandas supports the following IO methods. First, for reading:

  • read_csv
  • read_excel
  • read_hdf
  • read_feather
  • read_sql
  • read_json
  • read_msgpack (experimental)
  • read_html
  • read_gbq (experimental)
  • read_stata
  • read_sas
  • read_clipboard
  • read_pickle

And then for writing:

  • to_csv
  • to_excel
  • to_hdf
  • to_feather
  • to_sql
  • to_json
  • to_msgpack (experimental)
  • to_html
  • to_gbq (experimental)
  • to_stata
  • to_clipboard
  • to_pickle

Some of these should definitely use the consolidated/uniform API, such as read_csv, read_html, read_pickle, and read_excel.

Some functions should perhaps be kept separate, such as read_feather or read_clipboard.

Metadata

Labels: API - Consistency, Enhancement, IO Data, Refactor
