ENH: Add `engine` keyword to `read_json` to enable reading from pyarrow

pyarrow has a `read_json` function that could be used as an alternative parser for `pd.read_json` https://arrow.apache.org/docs/python/generated/pyarrow.json.read_json.html#pyarrow.json.read_json

Like we have for `read_csv` and `read_parquet`, I would like to propose a `engine` keyword argument to allow users to pick the parsing backend `engine="ujson"|"pyarrow"`

One change compared to `read_csv` and `read_parquet` with `engine="pyarrow"` I would like to propose would be to return `ArrowExtensionArray`s instead of converting the result to numpy dtypes such that the `pyarrow.Table` returned by `read_json` still propagates pyarrow objects underneath.

Thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Add `engine` keyword to `read_json` to enable reading from pyarrow #48893

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ENH: Add engine keyword to read_json to enable reading from pyarrow #48893

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

ENH: Add `engine` keyword to `read_json` to enable reading from pyarrow #48893