Update index parameter in pandas to_parquet #156

Closed
@datapythonista

In the documentation of to_parquet (https://dev.pandas.io/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet), for the index parameter, it says that when the value is None, the behavior depends on the engine.

I ran a test, and as far as I can tell the index is kept with both engines, pyarrow and fastparquet. I guess the documented behavior is how things worked in the past, and the docs weren't updated. I'd say that the best pandas can do now is to simply have index=True as the default.

I think a PR should be simple enough to propose the change, and we can have the discussion directly in the PR (as opposed to opening an issue to discuss). The final solution may end up being a different one, but starting with a proposal can make the discussion easier and more focused.

In the description of the PR, it would be useful to have a very simple example that shows how the index is saved in both cases.
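Something along these lines could serve as that example. It is only a sketch: `roundtrip_index` is a hypothetical helper name, the temporary file path is arbitrary, and an engine that isn't installed is skipped (reported as None). It writes a small frame with a non-default index and reads it back, once with index=None and once with index=False, so the results for both engines can be compared side by side.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def roundtrip_index(engine, index=None):
    """Write a small frame with `engine` and return the index values read back.

    Hypothetical helper for illustration; returns None if the engine
    is not installed.
    """
    if importlib.util.find_spec(engine) is None:
        return None
    # A named, non-default integer index makes it obvious whether it survived.
    df = pd.DataFrame(
        {"value": [1, 2, 3]},
        index=pd.Index([10, 20, 30], name="key"),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        df.to_parquet(path, engine=engine, index=index)
        return list(pd.read_parquet(path, engine=engine).index)


for engine in ("pyarrow", "fastparquet"):
    print(engine, "index=None  ->", roundtrip_index(engine, index=None))
    print(engine, "index=False ->", roundtrip_index(engine, index=False))
```

If the index=None line shows the original index values for both engines, that would back up the claim that the engine-dependent wording in the docs is outdated.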
