ENH: native support for universal_pathlib (upath) IO #60618

Open
@zkurtz

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

universal_pathlib makes it considerably easier to read and write data frames directly against cloud paths such as "s3://test_bucket/example.txt" by absorbing authentication and cloud-specific implementation details into the construction of the path itself. IO methods can then work as close to normally as possible, regardless of the nature of the path being used (local vs. GCS vs. S3, etc.).

So, ideally, this would just work:

import pandas as pd
from upath import UPath

path = UPath("s3://test_bucket/example.txt")
df = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for any data frame
df.to_parquet(path)

But it does not quite work. However, this thin wrapper does seem to work, simply by detecting whether the input path is a UPath and, if so, passing the storage options along to the pandas IO calls.

Proposal: Extend the allowable types of paths in pandas dataframe IO methods to include UPath, and automatically detect storage options in that case.
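To make the proposal concrete, here is a rough sketch of the detection step. It is not the linked wrapper and not a proposed pandas implementation. It assumes that a UPath exposes its fsspec settings as a `storage_options` mapping and that `str(path)` gives back the full URL; the helper names `split_path_and_options` and `to_parquet` are hypothetical.

```python
import os
from typing import Any, Dict, Optional, Tuple


def split_path_and_options(path: Any) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (path_string, storage_options) for a path-like object.

    Assumption: UPath-like objects carry a `storage_options` mapping.
    Detection is duck-typed, so this sketch does not import upath.
    """
    if hasattr(path, "storage_options"):
        options = dict(path.storage_options)
        # An empty mapping is passed on as None, i.e. "no options".
        return str(path), options or None
    # Plain strings and pathlib.Path objects pass through unchanged.
    return os.fspath(path), None


def to_parquet(df: Any, path: Any, **kwargs: Any) -> None:
    """Hypothetical wrapper: forward any detected storage options to pandas."""
    url, options = split_path_and_options(path)
    df.to_parquet(url, storage_options=options, **kwargs)
```

With something like this in place, `to_parquet(df, UPath("s3://test_bucket/example.txt"))` would forward whatever options the path was built with, while local paths behave as they do today. Native support would presumably do the same detection inside the pandas IO layer.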

Feature Description

Nothing to add ...

Alternative Solutions

Nothing to add ...

Additional Context

No response

Metadata

Assignees

No one assigned

Labels

  • Enhancement

  • IO Data: IO issues that don't fit into a more specific label

  • Needs Discussion: Requires discussion from core team before further action
