Skip to content

DIS: Keywords for multi-threading capabilities #43313

Open
@lithomas1

Description

@lithomas1

With the addition of the new pyarrow engine, we now have the option to use multiple threads to read a CSV file. (This is also controllable through the pyarrow.set_cpu_count option).

Should we expose a keyword(such as num_threads maybe) to the user as a keyword, or just add an example in the docs(for this case, redirecting to pyarrow.set_cpu_count? In the case of read_csv, this keyword would probably only apply to the pyarrow engines, however it is worth noting that we have had multiple feature requests for parallel CSV reading (e.g. #37955), and it is probably worth it to be configure the number of threads used if we offer multithreading.

Personally, I would prefer having a keyword, as if we decide to add more I/O engines with multithreading capabilities, it would be more convenient to be able to control this option through a keyword.

cc @pandas-dev/pandas-core

Metadata

Metadata

Assignees

No one assigned

    Labels

    API DesignIO DataIO issues that don't fit into a more specific labelMultithreadingParallelism in pandasNeeds DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions