ENH:  #55647

Open
@jacgoldsm

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could request that Pandas use nullable extension dtypes when reading in data, rather than casting columns to float or object when they contain NULLs. This would allow library authors who wrap Pandas and want to use nullable extension types to do so without either requiring their users to supply dtypes manually or having the library convert the dtypes after the fact.
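To make the complaint concrete, here is a minimal sketch of the default inference. The CSV content and the column names id and flag are made up for illustration and are not part of the original report.

```python
import io

import pandas as pd

# Hypothetical sample data: one missing value in each column.
csv_text = "id,flag\n1,True\n,False\n3,\n"

df = pd.read_csv(io.StringIO(csv_text))

# With default inference the integer column is widened to float64
# so it can hold NaN, and the boolean column is no longer a NumPy bool column.
print(df.dtypes)
```

The information that id only ever held integers and flag only ever held booleans is not recoverable from the resulting dtypes alone, which is what a wrapping library would need to work around.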

Feature Description

The signature would look something like this:

pandas.read_csv(
  filepath_or_buffer,
  *,
  ...,
  use_nullable_dtypes=False,
)

Semantically, if use_nullable_dtypes is True, whenever a DataFrame column would have been inferred as np.int{n} or np.bool_ were it not for the presence of NULLs, the column would instead be inferred as pd.Int{n}Dtype() or pd.BooleanDtype(). Alternatively, the flag could simply convert every np.int{n} or np.bool_ column to the corresponding nullable type, regardless of whether it contains NULLs.

Alternative Solutions

As far as I'm aware, the only current solutions are:

  • Pass the desired nullable dtypes through the dtype argument. This is problematic for libraries wrapping Pandas, and it requires the user to know not only the type of every column but also its size if the data is to be stored efficiently.
  • Parse all the columns as object and then try to convert them to the appropriate data type one by one. This is inefficient, since every value is first stored as a Python object and then parsed a second time.
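For reference, the first workaround and an after-the-fact conversion might look like the sketch below. The sample data and column names are hypothetical, and DataFrame.convert_dtypes is used here as one way to convert after reading; the issue does not name a specific conversion method.

```python
import io

import pandas as pd

# Hypothetical sample data: one missing value in each column.
csv_text = "id,flag\n1,True\n,False\n3,\n"

# Workaround 1: name the nullable dtype of every column up front.
# The caller has to know the columns and their types in advance.
explicit = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "Int64", "flag": "boolean"},
)

# After-the-fact conversion: read with default inference, then ask
# pandas to pick nullable extension dtypes for the result.
converted = pd.read_csv(io.StringIO(csv_text)).convert_dtypes()

print(explicit.dtypes)
print(converted.dtypes)
```

Both routes end up with nullable dtypes, but the first needs per-column knowledge from the caller and the second makes an extra pass over data that has already been inferred once.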

Additional Context

No response

Labels

  • Docs
  • IO Data: IO issues that don't fit into a more specific label
  • NA - MaskedArrays: Related to pd.NA and nullable extension arrays