Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could ask pandas to use nullable extension dtypes when reading in data, rather than casting columns to float or object when they contain NULLs. This would let library authors who wrap pandas and want nullable extension types get them without either requiring their users to specify dtypes manually or converting the dtypes after the fact.
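A minimal reproduction of the current behavior, using a small made-up in-memory CSV in which an integer column and a boolean column each have one empty field:

```python
import io

import pandas as pd

# Column "a" holds integers and column "b" booleans, but each has one empty field.
csv = "a,b\n1,True\n,False\n3,\n"
df = pd.read_csv(io.StringIO(csv))

# The missing values force "a" to float64 and "b" to object.
print(df.dtypes)
```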
Feature Description
The signature would look something like this:
```python
pandas.read_csv(
    filepath_or_buffer,
    *,
    ...,
    use_nullable_dtypes=False,
)
```
Semantically, if `use_nullable_dtypes=True`, whenever a column would have been parsed as `np.int{n}` or `np.bool_` were it not for the presence of NULLs, it would instead be inferred as `pd.Int{n}Dtype()` or `pd.BooleanDtype()`. Alternatively, the flag could simply convert every `np.int{n}` or `np.bool_` column to the corresponding nullable type, regardless of whether it contains NULLs.
Alternative Solutions
As far as I'm aware, the only current solutions are:
- Add the dtypes you want to the `dtype` argument. This is problematic for libraries wrapping pandas, and it requires the user to know not only the type of every column but also its width (e.g. `Int8` vs `Int64`) if the data is to be stored efficiently.
- Parse all the columns as `object` and then try to convert them to the appropriate dtype one by one. This has obvious efficiency problems, since every value is first materialized as a Python object and then converted a second time.
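Both workarounds, sketched on a small made-up CSV. The column names and target dtypes are assumptions for illustration; passing extension dtype strings such as `"Int64"` and `"boolean"` through `dtype` is existing `read_csv` behavior.

```python
import io

import pandas as pd

csv = "a,b\n1,True\n,False\n3,\n"

# Workaround 1: spell out nullable dtypes per column. This requires knowing
# the column names, their types, and a suitable integer width up front.
df1 = pd.read_csv(io.StringIO(csv), dtype={"a": "Int64", "b": "boolean"})

# Workaround 2: read everything as object, then convert column by column.
raw = pd.read_csv(io.StringIO(csv), dtype=object)
df2 = pd.DataFrame(
    {
        "a": pd.to_numeric(raw["a"]).astype("Int64"),
        # unmapped values (the missing field) become NaN, then <NA>
        "b": raw["b"].map({"True": True, "False": False}).astype("boolean"),
    }
)
```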