Description
Users often know, at creation time, which data types they want their columns to have.
- Can the `pd.Series` constructor be given an additional argument `errors` (default `raise`) to optionally convert all values to a specified type (`pd.NA` or `np.nan` if a value cannot be cast) at creation?
e.g. Currently, the following raises a `ValueError`:
>>> a = pd.Series(['a', 1, 3, ''], dtype=np.int)
ValueError: invalid literal for int() with base 10: 'a'
However, it would be nice to have the following capability:
>>> a = pd.Series(['a', 1, 3, ''], dtype=np.int, errors="coerce")
>>> a
0    NaN
1      1
2      3
3    NaN
dtype: int32
- Extending to the `pd.DataFrame` constructor, can the `dtype` argument be altered to something like `Union[pd._typing.Dtype, Dict[str, pd._typing.Dtype]] | None` such that the user could pass in a dictionary mapping column names (as strings) to the `dtype` values they want those columns converted to?
(Again, an argument `errors` (default `raise`) should be added so `ValueError`s are still raised unless the user explicitly sets `errors="coerce"`.)
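To make the requested semantics concrete, here is a rough sketch written as a standalone function. `frame_with_dtypes` is a hypothetical name, not a pandas API, and the sketch assumes every non-datetime target is numeric and, when coercing, can hold missing values (e.g. `"Int64"` or `"float64"`).

```python
from typing import Any, Mapping

import pandas as pd


def frame_with_dtypes(
    data: Any, dtype: Mapping[str, str], errors: str = "raise"
) -> pd.DataFrame:
    """Hypothetical helper (not part of pandas): build a frame, then cast per column."""
    if errors not in ("raise", "coerce"):
        raise ValueError("errors must be 'raise' or 'coerce'")
    df = pd.DataFrame(data)
    for column, target in dtype.items():
        if errors == "raise":
            # Same behaviour as today: a bad value raises.
            df[column] = df[column].astype(target)
        elif target.startswith("datetime"):
            # Unparseable dates become NaT.
            df[column] = pd.to_datetime(df[column], errors="coerce")
        else:
            # Assumption: any other target is numeric and can store missing values.
            df[column] = pd.to_numeric(df[column], errors="coerce").astype(target)
    return df


raw = {
    "n": ["a", 1, 3, ""],
    "when": ["2021-01-01", "not a date", "2021-03-01", None],
}
df = frame_with_dtypes(raw, {"n": "Int64", "when": "datetime64[ns]"}, errors="coerce")
print(df)
```

With the default `errors="raise"`, passing `{"n": "int64"}` for the same data still raises a `ValueError`, which is the backwards-compatible behaviour the request asks for.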
Currently, users commonly have to select each `DataFrame` column manually after the frame is created, convert its dtype with a `to_...()` function using `errors="coerce"`, and assign the result back to the `DataFrame` column (since the `to_...()` functions have no `inplace` argument).
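For reference, the manual workflow described above looks roughly like this today. The column names and values are made up for illustration; `pd.to_numeric` and `pd.to_datetime` are the existing functions that accept `errors="coerce"`.

```python
import pandas as pd

# Illustrative data: each column contains at least one value that cannot be cast.
df = pd.DataFrame(
    {
        "n": ["a", 1, 3, ""],
        "when": ["2021-01-01", "not a date", "2021-03-01", "2021-04-01"],
    }
)

# Each column is selected, converted with errors="coerce", and assigned back.
df["n"] = pd.to_numeric(df["n"], errors="coerce")
df["when"] = pd.to_datetime(df["when"], errors="coerce")

print(df.dtypes)
```

Note that the coerced numeric column comes back as `float64`, because `NaN` cannot be stored in a plain NumPy integer column.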
This feature would bring the functionality shared by the various `to_...()` functions together in one place. `read_excel`, for example, has a `converters` argument that already provides the behavior this feature request asks for when creating `pd.DataFrame` objects.
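As an illustration of the `converters` idea, the sketch below uses `read_csv` instead of `read_excel` so that it runs without a spreadsheet file; `read_csv` also accepts a `converters` dictionary. The function `int_or_nan` and the CSV contents are made up for this example.

```python
import io

import numpy as np
import pandas as pd


def int_or_nan(value: str):
    """Per-value converter: return an int, or NaN when the text is not an integer."""
    try:
        return int(value)
    except ValueError:
        return np.nan


# Column "n" holds one non-numeric value and one empty field.
csv_text = "n,label\na,w\n1,x\n3,y\n,z\n"
df = pd.read_csv(io.StringIO(csv_text), converters={"n": int_or_nan})

print(df)
```

The converter runs once per cell, so it is more flexible than a dtype mapping but also slower on large inputs.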
(ASIDE: It should be noted that `DataFrame.convert_dtypes` doesn't have `coerce` functionality, so the `Series` `a` in the above example would simply be converted to type `object`.)
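A quick check of the aside, using the same values as the earlier example. The variable names `converted` and `coerced` are only for illustration.

```python
import pandas as pd

a = pd.Series(["a", 1, 3, ""])

# convert_dtypes has no way to coerce bad values, so the mixed column stays object.
converted = a.convert_dtypes()
print(converted.dtype)

# Coercing first with to_numeric lets convert_dtypes pick a nullable integer dtype.
coerced = pd.to_numeric(a, errors="coerce").convert_dtypes()
print(coerced.dtype)
```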