ENH: Update Series/DataFrame Constructor args to Enable dtype Forced Conversion on Creation #44117

Open
@adamgranthendry


Users often know, at creation time, which data types they want their columns to have.

  1. Can the pd.Series constructor be given an additional argument errors (default "raise") that optionally converts all values to the specified dtype at creation, substituting pd.NA or np.nan for any value that cannot be cast?

e.g. Currently, the following raises a ValueError:

>>> a = pd.Series(['a', 1, 3, ''], dtype=int)
ValueError: invalid literal for int() with base 10: 'a'

However, it would be nice to have the following capability:

>>> a = pd.Series(['a', 1, 3, ''], dtype=int, errors="coerce")
>>> a
0    NaN
1    1
2    3
3    NaN
dtype: int32
  2. Extending this to the pd.DataFrame constructor, can the dtype argument be widened to something like Union[pd._typing.Dtype, Dict[str, pd._typing.Dtype]] | None, so that the user could pass in a dictionary mapping column names (as strings) to the dtypes they want those columns converted to?
    (Again, an argument errors (default "raise") should be added so that ValueErrors are still raised unless the user explicitly sets errors="coerce".)
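For the Series case in item 1, roughly the same result can be obtained today in two steps. The following is a minimal sketch, assuming the nullable Int64 extension dtype is an acceptable target (a plain NumPy integer dtype cannot hold NaN, so the missing values need either a float dtype or a nullable one):

```python
import pandas as pd

raw = pd.Series(["a", 1, 3, ""])

# Step 1: coerce unparseable values to NaN; the result is float64.
coerced = pd.to_numeric(raw, errors="coerce")

# Step 2: cast to the nullable integer dtype so missing values become <NA>.
as_nullable_int = coerced.astype("Int64")

print(as_nullable_int)
```

The requested errors="coerce" argument would fold these two steps into the constructor call.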

Currently, users commonly have to select each DataFrame column after the frame is created, convert its dtype with a to_...() function (e.g. pd.to_numeric) using errors="coerce", and assign the result back to the DataFrame column (since the to_...() functions have no inplace argument).
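The per-column workaround can be wrapped in a small helper to show what a dict-valued dtype with coercion might look like. This is only a sketch: coerce_columns is a hypothetical name, not a pandas API, and it assumes every requested dtype is numeric so that pd.to_numeric is the right conversion function.

```python
import pandas as pd


def coerce_columns(df, dtypes):
    """Hypothetical helper (not part of pandas).

    Returns a copy of df in which each column named in dtypes is coerced
    to numeric (unparseable values become missing) and then cast to the
    requested dtype. Only numeric target dtypes are handled.
    """
    out = df.copy()
    for column, target in dtypes.items():
        numeric = pd.to_numeric(out[column], errors="coerce")
        out[column] = numeric.astype(target)
    return out


df = pd.DataFrame({"qty": ["a", 1, 3, ""], "price": ["1.5", "x", 2, None]})
result = coerce_columns(df, {"qty": "Int64", "price": "float64"})
print(result.dtypes)
```

Datetime or timedelta columns would need pd.to_datetime or pd.to_timedelta instead, which is part of why a single constructor-level argument is attractive.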

This feature would combine the functionality common to the to_...() functions in one place. The function read_excel, for example, has a converters argument that already provides the behavior this feature request seeks for pd.DataFrame creation.
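To illustrate the converters behavior mentioned above, here is a small sketch. It uses read_csv, which also accepts converters, with an in-memory buffer so that it runs without an Excel file. The column names and the to_int_or_nan function are made up for the example.

```python
import io

import pandas as pd


def to_int_or_nan(text):
    """Illustrative converter: parse a cell as int, else return NaN."""
    try:
        return int(text)
    except ValueError:
        return float("nan")


csv_data = io.StringIO("name,qty\nw,a\nx,1\ny,3\nz,\n")

# The converter is applied to every raw cell of the "qty" column while parsing.
df = pd.read_csv(csv_data, converters={"qty": to_int_or_nan})
print(df)
```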

(Aside: DataFrame.convert_dtypes has no coerce option, so the Series a in the example above would simply end up with dtype object.)


Labels: Astype, Constructors
