ENH: DataFrame.astype(dtype: dict) should work in the presence of superfluous keys. #43837

Open
@randolf-scholz

Description

Currently, DataFrame.astype(dtype_dict: dict) requires the dict keys to be a subset of the DataFrame's columns. This feels like an unnecessary restriction. In my opinion it would suffice, and be more intuitive, if it roughly performed:

for col in df.columns:
    if col in dtype_dict:
        df[col] = df[col].astype(dtype_dict[col])

The fact that this currently raises an error becomes annoying, for example when one needs to repair data types after a stacking operation has destroyed them: one has to restrict dtype_dict to the DataFrame's column keys every time!
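To make the complaint concrete, here is a minimal sketch of the current behaviour and the manual workaround. The frame, the column names, and the extra key "c" are made up for illustration; the KeyError is what I observe with the pandas versions I have tried.

```python
import pandas as pd

# Illustrative data: the columns hold strings and need their dtypes repaired.
df = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"]})

# A dtype mapping shared across several frames; "c" is not a column of df.
dtype_dict = {"a": "int64", "b": "float64", "c": "float64"}

# Passing the mapping directly fails because of the superfluous key "c".
try:
    df.astype(dtype_dict)
    raised = False
except KeyError:
    raised = True

# Workaround: restrict the mapping to the columns that actually exist.
restricted = {col: dt for col, dt in dtype_dict.items() if col in df.columns}
converted = df.astype(restricted)
```

The dict comprehension is the step that has to be repeated at every call site.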

Of course, a strong argument can be made that raising an error is a good idea, because it prevents users from erroneously believing the type-casting was performed when a key was mistyped.

Describe the solution you'd like

I propose considering one of the following changes:

Option 1: If df.columns is a subset of the keys of dtype_dict, do not raise an error if superfluous keys are present. In this case every column is matched by a key, so the chance of a typo going unnoticed is negligible.
Option 2: Extend the functionality of the already present errors='ignore' option to also ignore superfluous keys in the dtype_dict.
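Both options can be sketched as a user-level helper. This is only an illustration of the intended semantics, not a patch to pandas: the function name astype_lenient and its mode parameter are hypothetical, and Option 2 is modelled here as a separate mode rather than through the real errors argument.

```python
import pandas as pd


def astype_lenient(df, dtype_dict, mode="all_columns_covered"):
    """Cast columns via dtype_dict while tolerating superfluous keys.

    Hypothetical helper; `mode` is an assumption made for this sketch:
      - "all_columns_covered" (Option 1): extra keys are tolerated only
        if every column of df has an entry in dtype_dict.
      - "ignore" (Option 2): extra keys are always dropped silently.
    """
    columns = set(df.columns)
    keys = set(dtype_dict)
    extra = keys - columns

    if mode == "all_columns_covered":
        # Extra keys plus an uncovered column may indicate a typo, so raise.
        if extra and not columns <= keys:
            raise KeyError(f"keys not found in columns: {sorted(extra, key=str)}")
    elif mode != "ignore":
        raise ValueError(f"unknown mode: {mode!r}")

    # Only pass the keys that name existing columns to DataFrame.astype.
    restricted = {col: dt for col, dt in dtype_dict.items() if col in columns}
    return df.astype(restricted)


df = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"]})

# Option 1: all columns are covered, so the extra key "c" is tolerated.
opt1 = astype_lenient(df, {"a": "int64", "b": "float64", "c": "float64"})

# Option 2: "b" is not covered and "c" is extra, but nothing is raised.
opt2 = astype_lenient(df, {"a": "int64", "c": "float64"}, mode="ignore")
```

Under Option 1, the second mapping would still raise, since column "b" is uncovered while the unknown key "c" is present, which is exactly the pattern a typo would produce.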
