Description
Currently, DataFrame.astype(dtype_dict: dict) requires that the dict keys are a subset of the DataFrame's columns. This feels like an unnecessary restriction; in my opinion it would suffice, and be more intuitive, if it roughly performed:
    for col in df.columns:
        if col in dtype_dict:
            df[col] = df[col].astype(dtype_dict[col])
The fact that this currently raises an error becomes annoying, for example when one needs to repair data types after a stacking operation has destroyed them: one has to slice dtype_dict by the column keys every time!
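To make the workaround concrete, here is a minimal sketch of the slicing step described above. The frame and the dtype mapping are made-up illustration data, and the extra key "c" stands in for any key that is not a column.

```python
import pandas as pd

# Illustrative data: every column holds strings that should be numeric.
df = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"]})
dtype_dict = {"a": "int64", "b": "float64", "c": "string"}  # "c" is not a column

# Passing dtype_dict directly raises KeyError because of the extra key "c".
try:
    df.astype(dtype_dict)
    raised = False
except KeyError:
    raised = True

# Workaround: keep only the entries whose key is an actual column.
present = {col: dtype_dict[col] for col in df.columns if col in dtype_dict}
converted = df.astype(present)
```

The dict comprehension is the part that has to be repeated at every call site today.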
Of course, a strong argument can be made that raising an error is a good idea, since it prevents users from erroneously believing the type cast was performed when a key was mistyped.
Describe the solution you'd like
I propose considering one of the following changes:
Option 1: If df.columns is a subset of the keys of dtype_dict, do not raise an error if superfluous keys are present. In this case all columns are identified and there is a negligible chance that there is an error due to a typo.
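Option 1 could be sketched roughly as the wrapper below. The function name astype_option1 is hypothetical and not part of pandas; it only illustrates the intended rule on top of the existing DataFrame.astype.

```python
import pandas as pd


def astype_option1(df, dtype_dict):
    """Hypothetical helper sketching Option 1; not a pandas API."""
    if set(df.columns) <= set(dtype_dict):
        # Every column has an entry, so superfluous keys are dropped silently.
        dtype_dict = {col: dtype_dict[col] for col in df.columns}
    # Otherwise the mapping is passed through unchanged, so unknown keys
    # still raise KeyError exactly as they do today.
    return df.astype(dtype_dict)


df = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"]})
full = astype_option1(df, {"a": "int64", "b": "float64", "c": "string"})
```

With a mapping that misses a column, for example {"a": "int64", "c": "string"}, the helper still raises, which keeps the typo protection for partial mappings.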
Option 2: Extend the functionality of the already present errors='ignore' option to also ignore superfluous keys in the dtype_dict.
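A rough sketch of how Option 2 might behave, again as a wrapper. The name astype_option2 is hypothetical; the errors parameter is simply forwarded to the existing DataFrame.astype, and the key filtering is the proposed extension, not current pandas behaviour.

```python
import pandas as pd


def astype_option2(df, dtype_dict, errors="raise"):
    """Hypothetical helper sketching Option 2; not a pandas API."""
    if errors == "ignore":
        # Proposed extension: silently drop keys that are not columns.
        dtype_dict = {k: v for k, v in dtype_dict.items() if k in df.columns}
    # With errors="raise" the mapping is untouched and unknown keys still fail.
    return df.astype(dtype_dict, errors=errors)


df = pd.DataFrame({"a": ["1", "2"], "b": ["0.5", "1.5"]})
lenient = astype_option2(df, {"a": "int64", "c": "string"}, errors="ignore")
```

Here only column "a" is converted; column "b" keeps its original dtype because the mapping has no entry for it.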