pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.
For a setting operation this is convenient and, I think, what a user would expect:
In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})
In [36]: df
Out[36]:
A B C D
0 NaT NaN foo 1
In [37]: df.dtypes
Out[37]:
A datetime64[ns]
B object
C object
D int64
dtype: object
In [38]: df.loc[0,'D'] = 1.0
In [39]: df.dtypes
Out[39]:
A datetime64[ns]
B object
C object
D float64
dtype: object
However, for a .fillna (or .replace) operation this might be a bit unexpected. So A was coerced to object dtype, even though it was datetime64[ns].
In [40]: df.fillna('')
Out[40]:
A B C D
0 foo 1
In [41]: df.fillna('').dtypes
Out[41]:
A object
B object
C object
D float64
dtype: object
So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. The current behavior (upcasting to object) would be the equivalent of errors='ignore', while skipping this column would be done with errors='coerce' (and of course raise would raise).
Ideally the default would be coerce, I think (to skip non-compatible values). Any thoughts on this?
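To make the proposal concrete, here is a rough sketch of the three modes as a standalone function. Nothing here is pandas API: fillna_checked and _compatible are hypothetical names, and the compatibility rules are a deliberate simplification. It assumes 'coerce' leaves an incompatible column untouched, 'ignore' upcasts to object as fillna does today, and 'raise' raises a TypeError.

```python
import datetime as dt
import numbers

import numpy as np
import pandas as pd
from pandas.api import types as pdt


def _compatible(value, dtype):
    """Hypothetical, simplified check: can `value` be stored without upcasting?"""
    if pdt.is_object_dtype(dtype):
        return True  # object columns hold anything
    if pdt.is_datetime64_any_dtype(dtype):
        return isinstance(value, (dt.datetime, np.datetime64))
    if pdt.is_bool_dtype(dtype):
        return isinstance(value, (bool, np.bool_))
    if pdt.is_numeric_dtype(dtype):
        is_bool = isinstance(value, (bool, np.bool_))
        return isinstance(value, numbers.Real) and not is_bool
    if pdt.is_string_dtype(dtype):
        return isinstance(value, str)
    return False  # unknown dtype: be conservative


def fillna_checked(df, value, errors="coerce"):
    """Hypothetical helper: fill missing values column by column."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors value: {errors!r}")
    out = {}
    for name, col in df.items():
        missing = col.isna()
        if not missing.any():
            out[name] = col  # nothing to fill, dtype is irrelevant
        elif _compatible(value, col.dtype):
            out[name] = col.fillna(value)
        elif errors == "raise":
            raise TypeError(
                f"cannot fill column {name!r} ({col.dtype}) with {value!r}"
            )
        elif errors == "coerce":
            out[name] = col  # skip: keep the dtype and the missing values
        else:  # errors == "ignore": upcast to object and fill anyway
            filled = [value if m else v for v, m in zip(col, missing)]
            out[name] = pd.Series(filled, index=col.index, dtype=object)
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame({
    "A": pd.Series([pd.NaT], dtype="datetime64[ns]"),
    "B": pd.Series([np.nan], dtype=object),
    "C": pd.Series(["foo"], dtype=object),
    "D": [1],
})
print(fillna_checked(df, "", errors="coerce").dtypes)  # A keeps its dtype
print(fillna_checked(df, "", errors="ignore").dtypes)  # A is upcast to object
```

A real implementation would live at the block level rather than looping over columns, and would need a proper notion of "compatible" per dtype; the sketch is only meant to pin down what each value of errors would do.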