DISCUSS/API: setitem-like operations should only update inplace and never fall back to upcasting (i.e. never change the dtype)

Currently, setitem-like operations (i.e. operations that change values in an existing Series or DataFrame, such as `__setitem__` and `.loc`/`.iloc` setitem, or filling methods like `fillna`) first try to update in place, but if there is a dtype mismatch, pandas upcasts to a common dtype (typically object dtype).

For example, setting a string into an integer Series upcasts to object:

```python
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s.loc[1] = "B"
>>> s
0    1
1    B
2    3
dtype: object
```
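
To make the proposed semantics concrete, here is a rough sketch that emulates "update in place or raise" on top of current pandas. `setitem_no_upcast` is a hypothetical helper for illustration, not an existing or proposed pandas API: it tries the assignment on a copy first and only writes to the original Series if the dtype would stay the same. Depending on the pandas version, the trial assignment may silently upcast, upcast with a `FutureWarning`, or raise, so the sketch treats all three outcomes as "incompatible".

```python
import warnings

import pandas as pd


def setitem_no_upcast(s: pd.Series, key, value) -> None:
    """Hypothetical helper (not pandas API): set ``value`` at label ``key``
    in place, or raise TypeError if that would change the dtype."""
    trial = s.copy()  # try the assignment on a throwaway copy first
    try:
        with warnings.catch_warnings():
            # Some pandas versions emit a FutureWarning before upcasting.
            warnings.simplefilter("ignore", FutureWarning)
            trial.loc[key] = value
    except (TypeError, ValueError) as err:
        # Some pandas versions may already refuse the assignment outright.
        raise TypeError(f"cannot set {value!r} into a {s.dtype} Series") from err
    if trial.dtype != s.dtype:
        raise TypeError(f"setting {value!r} would upcast {s.dtype} to {trial.dtype}")
    # The value is known to be dtype-preserving, so update the original in place.
    s.loc[key] = value
```

The extra copy on every call is the price of emulating this in user code; a native "in place or raise" guarantee would not need it.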

Similarly, calling `fillna` with an invalid fill value upcasts instead of raising an error:

```python
>>> s = pd.Series(["2020-01-01", "NaT"], dtype="datetime64[ns]")
>>> s
0   2020-01-01
1          NaT
dtype: datetime64[ns]
>>> s.fillna(1)
0    2020-01-01 00:00:00
1                      1
dtype: object
```
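
The same idea can be sketched for `fillna`. Again, `fillna_no_upcast` is a hypothetical helper for illustration only: it returns the filled Series when the dtype is preserved and raises otherwise.

```python
import warnings

import pandas as pd


def fillna_no_upcast(s: pd.Series, value) -> pd.Series:
    """Hypothetical helper (not pandas API): like ``s.fillna(value)``, but
    raise TypeError instead of returning a result with a different dtype."""
    try:
        with warnings.catch_warnings():
            # Ignore any deprecation warnings some pandas versions may emit.
            warnings.simplefilter("ignore", FutureWarning)
            result = s.fillna(value)
    except (TypeError, ValueError) as err:
        raise TypeError(f"invalid fill value {value!r} for dtype {s.dtype}") from err
    if result.dtype != s.dtype:
        raise TypeError(f"fill value {value!r} would upcast {s.dtype} to {result.dtype}")
    return result
```

With the Series from the example above, filling with `pd.Timestamp("2021-06-01")` keeps `datetime64[ns]`, while filling with `1` raises a `TypeError` instead of producing an object column.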

----

My **general proposal** would be that at some point in the future (e.g. pandas 2.0, after a deprecation period), such inherently in-place operations should be guaranteed to either happen in place or raise an error, and thus never change the dtype of the original Series/DataFrame.

This is similar to, e.g., numpy's behaviour, where setitem never changes the dtype. Here is the first example from above in equivalent numpy code:

```python
>>> import numpy as np
>>> arr = np.array([1, 2, 3])
>>> arr[1] = "B"
Traceback (most recent call last):
...
ValueError: invalid literal for int() with base 10: 'B'
```

Apart from that, I also think this is the cleaner behaviour, with fewer surprises. If a user specifically wants to allow mixed types in a column, they can manually cast to `object` dtype first.
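
The explicit opt-in could look like the following. It already works today: since object dtype can hold any Python object, assigning a string after the cast is a plain in-place update rather than an implicit upcast.

```python
import pandas as pd

# Opt in to mixed types explicitly by casting to object dtype first.
s = pd.Series([1, 2, 3]).astype(object)
s.loc[1] = "B"  # compatible with object dtype, so no dtype change is needed
print(s.dtype)  # object
```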

On the other hand, this is quite a big change from how permissive we generally are right now (we upcast readily), and such a change will certainly impact a fair amount of user code. But AFAIK it is perfectly possible to do this with proper deprecation warnings, issued in advance for the specific cases that will raise an error in the future.

There are certainly more details that would need to be discussed as well if we want this (e.g. which exact values are regarded as compatible with a given dtype: when setting a float into an integer column, should that raise an error or silently round the float?). But what are people's thoughts on the general idea?
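
For reference, numpy answers the float-into-integer question by keeping the dtype and silently truncating toward zero (so 9.7 becomes 9, not 10). A stricter alternative, sketched below as one possible rule, would accept only values that can be stored without loss and raise otherwise. `fits_int64_losslessly` is a hypothetical function for illustration, not an existing pandas API.

```python
import numpy as np

# numpy keeps the integer dtype and truncates the float toward zero.
arr = np.array([1, 2, 3], dtype=np.int64)
arr[1] = 9.7
print(arr)  # [1 9 3]


def fits_int64_losslessly(value) -> bool:
    """Hypothetical compatibility rule (not pandas API): accept a value for
    an int64 column only if it can be stored without losing information."""
    if isinstance(value, (bool, np.bool_)):
        return False  # treat booleans as a separate kind, not as integers
    if isinstance(value, (int, np.integer)):
        info = np.iinfo(np.int64)
        return info.min <= value <= info.max
    if isinstance(value, (float, np.floating)):
        v = float(value)
        # NaN and +/-inf are not integral, so is_integer() rejects them too.
        return v.is_integer() and -(2.0**63) <= v < 2.0**63
    return False  # strings and other objects are never compatible
```

Under this rule, setting `2.0` into an integer column would be allowed, while `2.7` would raise rather than being rounded or truncated.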

cc @pandas-dev/pandas-core 

Issue #39584