Commit 792e290

PDEP6 implementation pt 2: eablock.setitem, eablock.putmask (#53405)
* pt1
* fixup test collection
* fixup warnings
* add comments
* fixup warnings
* fixup test_indexing
* fixup test_set_value
* fixup test_where
* fixup test_asof
* add one more explicit upcast
* fixup test_update
* fixup test_constructors
* fixup test_stack_unstack
* catch warnings in test_update_dtypes
* fixup all test_update
* start fixing up setitem
* finish fixing up test_setitem
* more fixups
* catch numpy-dev warning
* fixup some more
* fixup test_indexing
* fixup test_function
* fixup test_multi;
* fixup test_base
* fixup test_impl
* fixup multiindex/test_setitem
* fixup test_scalar
* fixup test_loc
* fixup test_iloc
* fixup test_at
* fixup test_groupby
* fixup some doc warnings
* post-merge fixup
* change dtype in doctest
* fixup doctest
* explicit cast in test
* fixup test for COW
* fixup COW
* catch warnings in testsetitemcastingequivalents
* wip
* fixup setitem test int key!
* getting there!
* fixup test_setitem
* getting there
* fixup remaining warnings
* fix test_update
* fixup some failing test
* one more
* simplify
* simplify and remove some false-positives
* clean up
* remove final filterwarnings
* undo unrelated change
* fixup raises_chained_assignment_error
* remove another filterwarnings
* fixup interchange test
* better parametrisation
* okwarning => codeblock
* okwarning => codeblock in v1.3.0
* one more codeblock
* avoid upcast
* post-merge fixup
* docs fixup;
* post-merge fixup
* remove more upcasts
* adapt test from EA types
* move test to series/indexing
* add tests about warnings
* fixup tests
* add dataframe tests too
* fixup tests
* simplify
* try-fix docs build
* post-merge fixup
* raise assertionerror if self.dtype equals new_dtype
* add todo for test case which should warn
* add more todos
* post-merge fixup
* fixup setitem
* fixup
* wip fixup
* wip fixup
* another fixup
* add whatsnew
* list examples of operations

---------

Co-authored-by: MarcoGorelli <>
1 parent ac85de8 commit 792e290

14 files changed

+182
-45
lines changed

doc/source/whatsnew/v0.21.0.rst

Lines changed: 8 additions & 3 deletions
@@ -783,10 +783,15 @@ non-datetime-like item being assigned (:issue:`14145`).
 
 These now coerce to ``object`` dtype.
 
-.. ipython:: python
+.. code-block:: python
 
-   s[1] = 1
-   s
+   In [1]: s[1] = 1
+
+   In [2]: s
+   Out[2]:
+   0    2011-01-01 00:00:00
+   1                      1
+   dtype: object
 
 - Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
 - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)

doc/source/whatsnew/v2.1.0.rst

Lines changed: 94 additions & 3 deletions
@@ -296,8 +296,99 @@ Other API changes
 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.deprecations:
 
-Deprecate parsing datetimes with mixed time zones
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Deprecations
+~~~~~~~~~~~~
+
+Deprecated silent upcasting in setitem-like Series operations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are
+deprecated and show a warning. Examples of affected operations are:
+
+- ``ser.fillna('foo', inplace=True)``
+- ``ser.where(ser.isna(), 'foo', inplace=True)``
+- ``ser.iloc[indexer] = 'foo'``
+- ``ser.loc[indexer] = 'foo'``
+- ``df.iloc[indexer, 0] = 'foo'``
+- ``df.loc[indexer, 'a'] = 'foo'``
+- ``ser[indexer] = 'foo'``
+
+where ``ser`` is a :class:`Series`, ``df`` is a :class:`DataFrame`, and ``indexer``
+could be a slice, a mask, a single value, a list or array of values, or any other
+allowed indexer.
+
+In a future version, these will raise an error and you should cast to a common dtype first.
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+    In [1]: ser = pd.Series([1, 2, 3])
+
+    In [2]: ser
+    Out[2]:
+    0    1
+    1    2
+    2    3
+    dtype: int64
+
+    In [3]: ser[0] = 'not an int64'
+
+    In [4]: ser
+    Out[4]:
+    0    not an int64
+    1               2
+    2               3
+    dtype: object
+
+*New behavior*:
+
+.. code-block:: ipython
+
+    In [1]: ser = pd.Series([1, 2, 3])
+
+    In [2]: ser
+    Out[2]:
+    0    1
+    1    2
+    2    3
+    dtype: int64
+
+    In [3]: ser[0] = 'not an int64'
+    FutureWarning:
+      Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas.
+      Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
+
+    In [4]: ser
+    Out[4]:
+    0    not an int64
+    1               2
+    2               3
+    dtype: object
+
+To retain the current behaviour, in the case above you could cast ``ser`` to ``object`` dtype first:
+
+.. ipython:: python
+
+    ser = pd.Series([1, 2, 3])
+    ser = ser.astype('object')
+    ser[0] = 'not an int64'
+    ser
+
+Depending on the use-case, it might be more appropriate to cast to a different dtype.
+In the following, for example, we cast to ``float64``:
+
+.. ipython:: python
+
+    ser = pd.Series([1, 2, 3])
+    ser = ser.astype('float64')
+    ser[0] = 1.1
+    ser
+
+For further reading, please see https://pandas.pydata.org/pdeps/0006-ban-upcasting.html.
+
+Deprecated parsing datetimes with mixed time zones
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Parsing datetimes with mixed time zones is deprecated and shows a warning unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`50887`)

@@ -342,7 +433,7 @@ and ``datetime.datetime.strptime``:
     pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
 
 Other Deprecations
-~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^
 - Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
 - Deprecated 'downcast' keyword in :meth:`Index.fillna` (:issue:`53956`)
 - Deprecated 'fill_method' and 'limit' keywords in :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`DataFrameGroupBy.pct_change`, and :meth:`SeriesGroupBy.pct_change`, explicitly call ``ffill`` or ``bfill`` before calling ``pct_change`` instead (:issue:`53491`)
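Note on the silent-upcasting entry in v2.1.0.rst: one way to find affected assignments before they become errors is to promote the warning to an exception. This is a minimal sketch, assuming pandas 2.1 or later. The name `upcast_detected` is illustrative and not part of pandas, and `TypeError` is caught as well on the assumption that versions enforcing the deprecation raise it instead of warning.

```python
import warnings

import pandas as pd

ser = pd.Series([1, 2, 3], dtype="int64")

try:
    with warnings.catch_warnings():
        # Turn the deprecation warning into an exception so the offending
        # assignment fails loudly instead of silently upcasting.
        warnings.simplefilter("error", FutureWarning)
        ser[0] = "not an int64"
    upcast_detected = False
except (FutureWarning, TypeError):
    # Assumption: versions that enforce the deprecation raise TypeError.
    upcast_detected = True

# The explicit alternative from the whatsnew entry: cast first, then assign.
fixed = pd.Series([1, 2, 3], dtype="int64").astype("object")
fixed[0] = "not an int64"
```

When the assignment is rejected, `ser` keeps its `int64` dtype, whereas `fixed` holds the string because the cast to `object` was made explicit.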

pandas/core/internals/blocks.py

Lines changed: 5 additions & 5 deletions
@@ -461,7 +461,7 @@ def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
                 FutureWarning,
                 stacklevel=find_stack_level(),
             )
-        if self.dtype == new_dtype:
+        if self.values.dtype == new_dtype:
             raise AssertionError(
                 f"Did not expect new dtype {new_dtype} to equal self.dtype "
                 f"{self.values.dtype}. Please report a bug at "
@@ -1723,11 +1723,11 @@ def setitem(self, indexer, value, using_cow: bool = False):
 
             if isinstance(self.dtype, IntervalDtype):
                 # see TestSetitemFloatIntervalWithIntIntervalValues
-                nb = self.coerce_to_target_dtype(orig_value)
+                nb = self.coerce_to_target_dtype(orig_value, warn_on_upcast=True)
                 return nb.setitem(orig_indexer, orig_value)
 
             elif isinstance(self, NDArrayBackedExtensionBlock):
-                nb = self.coerce_to_target_dtype(orig_value)
+                nb = self.coerce_to_target_dtype(orig_value, warn_on_upcast=True)
                 return nb.setitem(orig_indexer, orig_value)
 
             else:
@@ -1841,13 +1841,13 @@ def putmask(self, mask, new, using_cow: bool = False) -> list[Block]:
             if isinstance(self.dtype, IntervalDtype):
                 # Discussion about what we want to support in the general
                 # case GH#39584
-                blk = self.coerce_to_target_dtype(orig_new)
+                blk = self.coerce_to_target_dtype(orig_new, warn_on_upcast=True)
                 return blk.putmask(orig_mask, orig_new)
 
             elif isinstance(self, NDArrayBackedExtensionBlock):
                 # NB: not (yet) the same as
                 # isinstance(values, NDArrayBackedExtensionArray)
-                blk = self.coerce_to_target_dtype(orig_new)
+                blk = self.coerce_to_target_dtype(orig_new, warn_on_upcast=True)
                 return blk.putmask(orig_mask, orig_new)
 
             else:
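Note on the blocks.py change: the pattern is "try to set in place, and on failure coerce to a wider dtype, now with a warning, then retry". The sketch below illustrates that pattern on plain NumPy arrays. It is a toy stand-in, not the pandas implementation: `toy_coerce` and its message text are hypothetical, and `np.result_type` is used here only as a simple substitute for whatever common-dtype logic pandas applies.

```python
import warnings

import numpy as np


def toy_coerce(values: np.ndarray, other, warn_on_upcast: bool = False) -> np.ndarray:
    """Toy stand-in for the coerce-then-retry pattern; not pandas internals."""
    # Find a dtype that can hold both the existing values and the new one.
    new_dtype = np.result_type(values.dtype, np.asarray(other).dtype)
    if warn_on_upcast:
        warnings.warn(
            f"Value {other!r} has dtype incompatible with {values.dtype}; "
            "cast explicitly before assigning.",
            FutureWarning,
            stacklevel=2,
        )
    if values.dtype == new_dtype:
        # Same idea as the sanity check in the diff: coercion is only
        # requested after an in-place set failed, so the dtype must change.
        raise AssertionError(f"Did not expect new dtype {new_dtype} to equal the old one")
    return values.astype(new_dtype)


arr = np.array([1, 2, 3], dtype="int64")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = toy_coerce(arr, 1.5, warn_on_upcast=True)
    out[0] = 1.5  # the retry succeeds on the upcast copy
```

Passing `warn_on_upcast=True` at each call site, as the commit does for the interval and NDArray-backed extension blocks, keeps the coercion helper itself silent for callers where upcasting remains allowed.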

pandas/tests/frame/indexing/test_indexing.py

Lines changed: 8 additions & 2 deletions
@@ -831,7 +831,10 @@ def test_setitem_single_column_mixed_datetime(self):
         tm.assert_series_equal(result, expected)
 
         # GH#16674 iNaT is treated as an integer when given by the user
-        df.loc["b", "timestamp"] = iNaT
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.loc["b", "timestamp"] = iNaT
         assert not isna(df.loc["b", "timestamp"])
         assert df["timestamp"].dtype == np.object_
         assert df.loc["b", "timestamp"] == iNaT
@@ -862,7 +865,10 @@ def test_setitem_mixed_datetime(self):
         df = DataFrame(0, columns=list("ab"), index=range(6))
         df["b"] = pd.NaT
         df.loc[0, "b"] = datetime(2012, 1, 1)
-        df.loc[1, "b"] = 1
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.loc[1, "b"] = 1
         df.loc[[2, 3], "b"] = "x", "y"
         A = np.array(
             [

pandas/tests/frame/indexing/test_where.py

Lines changed: 20 additions & 5 deletions
@@ -735,7 +735,10 @@ def test_where_interval_fullop_downcast(self, frame_or_series):
         tm.assert_equal(res, other.astype(np.int64))
 
         # unlike where, Block.putmask does not downcast
-        obj.mask(obj.notna(), other, inplace=True)
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            obj.mask(obj.notna(), other, inplace=True)
         tm.assert_equal(obj, other.astype(object))
 
     @pytest.mark.parametrize(
@@ -775,7 +778,10 @@ def test_where_datetimelike_noop(self, dtype):
         tm.assert_frame_equal(res5, expected)
 
         # unlike where, Block.putmask does not downcast
-        df.mask(~mask2, 4, inplace=True)
+        with tm.assert_produces_warning(
+            FutureWarning, match="Setting an item of incompatible dtype"
+        ):
+            df.mask(~mask2, 4, inplace=True)
         tm.assert_frame_equal(df, expected.astype(object))
 
 
@@ -930,7 +936,10 @@ def test_where_period_invalid_na(frame_or_series, as_cat, request):
     result = obj.mask(mask, tdnat)
     tm.assert_equal(result, expected)
 
-    obj.mask(mask, tdnat, inplace=True)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        obj.mask(mask, tdnat, inplace=True)
     tm.assert_equal(obj, expected)
 
 
@@ -1006,7 +1015,10 @@ def test_where_dt64_2d():
 
     # setting all of one column, none of the other
     expected = DataFrame({"A": other[:, 0], "B": dta[:, 1]})
-    _check_where_equivalences(df, mask, other, expected)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        _check_where_equivalences(df, mask, other, expected)
 
     # setting part of one column, none of the other
     mask[1, 0] = True
@@ -1016,7 +1028,10 @@ def test_where_dt64_2d():
             "B": dta[:, 1],
         }
     )
-    _check_where_equivalences(df, mask, other, expected)
+    with tm.assert_produces_warning(
+        FutureWarning, match="Setting an item of incompatible dtype"
+    ):
+        _check_where_equivalences(df, mask, other, expected)
 
     # setting nothing in either column
     mask[:] = True

pandas/tests/frame/test_constructors.py

Lines changed: 1 addition & 2 deletions
@@ -2549,8 +2549,7 @@ def check_views(c_only: bool = False):
         check_views()
 
         # TODO: most of the rest of this test belongs in indexing tests
-        # TODO: 'm' and 'M' should warn
-        if lib.is_np_dtype(df.dtypes.iloc[0], "fciuOmM"):
+        if lib.is_np_dtype(df.dtypes.iloc[0], "fciuO"):
             warn = None
         else:
             warn = FutureWarning

pandas/tests/indexing/test_at.py

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@
 def test_at_timezone():
     # https://github.com/pandas-dev/pandas/issues/33544
     result = DataFrame({"foo": [datetime(2000, 1, 1)]})
-    result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
+    with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
+        result.at[0, "foo"] = datetime(2000, 1, 2, tzinfo=timezone.utc)
     expected = DataFrame(
         {"foo": [datetime(2000, 1, 2, tzinfo=timezone.utc)]}, dtype=object
     )

pandas/tests/indexing/test_loc.py

Lines changed: 2 additions & 1 deletion
@@ -1449,7 +1449,8 @@ def test_loc_setitem_datetime_coercion(self):
         df.loc[0:1, "c"] = np.datetime64("2008-08-08")
         assert Timestamp("2008-08-08") == df.loc[0, "c"]
         assert Timestamp("2008-08-08") == df.loc[1, "c"]
-        df.loc[2, "c"] = date(2005, 5, 5)
+        with tm.assert_produces_warning(FutureWarning, match="incompatible dtype"):
+            df.loc[2, "c"] = date(2005, 5, 5)
         assert Timestamp("2005-05-05").date() == df.loc[2, "c"]
 
     @pytest.mark.parametrize("idxer", ["var", ["var"]])
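Note on `test_loc_setitem_datetime_coercion`: assigning a `datetime.date` into a datetime64 column now warns because the column has to become `object` to hold it. A minimal sketch of one way to avoid the upcast in user code, assuming that converting the value to `pd.Timestamp` is acceptable for the use case; the DataFrame below is made up for illustration and is not the one used in the test.

```python
import warnings
from datetime import date

import pandas as pd

# Illustrative frame with a single datetime64 column.
df = pd.DataFrame({"c": pd.to_datetime(["2008-08-08", "2008-08-08", "2008-08-08"])})

with warnings.catch_warnings():
    # Any FutureWarning raised here would surface as an exception.
    warnings.simplefilter("error", FutureWarning)
    # Converting to Timestamp first lets the column keep its datetime64 dtype.
    df.loc[2, "c"] = pd.Timestamp(date(2005, 5, 5))
```

If the `date` object itself has to be stored, the explicit route described in the whatsnew entry applies instead: cast the column with `astype(object)` before assigning.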

pandas/tests/internals/test_internals.py

Lines changed: 10 additions & 5 deletions
@@ -1312,17 +1312,20 @@ def test_interval_can_hold_element(self, dtype, element):
         # `elem` to not have the same length as `arr`
         ii2 = IntervalIndex.from_breaks(arr[:-1], closed="neither")
         elem = element(ii2)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)
 
         ii3 = IntervalIndex.from_breaks([Timestamp(1), Timestamp(3), Timestamp(4)])
         elem = element(ii3)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)
 
         ii4 = IntervalIndex.from_breaks([Timedelta(1), Timedelta(3), Timedelta(4)])
         elem = element(ii4)
-        self.check_series_setitem(elem, ii, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, ii, False)
         assert not blk._can_hold_element(elem)
 
     def test_period_can_hold_element_emptylist(self):
@@ -1341,11 +1344,13 @@ def test_period_can_hold_element(self, element):
         # `elem` to not have the same length as `arr`
         pi2 = pi.asfreq("D")[:-1]
         elem = element(pi2)
-        self.check_series_setitem(elem, pi, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, pi, False)
 
         dti = pi.to_timestamp("S")[:-1]
         elem = element(dti)
-        self.check_series_setitem(elem, pi, False)
+        with tm.assert_produces_warning(FutureWarning):
+            self.check_series_setitem(elem, pi, False)
 
     def check_can_hold_element(self, obj, elem, inplace: bool):
         blk = obj._mgr.blocks[0]
