BUG: df[col] = arr should not overwrite data in df[col] #35417
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -192,6 +192,90 @@ Other enhancements | |
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`) | ||
- Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`) | ||
|
||
.. --------------------------------------------------------------------------- | ||
|
||
.. _whatsnew_120.notable_bug_fixes: | ||
|
||
Notable bug fixes | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
These are bug fixes that might have notable behavior changes. | ||
|
||
Assigning with ``DataFrame.__setitem__`` consistently creates a new array | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Assigning values with ``DataFrame.__setitem__`` now consistently assigns a new array, rather than mutating inplace (:issue:`33457`, :issue:`35271`, :issue:`35266`) | ||
|
||
Previously, ``DataFrame.__setitem__`` would sometimes operate inplace on the | ||
underlying array, and sometimes assign a new array. Fixing this inconsistency | ||
can have behavior-changing implications for workloads that relied on inplace | ||
mutation. The two most common cases are creating a ``DataFrame`` from an array | ||
and slicing a ``DataFrame``. | ||
|
||
*Previous Behavior* | ||
|
||
The array would be mutated inplace for some dtypes, like NumPy's ``int64`` dtype. | ||
|
||
.. code-block:: python | ||
|
||
>>> import pandas as pd | ||
>>> import numpy as np | ||
>>> a = np.array([1, 2, 3]) | ||
>>> df = pd.DataFrame(a, columns=['a']) | ||
>>> df['a'] = 0 | ||
>>> a # mutated inplace | ||
array([0, 0, 0]) | ||
|
||
But not others, like :class:`Int64Dtype`. | ||
|
||
.. code-block:: python | ||
|
||
>>> import pandas as pd | ||
>>> import numpy as np | ||
>>> a = pd.array([1, 2, 3], dtype="Int64") | ||
>>> df = pd.DataFrame(a, columns=['a']) | ||
>>> df['a'] = 0 | ||
>>> a # not mutated | ||
<IntegerArray> | ||
[1, 2, 3] | ||
Length: 3, dtype: Int64 | ||
|
||
|
||
*New Behavior* | ||
|
||
In pandas 1.1.0, ``DataFrame.__setitem__`` consistently sets on a new array rather than | ||
mutating the existing array inplace. | ||
|
||
.. ipython:: python | ||
|
||
For NumPy's int64 dtype | ||
Review comment: move before the title (L399)
||
|
||
import pandas as pd | ||
import numpy as np | ||
a = np.array([1, 2, 3]) | ||
df = pd.DataFrame(a, columns=['a']) | ||
df['a'] = 0 | ||
a # not mutated | ||
|
||
For :class:`Int64Dtype`. | ||
|
||
import pandas as pd | ||
import numpy as np | ||
a = pd.array([1, 2, 3], dtype="Int64") | ||
df = pd.DataFrame(a, columns=['a']) | ||
df['a'] = 0 | ||
a # not mutated | ||
|
||
This also affects cases where a second ``Series`` or ``DataFrame`` is a view on a first ``DataFrame``. | ||
|
||
.. code-block:: python | ||
|
||
df = pd.DataFrame({"A": [1, 2, 3]}) | ||
df2 = df[['A']] | ||
df['A'] = np.array([0, 0, 0]) | ||
|
||
Previously, whether ``df2`` was mutated depended on the dtype of the array being assigned to. Now, a | ||
new array is consistently assigned, so ``df2`` is not mutated. | ||
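
Reviewer-style sketch (not part of the diff): the two cases described above can be checked directly. This assumes a pandas release that includes this change; the variable names ``array_untouched`` and ``derived_untouched`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Case 1: a DataFrame created from a NumPy array.
a = np.array([1, 2, 3])
df = pd.DataFrame(a, columns=["a"])
df["a"] = 0  # __setitem__ assigns a new column array
array_untouched = a.tolist() == [1, 2, 3]

# Case 2: a second DataFrame derived from the first.
df = pd.DataFrame({"A": [1, 2, 3]})
df2 = df[["A"]]
df["A"] = np.array([0, 0, 0])
derived_untouched = df2["A"].tolist() == [1, 2, 3]

print(array_untouched, derived_untouched)
```

Both flags should be true when ``__setitem__`` assigns a new array. On older releases the first case could differ for NumPy dtypes such as ``int64``, as the "Previous Behavior" example shows.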
.. _whatsnew_120.api_breaking.python: | ||
|
||
Increased minimum version for Python | ||
|
@@ -389,6 +473,7 @@ Indexing | |
^^^^^^^^ | ||
|
||
- Bug in :meth:`PeriodIndex.get_loc` incorrectly raising ``ValueError`` on non-datelike strings instead of ``KeyError``, causing similar errors in :meth:`Series.__getitem__`, :meth:`Series.__contains__`, and :meth:`Series.loc.__getitem__` (:issue:`34240`) | ||
- Bug in :meth:`DataFrame.iloc.__setitem__` creating a new array instead of overwriting ``Categorical`` values in-place (:issue:`35417`) | ||
- Bug in :meth:`Index.sort_values` where, when empty values were passed, the method would break by trying to compare missing values instead of pushing them to the end of the sort order. (:issue:`35584`) | ||
- Bug in :meth:`Index.get_indexer` and :meth:`Index.get_indexer_non_unique` where int64 arrays are returned instead of intp. (:issue:`36359`) | ||
- Bug in :meth:`DataFrame.sort_index` where parameter ascending passed as a list on a single level index gives wrong result. (:issue:`32334`) | ||
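
Reviewer-style sketch (not part of the diff) for the ``DataFrame.iloc.__setitem__`` entry above: it mirrors the setup of the new test, with a mixed-dtype frame and a ``Categorical`` column, and checks only the resulting values and dtype. Whether the original ``Categorical`` object is itself overwritten depends on the pandas version and its copy semantics, so that part is deliberately not asserted here.

```python
import pandas as pd

cat = pd.Categorical(["A", "B", "C"])
df = pd.DataFrame({1: cat, 2: [1, 2, 3]})  # mixed dtypes, as in the test

# Positional assignment of a whole column with same-category values.
df.iloc[:, 0] = cat[::-1]

values_after = df[1].tolist()
still_categorical = isinstance(df[1].dtype, pd.CategoricalDtype)
print(values_after, still_categorical)
```

The column keeps its categorical dtype and categories; only the values change.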
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -684,7 +684,7 @@ def test_identity_slice_returns_new_object(self): | |
assert sliced_df is not original_df | ||
|
||
# should be a shallow copy | ||
- original_df["a"] = [4, 4, 4] | ||
+ original_df.loc[:, "a"] = [4, 4, 4] | ||
Review comment: same comment as above
Reply: updated both tests
||
assert (sliced_df["a"] == 4).all() | ||
|
||
original_series = Series([1, 2, 3, 4, 5, 6]) | ||
|
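
Reviewer-style sketch (not part of the diff) of why the test above switches from ``original_df["a"] = ...`` to ``original_df.loc[:, "a"] = ...``. The helper ``make_pair`` is hypothetical and exists only for this illustration. With ``__setitem__`` assigning a new array, an identity slice taken earlier no longer sees the new values; what the slice sees after a ``.loc`` assignment depends on the pandas version and its copy-on-write settings, so that result is printed rather than asserted.

```python
import pandas as pd


def make_pair():
    """Hypothetical helper: return a frame and an identity slice of it."""
    original = pd.DataFrame({"a": [1, 2, 3]})
    return original, original.iloc[:]


# Column assignment: the parent gets a new array, the slice keeps the old one.
orig_setitem, sliced_setitem = make_pair()
orig_setitem["a"] = [4, 4, 4]
slice_after_setitem = sliced_setitem["a"].tolist()

# .loc assignment: may write into the existing array (version dependent).
orig_loc, sliced_loc = make_pair()
orig_loc.loc[:, "a"] = [4, 4, 4]
slice_after_loc = sliced_loc["a"].tolist()

print(slice_after_setitem, slice_after_loc)
```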
@@ -708,8 +708,8 @@ def test_series_indexing_zerodim_np_array(self): | |
result = s.iloc[np.array(0)] | ||
assert result == 1 | ||
|
||
@pytest.mark.xfail(reason="https://github.com/pandas-dev/pandas/issues/33457") | ||
def test_iloc_setitem_categorical_updates_inplace(self): | ||
# GH#35417 | ||
# Mixed dtype ensures we go through take_split_path in setitem_with_indexer | ||
cat = pd.Categorical(["A", "B", "C"]) | ||
df = pd.DataFrame({1: cat, 2: [1, 2, 3]}) | ||
|