BUG: wrong future Warning on string assignment in certain condition #57402

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.1.rst
@@ -24,6 +24,7 @@ Fixed regressions
 - Fixed regression in :meth:`CategoricalIndex.difference` raising ``KeyError`` when other contains null values other than NaN (:issue:`57318`)
 - Fixed regression in :meth:`DataFrame.groupby` raising ``ValueError`` when grouping by a :class:`Series` in some cases (:issue:`57276`)
 - Fixed regression in :meth:`DataFrame.loc` raising ``IndexError`` for non-unique, masked dtype indexes where result has more than 10,000 rows (:issue:`57027`)
+- Fixed regression in :meth:`DataFrame.loc` which was unnecessarily throwing "incompatible dtype warning" when expanding with partial row indexer and multiple columns (see `PDEP6 <https://pandas.pydata.org/pdeps/0006-ban-upcasting.html>`_) (:issue:`56503`)
 - Fixed regression in :meth:`DataFrame.merge` raising ``ValueError`` for certain types of 3rd-party extension arrays (:issue:`57316`)
 - Fixed regression in :meth:`DataFrame.shift` raising ``AssertionError`` for ``axis=1`` and empty :class:`DataFrame` (:issue:`57301`)
 - Fixed regression in :meth:`DataFrame.sort_index` not producing a stable sort for a index with duplicates (:issue:`57151`)
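
The regression described in the added changelog entry can be checked from user code. The following is a minimal sketch, assuming the same frame and assignment used in this PR's test; it records any `FutureWarning` raised during the assignment instead of asserting a count, since the count depends on the installed pandas version.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Record every warning raised while enlarging the frame with two new
# columns through a partial (boolean) row indexer.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.loc[df.index <= 1, ["F", "G"]] = (1, "abc")

future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
print(f"pandas {pd.__version__}: {len(future_warnings)} FutureWarning(s)")
print(df)
```

Per the changelog entry, a release containing this fix should report no warning here, while an affected release should report the "incompatible dtype" warning for the string column.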
15 changes: 13 additions & 2 deletions pandas/core/indexing.py
@@ -854,7 +854,6 @@ def _ensure_listlike_indexer(self, key, axis=None, value=None) -> None:
         if self.ndim != 2:
             return

-        orig_key = key
         if isinstance(key, tuple) and len(key) > 1:
             # key may be a tuple if we are .loc
             # if length of key is > 1 set key to column part
@@ -872,7 +871,7 @@ def _ensure_listlike_indexer(self, key, axis=None, value=None) -> None:
             keys = self.obj.columns.union(key, sort=False)
             diff = Index(key).difference(self.obj.columns, sort=False)

-            if len(diff) and com.is_null_slice(orig_key[0]):
+            if len(diff):
                 # e.g. if we are doing df.loc[:, ["A", "B"]] = 7 and "B"
                 # is a new column, add the new columns with dtype=np.void
                 # so that later when we go through setitem_single_column
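
The code comment in this hunk refers to the full-slice case `df.loc[:, ["A", "B"]] = 7`, where "B" is a new column. A small sketch of that case through the public API follows; the frame contents are an assumption made for illustration, and the dtypes are printed rather than asserted because they may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# "B" does not exist yet; the null-slice row indexer assigns to every row.
df.loc[:, ["A", "B"]] = 7

print(df)
print(df.dtypes)
```

With the change above, the placeholder-column path is no longer limited to a null-slice row indexer, so a partial row indexer such as `df.index <= 1` goes through the same route.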
@@ -2165,6 +2164,18 @@ def _setitem_single_column(self, loc: int, value, plane_indexer) -> None:
         else:
             # set value into the column (first attempting to operate inplace, then
             # falling back to casting if necessary)
+            dtype = self.obj.dtypes.iloc[loc]
+            if dtype == np.void:
+                # This means we're expanding, with multiple columns, e.g.
+                # df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6]})
+                # df.loc[df.index <= 2, ['F', 'G']] = (1, 'abc')
+                # Columns F and G will initially be set to np.void.
+                # Here, we replace those temporary `np.void` columns with
+                # columns of the appropriate dtype, based on `value`.
+                arr = sanitize_array(value, Index(range(1)), copy=False)
Member:
construct_1d_arraylike_from_scalar?

Member:
Discussed during the dev meeting. If the idea is to fill with the appropriate missing value, I think you can use a combination of infer_fill_value and construct_1d_arraylike_from_scalar.

Member Author:
Thanks! Will update.

Member Author:
I think this would run into the same issues that #56321 addressed.

I'd factored out the common logic; if there's a better way, perhaps it can be addressed separately?

+                taker = -1 * np.ones(len(self.obj), dtype=np.intp)
+                empty_value = algos.take_nd(arr, taker)
+                self.obj.iloc[:, loc] = empty_value
             self.obj._mgr.column_setitem(loc, plane_indexer, value)

     def _setitem_single_block(self, indexer, value, name: str) -> None:
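
The added lines build an all-missing column by taking from a one-element array with an indexer made only of `-1`. The sketch below illustrates the same idea with the public `pd.api.extensions.take` rather than the internal `sanitize_array` and `algos.take_nd` used in the diff. This substitution is an assumption made for illustration, and the one-element input arrays are hypothetical stand-ins for the assigned values.

```python
import numpy as np
import pandas as pd

n_rows = 3

# One-element arrays standing in for the values being assigned.
int_value = np.array([1])
str_value = np.array(["abc"], dtype=object)

# With allow_fill=True, an index of -1 means "missing" at that position.
taker = np.full(n_rows, -1, dtype=np.intp)

empty_from_int = pd.api.extensions.take(int_value, taker, allow_fill=True)
empty_from_str = pd.api.extensions.take(str_value, taker, allow_fill=True)

print(empty_from_int.dtype, empty_from_int)
print(empty_from_str.dtype, empty_from_str)
```

The result has a dtype that can hold both the missing value and the value assigned afterwards, so the later `column_setitem` call does not need to upcast the column.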
16 changes: 16 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
@@ -1369,3 +1369,19 @@ def test_full_setter_loc_incompatible_dtype():
     df.loc[:, "a"] = {0: 3, 1: 4}
     expected = DataFrame({"a": [3, 4]})
     tm.assert_frame_equal(df, expected)
+
+
+def test_setitem_partial_row_multiple_columns():
+    # https://github.com/pandas-dev/pandas/issues/56503
+    df = DataFrame({"A": [1, 2, 3], "B": [4.0, 5, 6]})
+    # should not warn
+    df.loc[df.index <= 1, ["F", "G"]] = (1, "abc")
+    expected = DataFrame(
+        {
+            "A": [1, 2, 3],
+            "B": [4.0, 5, 6],
+            "F": [1.0, 1, float("nan")],
+            "G": ["abc", "abc", float("nan")],
+        }
+    )
+    tm.assert_frame_equal(df, expected)
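
The test contains no explicit assertion for its "should not warn" comment, so it presumably relies on the test suite's warning configuration. Outside that suite, one way to enforce the same expectation is to escalate `FutureWarning` to an exception around the assignment. This is a hedged sketch using only the standard `warnings` module; the `warned` flag is a name introduced here for illustration.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5, 6]})

with warnings.catch_warnings():
    # Escalate FutureWarning to an exception for the duration of the block.
    warnings.simplefilter("error", FutureWarning)
    try:
        df.loc[df.index <= 1, ["F", "G"]] = (1, "abc")
        warned = False
    except FutureWarning:
        warned = True

print("FutureWarning raised:", warned)
```

On a release that includes this fix, the assignment is expected to complete without raising.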