Skip to content

[ArrayManager] BUG: fix setitem with non-aligned boolean dataframe #39539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -157,3 +157,4 @@ jobs:
run: |
source activate pandas-dev
pytest pandas/tests/frame/methods --array-manager
pytest pandas/tests/frame/indexing/test_indexing.py::TestDataFrameIndexing::test_setitem_boolean --array-manager
4 changes: 4 additions & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -630,6 +630,10 @@ def _as_manager(self, typ: str) -> DataFrame:
# fastpath of passing a manager doesn't check the option/manager class
return DataFrame(new_mgr)

@property
def _has_array_manager(self):
return isinstance(self._mgr, ArrayManager)

# ----------------------------------------------------------------------

@property
Expand Down
15 changes: 15 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8817,6 +8817,11 @@ def _where(
if axis is not None:
axis = self._get_axis_number(axis)

# Needed for DataFrames with ArrayManager, see below for details
all_bool_columns = False
if isinstance(cond, ABCDataFrame) and cond._has_array_manager:
all_bool_columns = all(is_bool_dtype(dt) for dt in cond.dtypes)

# align the cond to same shape as myself
cond = com.apply_if_callable(cond, self)
if isinstance(cond, NDFrame):
Expand All @@ -8832,6 +8837,16 @@ def _where(
fill_value = bool(inplace)
cond = cond.fillna(fill_value)

# With ArrayManager, `fillna` does not automatically change object dtype
# back to bools (if the alignment made it object by introducing NaNs).
# So in this case we cast back to bool manually *if* the original columns
# before aligning were bool
# TODO this workaround can be removed once we have nullable boolean dtype
# as default
if isinstance(cond, ABCDataFrame) and cond._has_array_manager:
if not all(is_bool_dtype(dt) for dt in cond.dtypes) and all_bool_columns:
cond = cond.astype(bool)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a viable way to put these patches in with the ArrayManager code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't directly see one, as we don't want to change (IMO) the fillna behaviour.
Unless we we would add a keyword for this (only for internal use) to the manager fillna method, and call cond._mgr.fillna here instead of cond.fillna to be able to pass it through, but not sure that would be any cleaner (as long as it is only used once, at least, there might come up other places where this is needed later)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left same comment above. I think we are going to want to push more code to the manager (kind of the opposite that we have been doing of late though).

msg = "Boolean array expected for the condition, not {dtype}"

if not cond.empty:
Expand Down
11 changes: 11 additions & 0 deletions pandas/tests/frame/methods/test_fillna.py
Original file line number Diff line number Diff line change
Expand Up @@ -544,3 +544,14 @@ def test_fillna_nonconsolidated_frame():
df_nonconsol = df.pivot("i1", "i2")
result = df_nonconsol.fillna(0)
assert result.isna().sum().sum() == 0


def test_fillna_infer_bool_dtype(using_array_manager):
# With ArrayManager, fillna doesn't change/infer dtypes
df = DataFrame([[True, False], [np.nan, True], [False, None]], dtype=object)
result = df.fillna(True)
if using_array_manager:
expected = DataFrame([[True, False], [True, True], [False, True]], dtype=object)
else:
expected = DataFrame([[True, False], [True, True], [False, True]], dtype=bool)
tm.assert_frame_equal(result, expected)