Closed
Description
In #38746, while implementing to_numeric for ExtensionArrays, I did not copy the mask of the original input. This means that the (potentially) cast array shares a mask with the input, so setting a value to NA in the input afterwards also marks it as NA in the result. We should copy the mask.
In [9]: import pandas as pd
...: import pandas._testing as tm
...:
...: arr = pd.array([1, 2, pd.NA], dtype="Int64")
...:
...: result = pd.to_numeric(arr, downcast="integer")
...: expected = pd.array([1, 2, pd.NA], dtype="Int8")
...: tm.assert_extension_array_equal(result, expected)
...:
...: arr[1] = pd.NA # should not modify result
...: tm.assert_extension_array_equal(result, expected)
...:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-9-f72c43e18273> in <module>
9
10 arr[1] = pd.NA
---> 11 tm.assert_extension_array_equal(result, expected)
~/repos/pandas/pandas/_testing/asserters.py in assert_extension_array_equal(left, right, check_dtype, index_values, check_less_precise, check_exact, rtol, atol)
794 left_na = np.asarray(left.isna())
795 right_na = np.asarray(right.isna())
--> 796 assert_numpy_array_equal(
797 left_na, right_na, obj="ExtensionArray NA mask", index_values=index_values
798 )
[... skipping hidden 1 frame]
~/repos/pandas/pandas/_testing/asserters.py in _raise(left, right, err_msg)
699 diff = diff * 100.0 / left.size
700 msg = f"{obj} values are different ({np.round(diff, 5)} %)"
--> 701 raise_assert_detail(obj, msg, left, right, index_values=index_values)
702
703 raise AssertionError(err_msg)
~/repos/pandas/pandas/_testing/asserters.py in raise_assert_detail(obj, message, left, right, diff, index_values)
629 msg += f"\n[diff]: {diff}"
630
--> 631 raise AssertionError(msg)
632
633
AssertionError: ExtensionArray NA mask are different
ExtensionArray NA mask values are different (33.33333 %)
[left]: [False, True, True]
[right]: [False, False, True]
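The aliasing can be reproduced outside of to_numeric. The following is only a minimal sketch, not the actual to_numeric code: it assumes the public pd.arrays.IntegerArray(values, mask) constructor with its default copy=False, and the names values, mask, shared and copied are made up for illustration. It builds one downcast array that reuses the input mask and one that gets a copy, then mutates the original mask to mimic arr[1] = pd.NA.

```python
import numpy as np
import pandas as pd

# Underlying data and NA mask of a hypothetical Int64 input (True marks NA).
values = np.array([1, 2, 0], dtype="int64")
mask = np.array([False, False, True])

# Downcasting the values creates a new ndarray, but the mask is a separate object.
downcast_values = values.astype("int8")

# Reusing the mask: the new array aliases the input's mask (copy=False is the default).
shared = pd.arrays.IntegerArray(downcast_values, mask)
# Copying the mask: the new array owns an independent mask.
copied = pd.arrays.IntegerArray(downcast_values, mask.copy())

# Mutating the original mask afterwards mimics `arr[1] = pd.NA` on the input.
mask[1] = True

shared_na = shared.isna().tolist()
copied_na = copied.isna().tolist()
print(shared_na)
print(copied_na)
```

With the shared mask, the second element shows up as NA in the new array as well ([False, True, True], matching [left] in the traceback above), while the array built from mask.copy() keeps [False, False, True].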
Thanks @jorisvandenbossche for pointing this out!