BUG: pd.to_numeric does not copy _mask for ExtensionArrays #38974

Closed
@arw2019

Description

In #38746, while implementing to_numeric for ExtensionArrays, I did not copy the mask of the original input. As a result, the (potentially) cast array shares its mask with the input, so setting a value to pd.NA in the input also marks it as missing in the result. We should copy the mask.

In [9]: import pandas as pd
   ...: import pandas._testing as tm
   ...: 
   ...: arr = pd.array([1, 2, pd.NA], dtype="Int64")
   ...: 
   ...: result = pd.to_numeric(arr, downcast="integer")
   ...: expected = pd.array([1, 2, pd.NA], dtype="Int8")
   ...: tm.assert_extension_array_equal(result, expected)
   ...: 
   ...: arr[1] = pd.NA # should not modify result
   ...: tm.assert_extension_array_equal(result, expected)
   ...: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-f72c43e18273> in <module>
      9 
     10 arr[1] = pd.NA
---> 11 tm.assert_extension_array_equal(result, expected)

~/repos/pandas/pandas/_testing/asserters.py in assert_extension_array_equal(left, right, check_dtype, index_values, check_less_precise, check_exact, rtol, atol)
    794     left_na = np.asarray(left.isna())
    795     right_na = np.asarray(right.isna())
--> 796     assert_numpy_array_equal(
    797         left_na, right_na, obj="ExtensionArray NA mask", index_values=index_values
    798     )

    [... skipping hidden 1 frame]

~/repos/pandas/pandas/_testing/asserters.py in _raise(left, right, err_msg)
    699             diff = diff * 100.0 / left.size
    700             msg = f"{obj} values are different ({np.round(diff, 5)} %)"
--> 701             raise_assert_detail(obj, msg, left, right, index_values=index_values)
    702 
    703         raise AssertionError(err_msg)

~/repos/pandas/pandas/_testing/asserters.py in raise_assert_detail(obj, message, left, right, diff, index_values)
    629         msg += f"\n[diff]: {diff}"
    630 
--> 631     raise AssertionError(msg)
    632 
    633 

AssertionError: ExtensionArray NA mask are different

ExtensionArray NA mask values are different (33.33333 %)
[left]:  [False, True, True]
[right]: [False, False, True]
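
The failure above comes down to aliasing: the result holds a reference to the same mask object as the input. A minimal pure-Python sketch of that pattern follows. `MaskedInts`, `cast_sharing_mask`, and `cast_copying_mask` are hypothetical names made up for illustration; they are not pandas internals and only mimic the values-plus-mask layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedInts:
    """Hypothetical stand-in for a masked integer array (not pandas' class)."""

    values: List[int]
    mask: List[bool]  # True marks a missing entry

    def set_na(self, i: int) -> None:
        # Mark position i as missing by mutating the mask in place
        self.mask[i] = True

    def isna(self) -> List[bool]:
        return list(self.mask)


def cast_sharing_mask(arr: MaskedInts) -> MaskedInts:
    # Buggy pattern: values are copied, but the same mask object is reused
    return MaskedInts(list(arr.values), arr.mask)


def cast_copying_mask(arr: MaskedInts) -> MaskedInts:
    # Fixed pattern: the mask is copied, so the result is independent
    return MaskedInts(list(arr.values), list(arr.mask))


arr = MaskedInts([1, 2, 0], [False, False, True])
shared = cast_sharing_mask(arr)
copied = cast_copying_mask(arr)

arr.set_na(1)  # mutate the input only

print(shared.isna())  # [False, True, True]
print(copied.isna())  # [False, False, True]
```

The two printed masks correspond to the `[left]` and `[right]` masks in the assertion message: the result that shares its mask picks up the new missing value, while the one with a copied mask does not.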

Thanks @jorisvandenbossche for pointing this out!

    Labels

    Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); NA - MaskedArrays (Related to pd.NA and nullable extension arrays)
