ENH: enable setitem dim2 test to work for EA with complex128 dtype #54445

Open
@MichaelTiemannOSC

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I have read the threads related to DISCUSS/API: setitem-like operations should only update inplace (#39584) and friends (including #47577).

My problem arises from this test code from the Pint-Pandas test suite:

class TestSetitem(base.BaseSetitemTests):
    def test_setitem_2d_values(self, data):
        # GH50085
        original = data.copy()
        df = pd.DataFrame({"a": data, "b": data})
        df.loc[[0, 1], :] = df.loc[[1, 0], :].values
        assert (df.loc[0, :] == original[1]).all()
        assert (df.loc[1, :] == original[0]).all()
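
For comparison, the same row swap can be run against plain NumPy-backed complex128 columns. This is a minimal sketch with made-up sample values, not part of the Pint-Pandas suite; it only illustrates what the test asserts.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the real test receives `data` from a fixture.
data = np.array([1 + 1j, 2 + 2j, 3 + 3j], dtype="complex128")
original = data.copy()

df = pd.DataFrame({"a": data, "b": data})
# Swap the first two rows by assigning a 2D ndarray of their values.
df.loc[[0, 1], :] = df.loc[[1, 0], :].values

# Row 0 should now hold the old row 1, and vice versa.
swapped_ok = bool((df.loc[0, :] == original[1]).all()) and bool(
    (df.loc[1, :] == original[0]).all()
)
print(swapped_ok)
```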

As a result of PR #54441 I'm able to test Pint-Pandas with complex128 datatypes. PintArray EAs work perfectly with float and integer magnitudes, but fail in just this one case, with complex128. The failure starts in core/internals/managers.py in the fast_xs method:

        # GH#46406
        immutable_ea = isinstance(dtype, ExtensionDtype) and dtype._is_immutable

        if isinstance(dtype, ExtensionDtype) and not immutable_ea:
            cls = dtype.construct_array_type()
            result = cls._empty((n,), dtype=dtype)

result becomes a PintArray backed by a FloatingArray:

<PintArray>
[<NA>, <NA>]
Length: 2, dtype: pint[nanometer]
(Pdb) result._data
<FloatingArray>
[<NA>, <NA>]
Length: 2, dtype: Float64

The FloatingArray appears because the PintArray initializer finds nothing helpful in either dtype or values and falls back to creating a pd.array(values, ...):

    def __init__(self, values, dtype=None, copy=False):
        if dtype is None:
            if isinstance(values, _Quantity):
                dtype = values.units
            elif isinstance(values, PintArray):
                dtype = values._dtype
        if dtype is None:
            raise NotImplementedError

        if not isinstance(dtype, PintType):
            dtype = PintType(dtype)
        self._dtype = dtype

        if isinstance(values, _Quantity):
            values = values.to(dtype.units).magnitude
        elif isinstance(values, PintArray):
            values = values._data
        if isinstance(values, np.ndarray):
            dtype = values.dtype
            if dtype in dtypemap:
                dtype = dtypemap[dtype]
            values = pd.array(values, copy=copy, dtype=dtype)
            copy = False
        elif not isinstance(values, pd.core.arrays.numeric.NumericArray):
            values = pd.array(values, copy=copy)
        if copy:
            values = values.copy()
        self._data = values
        self._Q = self.dtype.ureg.Quantity

fast_xs then fails because result[rl] cannot accept the complex128 data coming from blk.iget((i, loc)):

        for blk in self.blocks:
            # Such assignment may incorrectly coerce NaT to None
            # result[blk.mgr_locs] = blk._slice((slice(None), loc))
            for i, rl in enumerate(blk.mgr_locs):
                result[rl] = blk.iget((i, loc))

As I see it, the problem is that we commit too soon to building our backing array with too-limited information.
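
The rejection itself can be reproduced without Pint. This is a minimal sketch, assuming a pandas version where the masked Float64 array validates assigned values. A preallocated all-NA Float64 array refuses a complex scalar with TypeError, while an array built from the values themselves picks up complex128.

```python
import numpy as np
import pandas as pd

# Preallocate an all-NA Float64 array, similar to the backing array shown above.
result = pd.array([None, None], dtype="Float64")

try:
    result[0] = 1 + 2j  # complex scalar into a float-backed masked array
    rejected = False
except TypeError:
    rejected = True

# Building from the values lets the dtype follow the data instead.
from_values = np.array([1 + 2j, 3 - 1j])

print(rejected, from_values.dtype)
```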

@andrewgsavage
@topper-123
@jbrockmendel
@mroeschke

Feature Description

I will point out for the record:

  • values = [v[loc] for v in self.arrays]

So everything we need is already available inside fast_xs. Should we use it to create a result that can hold slices of data beyond float64? Here is code that tries the fast path first and, if a TypeError is raised, falls back to the sure thing:

        try:
            for blk in self.blocks:
                # Such assignment may incorrectly coerce NaT to None
                # result[blk.mgr_locs] = blk._slice((slice(None), loc))
                for i, rl in enumerate(blk.mgr_locs):
                    result[rl] = blk.iget((i, loc))
        except TypeError:
            if isinstance(dtype, ExtensionDtype) and not immutable_ea:
                # The preallocated result cannot hold these values; rebuild it
                # from the actual row values so the backing array fits the data.
                values = [v[loc] for v in self.arrays]
                result = cls._from_sequence(values, dtype=dtype)
            else:
                # Not an EA we can rebuild: re-raise the original exception.
                raise
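
The control flow of this proposal can be tried outside of pandas internals. The sketch below uses a hypothetical helper, fill_or_rebuild, with a Float64 array standing in for the preallocated result and np.array standing in for cls._from_sequence. It is not the fast_xs code itself.

```python
import numpy as np
import pandas as pd


def fill_or_rebuild(values):
    """Hypothetical helper: try the preallocated fast path, rebuild on TypeError."""
    # Stand-in for cls._empty(...): an all-NA Float64 array.
    result = pd.array([None] * len(values), dtype="Float64")
    try:
        # Fast path: fill the preallocated array element by element.
        for i, v in enumerate(values):
            result[i] = v
    except TypeError:
        # Sure path: let the container's dtype be inferred from the values.
        result = np.array(values)
    return result


floats = fill_or_rebuild([1.0, 2.5])
complexes = fill_or_rebuild([1 + 2j, 3.0])
print(floats.dtype, complexes.dtype)
```

Float input stays on the fast path and keeps the Float64 array; complex input triggers the fallback and yields a complex128 ndarray.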

Alternative Solutions

I'm open to alternative solutions, but the above actually causes the test case to pass. Should I submit a PR?

Additional Context

No response

Labels: Complex (Complex Numbers); Enhancement; ExtensionArray (Extending pandas with custom dtypes or arrays); Testing (pandas testing functions or related to the test suite)
