### Feature Type

- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas

### Problem Description
I have read the threads related to DISCUSS/API: setitem-like operations should only update inplace (#39584) and friends (including #47577).
My problem arises from this test code from the Pint-Pandas test suite:
```python
class TestSetitem(base.BaseSetitemTests):
    def test_setitem_2d_values(self, data):
        # GH50085
        original = data.copy()
        df = pd.DataFrame({"a": data, "b": data})
        df.loc[[0, 1], :] = df.loc[[1, 0], :].values
        assert (df.loc[0, :] == original[1]).all()
        assert (df.loc[1, :] == original[0]).all()
```
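For contrast, a minimal standalone sketch (not from the test suite): with plain complex128 NumPy columns, the same row swap works fine, since `fast_xs` can allocate a complex128 ndarray directly. The failure is specific to the extension-array path:

```python
import numpy as np
import pandas as pd

# Baseline: plain complex128 NumPy-backed columns, no ExtensionArray involved.
data = np.array([1 + 1j, 2 + 2j])
original = data.copy()
df = pd.DataFrame({"a": data, "b": data})

# The same row swap as in the Pint-Pandas test above.
df.loc[[0, 1], :] = df.loc[[1, 0], :].values

assert (df.loc[0, :] == original[1]).all()
assert (df.loc[1, :] == original[0]).all()
```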
As a result of PR #54441 I'm able to test Pint-Pandas with complex128 datatypes. PintArray EAs work perfectly with float and integer magnitudes, but fail in just this one case with complex128. The failure starts in `core/internals/managers.py`, in the `fast_xs` method:
```python
# GH#46406
immutable_ea = isinstance(dtype, ExtensionDtype) and dtype._is_immutable
if isinstance(dtype, ExtensionDtype) and not immutable_ea:
    cls = dtype.construct_array_type()
    result = cls._empty((n,), dtype=dtype)
```
`result` becomes a PintArray backed by a FloatingArray:

```
<PintArray>
[<NA>, <NA>]
Length: 2, dtype: pint[nanometer]
(Pdb) result._data
<FloatingArray>
[<NA>, <NA>]
Length: 2, dtype: Float64
```
The FloatingArray arises because the `PintArray` initializer finds nothing helpful in either `dtype` or `values` and falls back to creating a `pd.array(values, ...)`:
```python
def __init__(self, values, dtype=None, copy=False):
    if dtype is None:
        if isinstance(values, _Quantity):
            dtype = values.units
        elif isinstance(values, PintArray):
            dtype = values._dtype
    if dtype is None:
        raise NotImplementedError
    if not isinstance(dtype, PintType):
        dtype = PintType(dtype)
    self._dtype = dtype

    if isinstance(values, _Quantity):
        values = values.to(dtype.units).magnitude
    elif isinstance(values, PintArray):
        values = values._data
    if isinstance(values, np.ndarray):
        dtype = values.dtype
        if dtype in dtypemap:
            dtype = dtypemap[dtype]
        values = pd.array(values, copy=copy, dtype=dtype)
        copy = False
    elif not isinstance(values, pd.core.arrays.numeric.NumericArray):
        values = pd.array(values, copy=copy)
    if copy:
        values = values.copy()
    self._data = values
    self._Q = self.dtype.ureg.Quantity
```
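To illustrate why the fallback lands on Float64, here's a minimal sketch outside pint-pandas (assuming only pandas' own inference rules): when the backing handed to `pd.array` is an all-NA float ndarray, there is nothing to infer from except NaNs:

```python
import numpy as np
import pandas as pd

# An all-NA float64 backing, like what _empty produces before any values
# are written: pd.array infers a masked Float64 array, with no hint that
# the eventual payload will be complex128.
backing = pd.array(np.array([np.nan, np.nan]))
print(backing.dtype)
```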
`fast_xs` fails when `result[rl]` is not ready to accept the complex128 data coming from `blk.iget((i, loc))`:
```python
for blk in self.blocks:
    # Such assignment may incorrectly coerce NaT to None
    # result[blk.mgr_locs] = blk._slice((slice(None), loc))
    for i, rl in enumerate(blk.mgr_locs):
        result[rl] = blk.iget((i, loc))
```
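The crux can be reproduced without any pandas internals (a sketch, standing in for the FloatingArray backing above, not for `fast_xs` itself): a masked Float64 array simply cannot accept a complex128 scalar:

```python
import numpy as np
import pandas as pd

# A masked Float64 array, like the FloatingArray backing the PintArray.
result = pd.array([np.nan, np.nan], dtype="Float64")

raised = False
try:
    result[0] = 1 + 2j  # a complex128 value, as blk.iget(...) would yield
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```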
As I see it, the problem is that we commit too early to a backing array built from too little information.
@andrewgsavage
@topper-123
@jbrockmendel
@mroeschke
### Feature Description
I will point out for the record:

- `values = [v[loc] for v in self.arrays]`
So we have everything we need within the environment of `fast_xs`. Should we use that knowledge to create a `result` that can hold slices of data beyond float64? Here's code that tries the fast path, but falls back to the sure thing if an exception is raised:
```python
try:
    for blk in self.blocks:
        # Such assignment may incorrectly coerce NaT to None
        # result[blk.mgr_locs] = blk._slice((slice(None), loc))
        for i, rl in enumerate(blk.mgr_locs):
            result[rl] = blk.iget((i, loc))
except TypeError:
    if isinstance(dtype, ExtensionDtype) and not immutable_ea:
        values = [v[loc] for v in self.arrays]
        result = cls._from_sequence(values, dtype=dtype)
    else:
        raise
```
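As a standalone sketch of why the fallback can work (using plain `pd.array` to stand in for `cls._from_sequence`, which is not available outside an EA subclass): building the backing from the actual scalars lets the dtype be inferred from real data, preserving complex128 where the `_empty`-then-setitem path loses it:

```python
import numpy as np
import pandas as pd

# Inferring the backing from the real values keeps the complex128 dtype.
values = [1 + 2j, 3 + 4j]
arr = pd.array(np.array(values))
print(arr.dtype)
```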
### Alternative Solutions
I'm open to alternative solutions, but the above actually causes the test case to pass. Should I submit a PR?
### Additional Context

No response