
BUG: Implementation of is_list_like causes TypeError when working with Extension Arrays of ambiguous type #43195

Closed
@dcoukos

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

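import numpy as np
import pandas as pd
import pint
import pint_pandas  # registers the 'pint[...]' extension dtype with pandas

u = pint.UnitRegistry()
# 20 random frequencies stored in a pint-backed extension array
Q = pint.Quantity(np.random.rand(20), u.hertz)
df = pd.DataFrame(Q, dtype='pint[hertz]')
# Scalar bounds that carry units
limit1 = -0.4*u.hertz
limit2 = 4*u.hertz

# Raises TypeError: 'float' object is not iterable
df.clip(limit1, limit2)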

Problem description

The current implementation of is_list_like makes it difficult for downstream libraries to correctly advertise their type when they must accommodate both array-like and scalar data. In addition, carrying dtype information (important for pint in particular, because it handles physical units) makes an instance of such a class evaluate as true for is_array_like even when it holds a scalar, because is_array_like relies on is_list_like.

Since Quantity does not know in advance what type of data it will receive, it defines an __iter__ method. Calling this method, and thus allowing the object to advertise whether it holds scalar or array-like data, instead of merely checking for the presence of __iter__ would make is_list_like more robust.
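
To illustrate the difference between the two checks, here is a minimal sketch that does not depend on pint or pandas. FakeQuantity, has_iter and can_iter are hypothetical names made up for this example. FakeQuantity only imitates the relevant behaviour of Quantity (an __iter__ that fails for scalar magnitudes), and neither helper is the actual pandas implementation.

```python
from collections.abc import Iterable


class FakeQuantity:
    """Hypothetical stand-in for a Quantity-like wrapper around a magnitude."""

    def __init__(self, magnitude):
        self.magnitude = magnitude

    def __iter__(self):
        # Fails eagerly for scalar magnitudes, as in the traceback below.
        return iter(self.magnitude)


def has_iter(obj):
    """Presence check: only asks whether the type defines __iter__."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def can_iter(obj):
    """Call-based check: calls iter() and treats TypeError as 'not list-like'."""
    if isinstance(obj, (str, bytes)):
        return False
    try:
        iter(obj)
    except TypeError:
        return False
    return True


scalar = FakeQuantity(4.0)
array = FakeQuantity([1.0, 2.0, 3.0])

print(has_iter(scalar), can_iter(scalar))  # True False
print(has_iter(array), can_iter(array))    # True True
```

With the presence check, the scalar wrapper is reported as list-like and the error only surfaces later, when something tries to iterate over it. The call-based check lets the object itself decide, at the cost of invoking user code inside the type check.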

Traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-13539a13f5ad> in <module>
----> 1 df.diff().clip(limit1, limit2)

~/opt/anaconda3/envs/aug20/lib/python3.9/site-packages/pandas/core/generic.py in clip(self, lower, upper, axis, inplace, *args, **kwargs)
   7713         result = self
   7714         if lower is not None:
-> 7715             result = result._clip_with_one_bound(
   7716                 lower, method=self.ge, axis=axis, inplace=inplace
   7717             )

~/opt/anaconda3/envs/aug20/lib/python3.9/site-packages/pandas/core/generic.py in _clip_with_one_bound(self, threshold, method, axis, inplace)
   7586             return self._clip_with_scalar(threshold, None, inplace=inplace)
   7587 
-> 7588         subset = method(threshold, axis=axis) | isna(self)
   7589 
   7590         # GH #15390

~/opt/anaconda3/envs/aug20/lib/python3.9/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level)
    447         axis = self._get_axis_number(axis) if axis is not None else 1
    448 
--> 449         self, other = align_method_FRAME(self, other, axis, flex=True, level=level)
    450 
    451         new_data = self._dispatch_frame_op(other, op, axis=axis)

~/opt/anaconda3/envs/aug20/lib/python3.9/site-packages/pandas/core/ops/__init__.py in align_method_FRAME(left, right, axis, flex, level)
    255     elif is_list_like(right) and not isinstance(right, (ABCSeries, ABCDataFrame)):
    256         # GH 36702. Raise when attempting arithmetic with list of array-like.
--> 257         if any(is_array_like(el) for el in right):
    258             raise ValueError(
    259                 f"Unable to coerce list of {type(right[0])} to Series/DataFrame"

~/opt/anaconda3/envs/aug20/lib/python3.9/site-packages/pint/quantity.py in __iter__(self)
    262         # as one calls iter(self) without waiting for the first element to be drawn from
    263         # the iterator
--> 264         it_magnitude = iter(self.magnitude)
    265 
    266         def it_outer():

TypeError: 'float' object is not iterable

Expected Output

Expected output is that the values of df should be constrained between limit1 and limit2.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 5f648bf
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.2 (also tested on 1.4.0.dev0+496.g8814de9fbc with the same result)
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : 6.13.14
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.9.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Labels: Bug, Needs Triage