Using python comparison operators when SparseArrays of uneven length raises uninformative error #32119

Open
@dburkhardt

Code Sample, a copy-pastable example if possible

I first discovered this issue while attempting a comparison of the following form:

Case 1

>>> import pandas as pd
>>> import numpy as np

>>> a = pd.Series(pd.SparseArray(np.arange(10)))
>>> b = pd.Series(np.arange(11))
>>> (a == 5) & (b == 5)

which raises the following uninformative AttributeError:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-78-bb3d3268cfcc> in <module>
----> 1 (a == 5) & (b == 5)

/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(self, other)
   1319         #   integer dtypes.  Otherwise these are boolean ops
   1320         filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
-> 1321         res_values = na_op(self.values, ovalues)
   1322         unfilled = self._constructor(res_values, index=self.index, name=res_name)
   1323         filled = filler(unfilled)

/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in na_op(x, y)
   1252     def na_op(x, y):
   1253         try:
-> 1254             result = op(x, y)
   1255         except TypeError:
   1256             assert not isinstance(y, (list, ABCSeries, ABCIndexClass))

/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in cmp_method(self, other)
   1821 
   1822             if isinstance(other, SparseArray):
-> 1823                 return _sparse_array_op(self, other, op, op_name)
   1824             else:
   1825                 with np.errstate(all="ignore"):

/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in _sparse_array_op(left, right, op, name)
    493             right_sp_values = right.sp_values
    494 
--> 495         sparse_op = getattr(splib, opname)
    496 
    497         with np.errstate(all="ignore"):

AttributeError: module 'pandas._libs.sparse' has no attribute 'sparse_and_object'
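The last frame of the traceback resolves a function by name with `getattr`, and the name it fails to find is `sparse_and_object`. The sketch below assumes, based only on that message, that the name is composed from the operation and the dtype. `toy_splib` and `lookup_kernel` are hypothetical stand-ins written for illustration, not pandas internals. It shows how such a lookup could report an unsupported combination in clearer terms than a bare AttributeError.

```python
from types import SimpleNamespace

# Toy stand-in for a module of per-dtype kernels (hypothetical, not pandas._libs.sparse).
toy_splib = SimpleNamespace(
    sparse_and_bool=lambda left, right: [x and y for x, y in zip(left, right)],
)


def lookup_kernel(name, dtype):
    """Resolve a kernel by its composed name; explain the failure if none exists."""
    # Assumed naming scheme, inferred from the 'sparse_and_object' message above.
    opname = f"sparse_{name}_{dtype}"
    try:
        return getattr(toy_splib, opname)
    except AttributeError:
        raise TypeError(
            f"no sparse kernel for op {name!r} with dtype {dtype!r}"
        ) from None


kernel = lookup_kernel("and", "bool")
print(kernel([True, False], [True, True]))  # [True, False]
```

Calling `lookup_kernel("and", "object")` raises a TypeError that names the operation and dtype, which is closer to the kind of message requested in this report.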

Note, if a and b are of the same length, this code runs fine:
Case 2

a = pd.Series(pd.SparseArray(np.arange(10)))
b = pd.Series(np.arange(10))

(a == 5) & (b == 5)

returns

0    False
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8    False
9    False
dtype: Sparse[bool, False]

If both are non-sparse, the code also evaluates fine, even though the lengths differ:
Case 3

a = pd.Series(np.arange(10))
b = pd.Series(np.arange(11))

(a == 5) & (b == 5)

returns

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7     False
8     False
9     False
10    False
dtype: bool

Expected Output

I expect one of two behaviors:

  1. An error message stating that the two arrays must be of equal length in Case 1
  2. The code in Case 1 to return the same output as in Case 3
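Until one of these behaviors is available, both can be approximated in user code. The following is a minimal sketch, not a proposed pandas fix: `require_equal_length` and `dense_and` are hypothetical helper names, and `pd.arrays.SparseArray` is used in place of `pd.SparseArray` on the assumption that a newer pandas may not expose the top-level alias.

```python
import numpy as np
import pandas as pd


def require_equal_length(left, right):
    """Behavior 1: fail early with a message that names the actual problem."""
    if len(left) != len(right):
        raise ValueError(
            f"operands must have equal length, got {len(left)} and {len(right)}"
        )


def dense_and(left, right):
    """Behavior 2: densify both masks, pad missing labels with False, then AND."""
    index = left.index.union(right.index)
    # np.asarray turns a sparse-backed boolean Series into a plain ndarray.
    left_dense = pd.Series(np.asarray(left, dtype=bool), index=left.index)
    right_dense = pd.Series(np.asarray(right, dtype=bool), index=right.index)
    left_dense = left_dense.reindex(index, fill_value=False)
    right_dense = right_dense.reindex(index, fill_value=False)
    return (left_dense & right_dense).astype(bool)


a = pd.Series(pd.arrays.SparseArray(np.arange(10)))
b = pd.Series(np.arange(11))
result = dense_and(a == 5, b == 5)
print(result)
```

Here `result` has 11 entries with a single True at label 5, matching the output shown for Case 3. Densifying gives up the memory savings of the sparse representation, so this only suits masks that fit comfortably in memory.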

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.7-arch1-1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 42.0.2
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: 0.8.1
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.0.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None

Metadata

    Labels

    Bug; Error Reporting (incorrect or improved errors from pandas); Numeric Operations (arithmetic, comparison, and logical operations); Sparse (sparse data type)