Description
Code Sample, a copy-pastable example if possible
I first ran into this issue while attempting a comparison of the following form:
Case 1
>>> import pandas as pd
>>> import numpy as np
>>> a = pd.Series(pd.SparseArray(np.arange(10)))
>>> b = pd.Series(np.arange(11))
>>> (a == 5) & (b == 5)
which raises the following uninformative AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-78-bb3d3268cfcc> in <module>
----> 1 (a == 5) & (b == 5)
/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(self, other)
1319 # integer dtypes. Otherwise these are boolean ops
1320 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
-> 1321 res_values = na_op(self.values, ovalues)
1322 unfilled = self._constructor(res_values, index=self.index, name=res_name)
1323 filled = filler(unfilled)
/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in na_op(x, y)
1252 def na_op(x, y):
1253 try:
-> 1254 result = op(x, y)
1255 except TypeError:
1256 assert not isinstance(y, (list, ABCSeries, ABCIndexClass))
/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in cmp_method(self, other)
1821
1822 if isinstance(other, SparseArray):
-> 1823 return _sparse_array_op(self, other, op, op_name)
1824 else:
1825 with np.errstate(all="ignore"):
/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in _sparse_array_op(left, right, op, name)
493 right_sp_values = right.sp_values
494
--> 495 sparse_op = getattr(splib, opname)
496
497 with np.errstate(all="ignore"):
AttributeError: module 'pandas._libs.sparse' has no attribute 'sparse_and_object'
Note that if a and b are of the same length, this code runs fine:
Case 2
a = pd.Series(pd.SparseArray(np.arange(10)))
b = pd.Series(np.arange(10))
(a == 5) & (b == 5)
returns
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
dtype: Sparse[bool, False]
If both are non-sparse, the code evaluates fine even though the lengths differ:
Case 3
a = pd.Series(np.arange(10))
b = pd.Series(np.arange(11))
(a == 5) & (b == 5)
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 False
dtype: bool
Expected Output
I expect one of two behaviors:
- An error message stating that the two arrays must be of equal length in Case 1
- The code in Case 1 to return the same output as in Case 3
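Until one of these behaviors exists, the second one can be approximated by hand. The following is only a sketch of a possible workaround, not a proposed fix for pandas itself: it converts any sparse boolean mask to a dense one, reindexes both masks onto the union of their indexes with False as the fill value, and only then applies &. The helper name dense_and is hypothetical, and the example uses pd.arrays.SparseArray (the namespaced spelling of the pd.SparseArray used above) on the assumption that a newer pandas may be installed. Note that densifying gives up the memory savings of the sparse representation.

```python
import numpy as np
import pandas as pd


def dense_and(left, right):
    """Hypothetical helper: AND two boolean Series of possibly unequal length.

    Sparse inputs are converted to dense first, and positions missing
    from either Series are treated as False.
    """
    def to_dense(s):
        # Only Series backed by a sparse dtype expose the .sparse accessor.
        if isinstance(s.dtype, pd.SparseDtype):
            return s.sparse.to_dense()
        return s

    left, right = to_dense(left), to_dense(right)

    # Align explicitly so the & below sees two equally indexed bool Series.
    index = left.index.union(right.index)
    left = left.reindex(index, fill_value=False)
    right = right.reindex(index, fill_value=False)
    return left & right


# Same data as Case 1: a sparse Series of length 10 and a dense one of length 11.
a = pd.Series(pd.arrays.SparseArray(np.arange(10)))
b = pd.Series(np.arange(11))

result = dense_and(a == 5, b == 5)
print(result)
```

With the Case 1 inputs, this should produce an 11-row boolean Series that is True only at index 5, which is what Case 3 returns for the non-sparse equivalent.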
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.7-arch1-1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 42.0.2
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: 0.8.1
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.0.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None