BUG: Wrong result after comparison SparseArray with arrays #45284

Open
@bdrum

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s > [3,3,4,1,0,0]
# [False, False, False, True, False, False]
# Fill: False
# IntIndex
# Indices: array([0, 1, 2, 3, 4, 5], dtype=int32)

Issue Description

This issue is a consequence of the broader problem described in #45126.

The other part of that problem (comparison with scalars) has already been fixed and merged: #44956, #45110.

I split this out into a separate issue following the discussion in #45125.

In the PR mentioned above I also added two tests and marked them as xfail in tests/extension/test_sparse.py:
TestComparisonOps::test_array
TestComparisonOps::test_sparse_array

Note that boolean masking of a SparseArray currently relies on the mask's sparse indices, so a construction such as:

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s[s > [3,3,4,1,0,0]]

returns the wrong result:

[1.0, 2.0, 3.0, 4.0, nan, nan]
Fill: nan
IntIndex
Indices: array([0, 1, 2, 3], dtype=int32)

instead of

[4.0]
Fill: nan
IntIndex
Indices: array([0], dtype=int32)
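
Until the comparison is fixed, one possible workaround is to build the mask from dense values so that it does not depend on the sparse indices of the comparison result. This is only a sketch under that assumption, not the proposed fix; the names `other`, `dense_mask` and `selected` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
other = np.array([3, 3, 4, 1, 0, 0])

# Compare on the dense values; comparisons involving NaN evaluate to False.
# errstate only silences a possible "invalid value" warning for the NaNs.
with np.errstate(invalid="ignore"):
    dense_mask = np.asarray(s) > other

# Index with a plain NumPy boolean array instead of a sparse mask.
selected = s[dense_mask]
print(selected)
```

This densifies the data for the comparison, so it trades memory for correctness and is mainly useful for small arrays or for checking results.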

Expected Behavior

s = pd.arrays.SparseArray([1, 2, 3, 4, np.nan, np.nan], fill_value=np.nan)
s > [3,3,4,1,0,0]
# [False, False, False, True, False, False]
# Fill: False
# IntIndex
# Indices: array([3], dtype=int32)
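
The expected layout can be made concrete by constructing the boolean SparseArray directly with `fill_value=False` and inspecting what it stores. The sketch below does that; `same_sparse_layout` is a hypothetical helper written for this issue (not part of pandas) that compares both the dense values and the stored positions of two sparse arrays.

```python
import numpy as np
import pandas as pd

# The result the comparison is expected to produce: only the single
# True value is stored explicitly, everything else is the fill value.
expected = pd.arrays.SparseArray(
    [False, False, False, True, False, False], fill_value=False
)
print(expected.sp_index.to_int_index().indices)  # stored positions
print(expected.sp_values)                        # stored values


def same_sparse_layout(left, right):
    """Hypothetical helper: equal dense values and equal stored positions."""
    same_values = np.array_equal(np.asarray(left), np.asarray(right))
    same_positions = np.array_equal(
        left.sp_index.to_int_index().indices,
        right.sp_index.to_int_index().indices,
    )
    return same_values and same_positions
```

With such a helper, `same_sparse_layout(s > [3, 3, 4, 1, 0, 0], expected)` would be expected to return True once the bug is fixed. With the buggy output shown in the reproducible example, the dense values match but the stored positions do not.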

Installed Versions

INSTALLED VERSIONS

commit : ccb25ab
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.60.1-microsoft-standard-WSL2
Version : #1 SMP Wed Aug 25 23:20:18 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+1613.gccb25ab1d2
numpy : 1.22.0
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.0.4
Cython : 0.29.26
pytest : 6.2.5
hypothesis : 6.34.2
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : None
gcsfs : 2021.11.0
matplotlib : 3.5.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.3
sqlalchemy : 1.4.29
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1
zstandard : None

Metadata

Labels

Bug; Numeric Operations (Arithmetic, Comparison, and Logical operations); Sparse (Sparse Data Type)
