BUG: SparseArray doesn't recalculate indices after comparison with scalar #44956

Closed
@bdrum

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

s = pd.arrays.SparseArray([1,2,3,4,0,0,0],fill_value=0)
s
#[1, 2, 3, 4, 0, 0, 0]
#Fill: 0
#IntIndex
#Indices: array([0, 1, 2, 3])
s > 2
#[False, False, True, True, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0, 1, 2, 3])

s = pd.arrays.SparseArray([np.nan,2,3,4,0,0,0],fill_value=0)
pd.isna(s)
#[True, False, False, False, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0, 1, 2, 3])

Issue Description

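While working on another issue, I noticed that SparseArray doesn't recalculate sp_index when required. In the examples above, the result of the comparison and of pd.isna keeps the indices of the original array, even though its fill_value is now False. This also happens in the specific case where fill_value is not NA but the array contains NA values.

If you don't mind, I would like to take this issue.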

Expected Behavior

I think the correct behavior is for the resulting SparseArray to have automatically recalculated indices, e.g.

import numpy as np
import pandas as pd
s = pd.arrays.SparseArray([1,2,3,4,0,0,0],fill_value=0)
s>2
# should be 
#[False, False, True, True, False, False, False]
#Fill: False
#IntIndex
#Indices: array([2, 3])
# and for second case 
s = pd.arrays.SparseArray([np.nan,2,3,4,0,0,0],fill_value=0)
pd.isna(s)
#[True, False, False, False, False, False, False]
#Fill: False
#IntIndex
#Indices: array([0])

Installed Versions

INSTALLED VERSIONS
------------------
commit : 47eb219
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Russian_Russia.1252

pandas : 1.4.0.dev0+1415.g47eb219889
numpy : 1.21.2
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.3
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.10.1
fastparquet : None
gcsfs : 2021.10.1
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.10.1
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : 3.6.1
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1
