
BUG: pd.NA is not compatible with searchsorted #30944

Open
@jschendel

Code Sample, a copy-pastable example if possible

On master, passing pd.NA (or an array containing pd.NA) to searchsorted fails, and calling searchsorted on an array that contains pd.NA also fails:

In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '1.0.0rc0+15.g4e2546d89'

In [2]: s = pd.Series([-1, 1, 3, 5])

In [3]: arr_pd_na = pd.array([0, 1, 2, pd.NA])

In [4]: s.searchsorted(arr_pd_na)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous

In [5]: s.searchsorted(pd.NA)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous

In [6]: arr_pd_na.searchsorted(10)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous
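
The issue does not state the root cause, but the error message suggests one reading: comparisons involving pd.NA return pd.NA rather than a boolean, and coercing that result to bool raises. A minimal sketch of that behavior, independent of searchsorted (the variable names are illustrative only, and whether this is the exact code path that searchsorted hits is an assumption):

```python
import pandas as pd

# Comparing against pd.NA propagates the missing value instead of
# returning True or False.
comparison = pd.NA < 1

# Any code that needs a plain boolean from that comparison will fail.
try:
    bool(comparison)
except TypeError as exc:
    message = str(exc)

print(comparison)
print(message)
```

np.nan behaves differently because comparisons with it return False, so sorting and binary search code can proceed.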

Note that the np.nan equivalent works fine:

In [7]: arr_np_nan = np.array([0, 1, 2, np.nan])

In [8]: s.searchsorted(arr_np_nan)
Out[8]: array([1, 1, 2, 4])

In [9]: s.searchsorted(np.nan)
Out[9]: 4

In [10]: arr_np_nan.searchsorted(10)
Out[10]: 3

This has downstream effects on anything that relies on searchsorted. For example, pd.cut fails in the same way for pd.NA but succeeds for np.nan:

In [11]: pd.cut(arr_pd_na, bins=3)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous

In [12]: pd.cut(arr_np_nan, bins=3)
Out[12]: 
[(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0], NaN]
Categories (3, interval[float64]): [(-0.002, 0.667] < (0.667, 1.333] < (1.333, 2.0]]
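
One possible interim workaround for pd.cut (a sketch, not a fix proposed in this issue) is to convert the nullable array to a float NumPy array with np.nan before binning. It assumes the masked array's to_numpy accepts dtype and na_value in the installed pandas version, and that giving up the nullable dtype is acceptable:

```python
import numpy as np
import pandas as pd

arr_pd_na = pd.array([0, 1, 2, pd.NA])

# Assumption: to_numpy(dtype=..., na_value=...) is available on the
# nullable integer array; pd.NA becomes np.nan in the float result.
as_float = arr_pd_na.to_numpy(dtype="float64", na_value=np.nan)

# Same call as In [12] above, now on a plain float ndarray.
binned = pd.cut(as_float, bins=3)

# Missing values get code -1, i.e. they fall into no bin.
print(binned.codes)
```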

Problem description

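pd.NA is not compatible with searchsorted, either as the value being searched for or as an element of the array being searched. Both cases raise TypeError, and so does anything built on searchsorted, such as pd.cut.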

Expected Output

I'd expect the output for the pd.NA operations above to match the output of the equivalent np.nan operations.
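
The expectation above can be written down as a small check. The sketch below uses a hypothetical helper, as_nan_values (not part of pandas), that swaps pd.NA for np.nan before calling searchsorted, and compares against the np.nan outputs shown earlier. It illustrates the desired results only; it is not how pandas should implement the fix.

```python
import numpy as np
import pandas as pd


def as_nan_values(values):
    """Hypothetical helper: replace pd.NA with np.nan so comparisons work.

    Only meant for numeric data; the float64 conversion is an assumption.
    """
    if values is pd.NA:
        return np.nan
    if isinstance(values, pd.api.extensions.ExtensionArray):
        return values.to_numpy(dtype="float64", na_value=np.nan)
    return values


s = pd.Series([-1, 1, 3, 5])
arr_pd_na = pd.array([0, 1, 2, pd.NA])

# Mirrors In [4], In [5] and In [6] from the report.
array_result = s.searchsorted(as_nan_values(arr_pd_na))
scalar_result = s.searchsorted(as_nan_values(pd.NA))
reverse_result = as_nan_values(arr_pd_na).searchsorted(10)

# These match Out[8], Out[9] and Out[10] for the np.nan equivalents.
assert array_result.tolist() == [1, 1, 2, 4]
assert scalar_result == 4
assert reverse_result == 3
```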

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 4e2546d
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.14-041914-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.0rc0+15.g4e2546d89
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.0
hypothesis : 4.36.2
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
pytest : 5.2.0
s3fs : 0.3.4
scipy : 1.3.1
sqlalchemy : 1.3.8
tables : 3.5.1
tabulate : None
xarray : 0.13.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
numba : 0.46.0

Labels

Bug
ExtensionArray: Extending pandas with custom dtypes or arrays.
Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
NA - MaskedArrays: Related to pd.NA and nullable extension arrays
