Description
Code Sample, a copy-pastable example if possible
On master, passing pd.NA as an input to searchsorted fails, and calling searchsorted on an array containing pd.NA also fails:
In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '1.0.0rc0+15.g4e2546d89'
In [2]: s = pd.Series([-1, 1, 3, 5])
In [3]: arr_pd_na = pd.array([0, 1, 2, pd.NA])
In [4]: s.searchsorted(arr_pd_na)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous
In [5]: s.searchsorted(pd.NA)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous
In [6]: arr_pd_na.searchsorted(10)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous
Note that the np.nan equivalent works fine:
In [7]: arr_np_nan = np.array([0, 1, 2, np.nan])
In [8]: s.searchsorted(arr_np_nan)
Out[8]: array([1, 1, 2, 4])
In [9]: s.searchsorted(np.nan)
Out[9]: 4
In [10]: arr_np_nan.searchsorted(10)
Out[10]: 3
This has downstream effects on anything that relies on searchsorted, e.g. pd.cut, which fails in the same way as above for pd.NA but succeeds for np.nan:
In [11]: pd.cut(arr_pd_na, bins=3)
---------------------------------------------------------------------------
TypeError: boolean value of NA is ambiguous
In [12]: pd.cut(arr_np_nan, bins=3)
Out[12]:
[(-0.002, 0.667], (0.667, 1.333], (1.333, 2.0], NaN]
Categories (3, interval[float64]): [(-0.002, 0.667] < (0.667, 1.333] < (1.333, 2.0]]
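Until pd.cut handles pd.NA directly, one possible stopgap is to convert the masked array to a float NumPy array first, so that the missing value becomes np.nan. This is only a sketch of a workaround, not a proposed fix; the variable names as_float and binned are illustrative, and it assumes a pandas version where to_numpy accepts na_value.

```python
import numpy as np
import pandas as pd

arr_pd_na = pd.array([0, 1, 2, pd.NA])

# Replace pd.NA with np.nan by converting to a plain float64 ndarray.
as_float = arr_pd_na.to_numpy(dtype="float64", na_value=np.nan)

# pd.cut takes the np.nan path shown in In [12] above;
# the missing entry ends up with category code -1.
binned = pd.cut(as_float, bins=3)
print(binned)
print(binned.codes)
```

The obvious cost is that the nullable integer dtype is lost in the conversion, which is part of why native support would be preferable.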
Problem description
pd.NA is not compatible with searchsorted.
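The error message in the tracebacks above is the one pd.NA raises whenever it is coerced to a boolean. The snippet below only demonstrates that coercion; whether this is the exact code path hit inside searchsorted is an assumption, not something confirmed here.

```python
import pandas as pd

# Coercing pd.NA to bool raises TypeError, unlike np.nan (which is truthy).
try:
    bool(pd.NA)
except TypeError as exc:
    message = str(exc)
    print(message)
```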
Expected Output
I'd expect the output for the pd.NA operations above to match the output of the equivalent np.nan operations.
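In the meantime, the expected results can be approximated by mapping pd.NA to np.nan before calling searchsorted. The helper na_to_nan below is hypothetical (it is not part of pandas) and assumes a pandas version where to_numpy accepts na_value; treat it as a sketch of the expected behavior rather than a fix.

```python
import numpy as np
import pandas as pd


def na_to_nan(values):
    """Hypothetical helper: return `values` with pd.NA replaced by np.nan."""
    if values is pd.NA:
        return np.nan
    if hasattr(values, "to_numpy"):
        # pandas arrays and Series: convert to a float64 ndarray,
        # filling missing entries with np.nan.
        return values.to_numpy(dtype="float64", na_value=np.nan)
    # Plain scalars and ndarrays are passed through unchanged.
    return values


s = pd.Series([-1, 1, 3, 5])
arr_pd_na = pd.array([0, 1, 2, pd.NA])

print(s.searchsorted(na_to_nan(arr_pd_na)))   # mirrors In [8]
print(s.searchsorted(na_to_nan(pd.NA)))       # mirrors In [9]
print(na_to_nan(arr_pd_na).searchsorted(10))  # mirrors In [10]
```

With this conversion the three calls should reproduce the np.nan results from In [8] through In [10].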
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 4e2546d
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.14-041914-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.0rc0+15.g4e2546d89
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.0
hypothesis : 4.36.2
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
pytest : 5.2.0
s3fs : 0.3.4
scipy : 1.3.1
sqlalchemy : 1.3.8
tables : 3.5.1
tabulate : None
xarray : 0.13.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
numba : 0.46.0