Description
This bug is not present in pandas < 1.3.0.
In 1.3.0, calling `Series.isin()` fails if:

- the `Series` dtype is an extension dtype (`pd.Float64Dtype()`, `pd.Int64Dtype()`, ...), and
- the `Series` contains any missing values (`numpy.nan`, `pd.NA`).
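The failure needs both conditions. A possible workaround on 1.3.0 is to fill the missing values before calling `isin` and then force those positions back to `False`. This is a sketch, not an official fix. The helper name `isin_ignoring_na` and the placeholder fill value `0` are illustrative assumptions. It also assumes a numeric dtype and a `values` list that does not itself contain `pd.NA`.

```python
import pandas as pd


def isin_ignoring_na(s: pd.Series, values) -> pd.Series:
    """Hypothetical helper: membership test that treats missing entries
    as "not in values" while never calling isin() on an array with NA."""
    # Fill missing entries with an arbitrary placeholder (assumed numeric dtype)
    # so the masked array passed to isin() contains no NA.
    filled = s.fillna(0)
    # Whatever isin() says about the placeholder positions is discarded here:
    # AND-ing with notna() forces every originally-missing position to False.
    return filled.isin(values) & s.notna()


s = pd.Series([0, None, 2, 3, 4], dtype="Int64")
print(isin_ignoring_na(s, [1, 2, 3]).tolist())
```

The `notna()` mask makes the choice of fill value irrelevant, even if the placeholder happens to appear in `values`.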
The following code snippet tests a few dtypes, checking whether each of them supports `isin` with missing values:
```python
import pandas as pd
import numpy as np

for dtype in (float, int, pd.Float64Dtype(), pd.Int64Dtype(), object):
    x = pd.Series([0, 1, 2, 3, 4], dtype=dtype)
    options = [1, 2, 3]
    print(f"\nTesting with dtype = {x.dtype}:")
    x.isin(options)  # This works every time (no missing values)
    # Set a value to NA. For plain int64 this upcasts the Series to float64,
    # so the "int64" case actually runs isin() on a float64 Series.
    x.iloc[1] = np.nan
    try:
        x.isin(options)  # This no longer works for the extension dtypes
    except Exception as err:
        print(f"Error! {err}")
    else:
        print("OK")

# Now, show the actual stack trace
print("\nStacktrace for dtype=Int64")
dtype = pd.Int64Dtype()
x = pd.Series([0, 1, 2, 3, 4], dtype=dtype)
options = [1, 2, 3]
x.iloc[1] = np.nan  # Set a value to NA
x.isin(options)
```
The output is:
```
Testing with dtype = float64:
OK

Testing with dtype = int64:
OK

Testing with dtype = Float64:
Error! boolean value of NA is ambiguous

Testing with dtype = Int64:
Error! boolean value of NA is ambiguous

Testing with dtype = object:
OK

Stacktrace for dtype=Int64
Traceback (most recent call last):
  File "...dev/pd_1_3_isin_bug.py", line 31, in <module>
    x.isin(options)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/series.py", line 5024, in isin
    result = algorithms.isin(self._values, values)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/algorithms.py", line 475, in isin
    return comps.isin(values)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/arrays/masked.py", line 408, in isin
    if libmissing.NA in values:
  File "pandas/_libs/missing.pyx", line 446, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
```
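The last frame points at `if libmissing.NA in values:` in `masked.py`, and `libmissing.NA` is the same object as the public `pd.NA`. The sketch below reproduces why that line raises, using only `pd.NA` and a plain list. Python's `in` operator on a list compares each element with `==`. `pd.NA == 1` returns `pd.NA` rather than a bool, and converting `pd.NA` to a bool raises. Identity is checked before `==`, so the test only succeeds when `pd.NA` itself is found before any other element. The traceback shows the check is reached when the Series contains missing values. This fits the observation that Series without missing values work, but whether the check is skipped entirely otherwise is an inference here, not something shown above.

```python
import pandas as pd

values = [1, 2, 3]  # same shape as `options` in the report

# Mirrors `libmissing.NA in values`: list containment evaluates bool(pd.NA == 1),
# and pd.NA == 1 is pd.NA, whose truth value is undefined.
try:
    found = pd.NA in values
except TypeError as err:
    print(f"TypeError: {err}")  # TypeError: boolean value of NA is ambiguous

# Identity is checked before ==, so this returns without ever calling bool(pd.NA).
print(pd.NA in [pd.NA, 1])  # True
```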
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.13.1.el7.x86_64
Version : #1 SMP Fri Jun 12 14:34:17 EDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.20.3
pytz : 2019.3
dateutil : 2.8.0
pip : 21.0.1
setuptools : 40.8.0
Cython : 0.29.13
pytest : 5.1.1
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fsspec : 0.5.2
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 0.13.0
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.9
tables : 3.5.2
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.45.1