Code Sample, a copy-pastable example if possible
>>> pd.DataFrame([[1, pd.NA], [2, "a"]]).drop_duplicates()
Traceback (most recent call last):
...
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/frame.py", line 4859, in f
    labels, shape = algorithms.factorize(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 629, in factorize
    codes, uniques = _factorize_array(
  File "/Users/williamayd/miniconda3/envs/sitka/lib/python3.8/site-packages/pandas/core/algorithms.py", line 478, in _factorize_array
    uniques, codes = table.factorize(values, na_sentinel=na_sentinel, na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1806, in pandas._libs.hashtable.PyObjectHashTable.factorize
  File "pandas/_libs/hashtable_class_helper.pxi", line 1728, in pandas._libs.hashtable.PyObjectHashTable._unique
  File "pandas/_libs/missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
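The traceback bottoms out in `NAType.__bool__`, which suggests that somewhere in the object-dtype hashtable path a value involving `pd.NA` gets coerced to a Python bool. The exact call site inside `PyObjectHashTable._unique` is not shown above, so this is only a sketch of the underlying behavior: comparisons with `pd.NA` propagate NA rather than returning `True`/`False`, and `bool(pd.NA)` raises.

```python
import pandas as pd

# Comparisons involving pd.NA propagate NA instead of returning True/False.
result = pd.NA == "a"
print(result is pd.NA)  # True

# Coercing NA to a Python bool is the step that raises at the bottom of the traceback.
try:
    bool(result)
except TypeError as exc:
    print(exc)  # boolean value of NA is ambiguous
```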
The same call succeeds when column "b" is cast to the "string" extension dtype:
>>> df = pd.DataFrame([[1, pd.NA], [2, "a"]], columns=list("ab"))
>>> df["b"] = df["b"].astype("string")
>>> df.drop_duplicates()
   a     b
0  1  <NA>
1  2     a
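Until the object-dtype path is fixed, the comparison above suggests a workaround: cast object columns to the "string" extension dtype before deduplicating. The helper below is a hypothetical sketch, not a pandas API, and it assumes every object column holds only `str` values and missing values. It also shows that rows with `<NA>` in the same position are treated as duplicates of each other.

```python
import pandas as pd


def drop_duplicates_with_string_dtype(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical workaround: cast object columns to the "string" dtype,
    then call drop_duplicates.

    Assumes each object column contains only str values and missing values.
    """
    obj_cols = df.select_dtypes(include="object").columns
    converted = df.astype({col: "string" for col in obj_cols})
    return converted.drop_duplicates()


# Rows 0 and 1 are identical (both have <NA> in column "b").
df = pd.DataFrame([[1, pd.NA], [1, pd.NA], [2, "a"]], columns=list("ab"))
result = drop_duplicates_with_string_dtype(df)
print(result)
```

Casting only the object columns keeps numeric columns untouched; a column mixing strings with other Python objects would need a different conversion.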