Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
s = pd.Series(["foo", "bar", ""])
print(f"{s.any() = }")
print(f"{s.astype('string[python]').any() = }")
# Also fails with pyarrow strings
# print(f"{s.astype('string[pyarrow]').any() = }")
# Similar behavior with `.all()`
Issue Description
When using `object` dtypes for string data, `.any()` and `.all()` reductions both work. When using `string[python]` or `string[pyarrow]`, both of these operations raise a `TypeError`.
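To compare the dtypes side by side without stopping at the first failure, something like the following sketch may help. The helper name `try_reduction` and the `outcomes` dictionary are hypothetical names introduced here for illustration. The pyarrow-backed dtype is left out so the snippet runs without pyarrow installed, and the exact results will depend on the pandas version in use.

```python
import pandas as pd

s = pd.Series(["foo", "bar", ""])  # object dtype by default here


def try_reduction(series, name):
    """Return the reduction result, or the exception class name if it raises TypeError."""
    try:
        return getattr(series, name)()
    except TypeError as err:
        return type(err).__name__


# Record the outcome of each reduction for each dtype instead of crashing.
outcomes = {}
for dtype in ["object", "string[python]"]:
    converted = s.astype(dtype)
    for name in ["any", "all"]:
        outcomes[(dtype, name)] = try_reduction(converted, name)

for key, value in outcomes.items():
    print(key, "->", value)
```

On the versions listed in this report, the `string[python]` rows would be expected to show `TypeError`, matching the tracebacks in this report.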
`string[python]` traceback:
Traceback (most recent call last):
  File "/Users/james/projects/dask/dask/test-any.py", line 5, in <module>
    print(f"{s.astype('string[python]').any() = }")
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11368, in any
    return NDFrame.any(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11056, in any
    return self._logical_func(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11040, in _logical_func
    return self._reduce(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/series.py", line 4522, in _reduce
    return delegate._reduce(name, skipna=skipna, **kwds)
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/arrays/string_.py", line 474, in _reduce
    raise TypeError(f"Cannot perform reduction '{name}' with string dtype")
TypeError: Cannot perform reduction 'any' with string dtype
`string[pyarrow]` traceback:
Traceback (most recent call last):
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/arrays/arrow/array.py", line 1281, in _reduce
    result = pyarrow_meth(data_to_reduce, skip_nulls=skipna, **kwargs)
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pyarrow/compute.py", line 256, in wrapper
    return func.call(args, options, memory_pool)
  File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'any' has no kernel matching input types (string)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/james/projects/dask/dask/test-any.py", line 5, in <module>
    print(f"{s.astype('string[pyarrow]').any() = }")
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11368, in any
    return NDFrame.any(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11056, in any
    return self._logical_func(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/generic.py", line 11040, in _logical_func
    return self._reduce(
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/series.py", line 4522, in _reduce
    return delegate._reduce(name, skipna=skipna, **kwds)
  File "/Users/james/mambaforge/envs/dask/lib/python3.10/site-packages/pandas/core/arrays/arrow/array.py", line 1289, in _reduce
    raise TypeError(msg) from err
TypeError: 'ArrowStringArray' with dtype string does not support reduction 'any' with pyarrow version 11.0.0. 'any' may be supported by upgrading pyarrow.
Expected Behavior
I'd expect `string[pyarrow]` and `string[python]` to also treat empty strings as falsey and non-empty strings as truthy when computing `.any()` and `.all()`.
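Until the reductions are supported directly, the expected semantics can be approximated with a boolean mask. This is only a workaround sketch, not a proposed fix: it assumes "truthy" means a string length greater than zero, and the names `truthy`, `any_result`, and `all_result` are introduced here for illustration.

```python
import pandas as pd

s = pd.Series(["foo", "bar", ""], dtype="string[python]")

# Assumption: a string is truthy when its length is greater than zero.
# `.str.len()` is available on string dtypes and the comparison yields a
# boolean mask, which does support `.any()` and `.all()`.
truthy = s.str.len() > 0

any_result = bool(truthy.any())  # at least one non-empty string
all_result = bool(truthy.all())  # the empty string makes this False

print(f"{any_result = }")
print(f"{all_result = }")
```

For this series, `any_result` is `True` and `all_result` is `False`. Missing values are not covered by this sketch; with the default `skipna=True` they would be skipped by the reductions.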
cc @phofl for visibility
Installed Versions
INSTALLED VERSIONS
------------------
commit : 6169cba72dbe8c7e9c7f17ab38af15a256f083da
python : 3.10.4.final.0
python-bits : 64
OS : Darwin
OS-release : 22.3.0
Version : Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.1.0.dev0+186.g6169cba72d
numpy : 1.25.0.dev0+882.gea0b1708b
pytz : 2022.1
dateutil : 2.8.2
setuptools : 59.8.0
pip : 22.0.4
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : 2023.2.0
fsspec : 2023.1.0+5.g012816b
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : 2.8.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.10.0
scipy : 1.9.0
snappy :
sqlalchemy : 1.4.35
tables : 3.7.0
tabulate : None
xarray : 2022.3.0
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None