Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.Series(["abc"], dtype="large_string[pyarrow]").str.split("b").str[0]
Traceback (most recent call last):
File "<python-input-7>", line 1, in <module>
a = pd.Series(["abc"], dtype="large_string[pyarrow]").str.split("b").str[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/generic.py", line 6127, in __getattr__
return object.__getattribute__(self, name)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/accessor.py", line 228, in __get__
return self._accessor(obj)
~~~~~~~~~~~~~~^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/strings/accessor.py", line 208, in __init__
self._inferred_dtype = self._validate(data)
~~~~~~~~~~~~~~^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/strings/accessor.py", line 262, in _validate
raise AttributeError(
f"Can only use .str accessor with string values, not {inferred_dtype}"
)
AttributeError: Can only use .str accessor with string values, not unknown-array. Did you mean: 'std'?
Issue Description
The return dtype of str.split depends on the input dtype: on large_string it returns a pyarrow list, while on string it returns object.
Interestingly, the list accessor works only on the large_string dtype:
>>> pd.Series(["abc"], dtype="large_string[pyarrow]").str.split("b").list[0]
0 a
dtype: large_string[pyarrow]
but not on the string dtype:
>>> pd.Series(["abc"], dtype="string[pyarrow]").str.split("b").list[0]
Traceback (most recent call last):
File "<python-input-15>", line 1, in <module>
pd.Series(["abc"], dtype="string[pyarrow]").str.split("b").list[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/generic.py", line 6127, in __getattr__
return object.__getattribute__(self, name)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/accessor.py", line 228, in __get__
return self._accessor(obj)
~~~~~~~~~~~~~~^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/arrays/arrow/accessors.py", line 73, in __init__
super().__init__(
~~~~~~~~~~~~~~~~^
data,
^^^^^
validation_msg="Can only use the '.list' accessor with "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"'list[pyarrow]' dtype, not {dtype}.",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/arrays/arrow/accessors.py", line 41, in __init__
self._validate(data)
~~~~~~~~~~~~~~^^^^^^
File "/opt/homebrew/Caskroom/miniconda/base/envs/pandas-main-string-test/lib/python3.13/site-packages/pandas/core/arrays/arrow/accessors.py", line 51, in _validate
raise AttributeError(self._validation_msg.format(dtype=dtype))
AttributeError: Can only use the '.list' accessor with 'list[pyarrow]' dtype, not object.. Did you mean: 'hist'?
From a user perspective this is unfortunate, because I have to know the underlying dtype in order to choose the correct accessor (or cast).
Expected Behavior
Indexing the split result through .str should work the same way as it does for the string dtype:
>>> pd.Series(["abc"], dtype="string[pyarrow]").str.split("b").str[0]
0 a
dtype: object
since this is documented behavior (pandas/doc/source/user_guide/text.rst, line 229 at commit f496acf).
Installed Versions
INSTALLED VERSIONS
commit : f496acf
python : 3.13.2
python-bits : 64
OS : Darwin
OS-release : 24.4.0
Version : Darwin Kernel Version 24.4.0: Fri Apr 11 18:33:47 PDT 2025; root:xnu-11417.101.15~117/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 3.0.0.dev0+2100.gf496acffcc
numpy : 2.2.5
dateutil : 2.9.0.post0
pip : 25.1
Cython : 3.0.11
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 20.0.0
pyreadstat : None
pytest : None
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None