Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[2], line 4
      1 import pandas as pd
      2 import pyarrow as pa
----> 4 pd.api.types.pandas_dtype(pd.ArrowDtype(pa.json_(pa.string()))).type

File ~/src/bigframes/venv/lib/python3.12/site-packages/pandas/core/dtypes/dtypes.py:2169, in ArrowDtype.type(self)
   2167 elif isinstance(pa_type, pa.ExtensionType):
   2168     return type(self)(pa_type.storage_type).type
-> 2169 raise NotImplementedError(pa_type)

NotImplementedError: extension<arrow.json>
Issue Description
Apache Arrow v19.0 introduced the pa.json_ extension type. Currently, pandas.ArrowDtype.type does not handle this new type correctly.
The ArrowDtype.type property is crucial for various pandas dtype APIs, including pd.api.types.pandas_dtype() and pd.api.types.is_timedelta64_dtype(). With pd.ArrowDtype(pa.json_(pa.string())), accessing .type raises NotImplementedError, as the traceback above shows, so these APIs fail or produce unexpected results.
The issue is that ArrowDtype.type should resolve the Arrow JSON type through its underlying storage type, but raises instead.
Expected Behavior
pa.json_ is a standard Arrow extension type. ArrowDtype.type should resolve it through its storage type, mirroring its behavior for other Arrow extension types.
Specifically, pd.ArrowDtype(pa.json_(pa.string())).type should reflect the storage type, which is pa.string(), as shown below.
Code showing the Arrow storage type of pa.json_:
>>> import pyarrow as pa
>>> pa.json_(pa.string()).storage_type
DataType(string)
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.12.1
python-bits : 64
OS : Linux
OS-release : 6.10.11-1rodete2-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1rodete2 (2024-10-16)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.2.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 23.2.1
Cython : None
sphinx : None
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : 2025.2.0
jinja2 : None
lxml.etree : None
matplotlib : 3.10.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : 0.27.0
psycopg2 : None
pymysql : None
pyarrow : 19.0.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.1
sqlalchemy : 2.0.38
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None