Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import datetime
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
s = pd.Series(pa.array([datetime.date.today(), datetime.date.today(), datetime.date.today()]), dtype='date32[pyarrow]')
df = pd.DataFrame({'c1': s, 'c2': s})
pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=False), 'dataset', ['c1'])
ret = pd.read_parquet('dataset') # exception
Issue Description
When a date32[pyarrow] column is used as a partition column, its values are written into the directory path and read back as a dictionary of strings rather than a dictionary of date32 values (or plain date32; I was surprised that the partition column comes back as a category type automatically). pd.read_parquet then tries to cast the strings to date32, and that cast raises an exception.
Expected Behavior
Something similar to this:
import datetime
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
s = pd.Series(pa.array([datetime.date.today(), datetime.date.today(), datetime.date.today()]), dtype='date32[pyarrow]')
df = pd.DataFrame({'c1': s, 'c2': s})
t = pa.Table.from_pandas(df, preserve_index=False)
pq.write_to_dataset(t, 'dataset', ['c1'])
dataset = pq.ParquetDataset('dataset/', schema=t.schema)
ret = dataset.read().to_pandas()
This returns the original DataFrame.
Installed Versions
pandas : 2.0.1
pyarrow : 11.0.0