As far as I can see, there is no easy way, given a PyArrow table, to get a DataFrame with PyArrow-backed dtypes.
I'd expect these idioms to work:
import numpy
import pyarrow
import pandas
arrow_u8 = pyarrow.array([1, 2, 3], type=pyarrow.uint8())
arrow_f64 = pyarrow.array([1., 2., 3.], type=pyarrow.float64())
table = pyarrow.table([arrow_u8, arrow_f64], names=['u8', 'f64'])
# Using the PyArrow `to_pandas` method returns NumPy-backed data
df = table.to_pandas()
# Using the constructor with a PyArrow table raises: ValueError: DataFrame constructor not properly called!
df = pandas.DataFrame(table)
# This is not implemented (the method doesn't exist)
df = pandas.DataFrame.from_arrow(table)
# Creating a DataFrame column by column naively from the Arrow arrays will use NumPy dtypes
df = pandas.DataFrame({'u8': arrow_u8,
                       'f64': arrow_f64})
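To make the `to_pandas` point concrete, checking the dtypes of its result shows plain NumPy dtypes rather than Arrow-backed ones (output reproduced from memory, so the exact alignment may differ):

df = table.to_pandas()
print(df.dtypes)
# u8       uint8
# f64    float64
# dtype: object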
I think the easiest way to do the conversion today is with something like this:
df = pandas.DataFrame({name: pandas.Series(array, dtype=pandas.ArrowDtype(array.type))
                       for array, name in zip(table.columns, table.column_names)})
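With that comprehension the columns end up backed by `pandas.ArrowDtype`; if I have the dtype names right, the result looks roughly like:

print(df.dtypes)
# u8      uint8[pyarrow]
# f64    double[pyarrow]
# dtype: object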
@pandas-dev/pandas-core Given that Arrow dtypes are one of the highlights of pandas 2.0, shouldn't we provide at least one easy way to do this conversion before the release?
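For illustration only, here is a sketch of what a `from_arrow`-style constructor could look like; the name and signature are hypothetical (nothing like this exists today), and the body is just a thin wrapper around the comprehension above:

import pandas
import pyarrow

def from_arrow(table: pyarrow.Table) -> pandas.DataFrame:
    # Hypothetical helper: build a DataFrame whose columns use pandas.ArrowDtype
    # instead of being converted to NumPy dtypes.
    return pandas.DataFrame(
        {name: pandas.Series(column, dtype=pandas.ArrowDtype(column.type))
         for name, column in zip(table.column_names, table.columns)}
    )

# Reusing the `table` built above:
df = from_arrow(table)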