Description
We now have #56587 and #59518 for exporting pandas DataFrame and Series through the Arrow PyCapsule Interface (i.e. adding `__arrow_c_stream__` methods), but we don't yet have the import counterpart.
For importing, the specification doesn't provide any API guidelines on what this should look like, so we have a couple of options. The two main ones I can think of:
- Add a dedicated `from_arrow()` method, which could be top level (`pd.from_arrow(..)`) or a class method (`pd.DataFrame.from_arrow(..)`)
- Support such objects directly in the main constructors (`pd.DataFrame(..)`)
In pandas itself, we do have a couple of `from_..` class methods (`from_dict` / `from_records`), often for objects that the main constructor also accepts (at least in the dict case). I think the main differentiator is that the dedicated class methods have more specialized keyword arguments (and therefore allow a larger variety of input).
So based on that pattern, we could also do both: add a `DataFrame.from_arrow()` class method, and then also accept such objects in `pd.DataFrame()`, passing through to `from_arrow()` (which could have more custom options to control exactly how the conversion from Arrow to pandas is done).
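To make the "do both" shape concrete, here is a rough, dependency-free sketch of the dispatch. Everything in it is an assumption for illustration: `Frame` is a toy stand-in for `pd.DataFrame`, `FakeArrowProducer` returns a plain dict instead of a real PyCapsule, and the `columns` keyword is a hypothetical example of a specialized option that only `from_arrow()` exposes.

```python
class FakeArrowProducer:
    """Stand-in for a third-party object exposing __arrow_c_stream__.

    Assumption: a real producer returns a PyCapsule wrapping an
    ArrowArrayStream. This one returns a plain dict so the sketch runs
    without any Arrow library installed.
    """

    def __init__(self, columns):
        self._columns = columns

    def __arrow_c_stream__(self, requested_schema=None):
        return dict(self._columns)


class Frame:
    """Toy stand-in for pd.DataFrame; it only stores a dict of columns."""

    def __init__(self, data=None):
        if hasattr(data, "__arrow_c_stream__"):
            # Constructor path: delegate to from_arrow() with its defaults.
            data = type(self).from_arrow(data).columns
        self.columns = dict(data or {})

    @classmethod
    def from_arrow(cls, data, *, columns=None):
        """Class-method path; `columns` is a hypothetical extra option."""
        if not hasattr(data, "__arrow_c_stream__"):
            raise TypeError("expected an object with __arrow_c_stream__")
        # With pyarrow available, the real conversion might be along the
        # lines of pyarrow.table(data).to_pandas().
        converted = data.__arrow_c_stream__()
        if columns is not None:
            converted = {name: converted[name] for name in columns}
        return cls(converted)


def from_arrow(data, **options):
    """Top-level variant, mirroring a possible pd.from_arrow(..)."""
    return Frame.from_arrow(data, **options)


source = FakeArrowProducer({"a": [1, 2, 3], "b": ["x", "y", "z"]})
via_constructor = Frame(source)                            # plain defaults
via_classmethod = Frame.from_arrow(source, columns=["a"])  # extra control
via_function = from_arrow(source)                          # top-level spelling
```

In this sketch the constructor only ever uses the default conversion, while callers who need finer control go through `from_arrow()` directly, which matches the existing `from_dict` / `from_records` pattern.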
Looking at polars, it seems they also have both, but I am not entirely sure how the two are connected. `pl.from_arrow` already existed but might be more specific to pyarrow? And then pola-rs/polars#17693 added it to the main `pl.DataFrame(..)` constructor (@kylebarron).
For geopandas, I added a `GeoDataFrame.from_arrow()` method.
(To be clear, everything said above also applies to `Series()` / `Series.from_arrow()` etc.)