Allow to reconstruct a library-specific DataFrame object from an interchange object #85

Open
@jorisvandenbossche

From the discussions at EuroSciPy with scikit-learn developers (cc @ogrisel), the following use case came up. Assume you have a method that transforms your data; a workflow could be:

  1. accept any dataframe library object as input
  2. using the interchange protocol, robustly access the buffer for a column (e.g. as a numpy array) and transform the array
  3. reconstruct a dataframe object (same type as the input) as a return value
    1. put the transformed array back into (a copy of) the interchange protocol object, or construct a new protocol object from scratch
    2. given an interchange protocol object, create a library dataframe object (i.e. call from_dataframe from the input object's library)

This last step is currently not possible, because you don't (want to) know each possible library that implements __dataframe__ and where its from_dataframe lives.

This is very much related to a possible "namespace" like the one the array API uses (cf. #79).
With such a namespace, this could look like:

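def transform(df):
    df_obj = df.__dataframe__()
    df_obj_transformed = ...  # transform data in df_obj
    # __dataframe_namespace__ is the proposed hook, not an existing method
    df_ns = df.__dataframe_namespace__()
    return df_ns.from_dataframe(df_obj_transformed)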
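
To make the round trip concrete, here is a minimal, self-contained sketch of the namespace idea using toy stand-ins. Everything in it is hypothetical: ToyFrame, ToyInterchange and toy_from_dataframe are invented for illustration, the interchange object is reduced to a plain dict of lists rather than real buffers, and __dataframe_namespace__ is the proposed hook, not something the protocol defines today.

```python
from types import SimpleNamespace


class ToyInterchange:
    """Stand-in for an interchange object: column name -> list of values."""

    def __init__(self, columns):
        self.columns = dict(columns)


class ToyFrame:
    """Stand-in for one library's dataframe type (hypothetical)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe__(self):
        return ToyInterchange(self.columns)

    def __dataframe_namespace__(self):
        # Proposed hook (an assumption, not part of the protocol):
        # expose the namespace that holds this library's from_dataframe.
        return SimpleNamespace(from_dataframe=toy_from_dataframe)


def toy_from_dataframe(obj):
    """The toy library's constructor from an interchange object."""
    return ToyFrame(obj.columns)


def double_all(df):
    """Library-agnostic transform: it never names or imports ToyFrame."""
    df_obj = df.__dataframe__()
    df_obj_transformed = ToyInterchange(
        {name: [2 * v for v in values] for name, values in df_obj.columns.items()}
    )
    df_ns = df.__dataframe_namespace__()
    return df_ns.from_dataframe(df_obj_transformed)


result = double_all(ToyFrame({"a": [1, 2, 3]}))
```

The point of the sketch is that double_all gets back an object of the caller's own type while only touching dunder methods on its input.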

But we could also think about (shorter-term) alternatives directly tied to the interchange protocol object. For example, we could have a class method or attribute that points to the from_dataframe method of the library that created the object.
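
A rough sketch of that shorter-term alternative, again with toy stand-ins: the interchange object itself carries a reference to its producer's constructor. The attribute name from_dataframe on the interchange object, and the ToyFrame / ToyInterchange classes, are assumptions made up for this example; the protocol has no such attribute today.

```python
class ToyInterchange:
    """Stand-in for an interchange object that remembers its producer."""

    def __init__(self, columns, from_dataframe):
        self.columns = dict(columns)
        # Hypothetical attribute: points at the from_dataframe of the
        # library that created this object.
        self.from_dataframe = from_dataframe


class ToyFrame:
    """Stand-in for one library's dataframe type (hypothetical)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    @staticmethod
    def _from_interchange(obj):
        return ToyFrame(obj.columns)

    def __dataframe__(self):
        return ToyInterchange(self.columns, ToyFrame._from_interchange)


def negate_all(df):
    """Library-agnostic transform that rebuilds the caller's own type."""
    df_obj = df.__dataframe__()
    # Build a new interchange object, carrying the constructor reference over.
    transformed = ToyInterchange(
        {name: [-v for v in values] for name, values in df_obj.columns.items()},
        df_obj.from_dataframe,
    )
    return df_obj.from_dataframe(transformed)


result = negate_all(ToyFrame({"a": [1, -2, 3]}))
```

Compared with a namespace, this needs no new dunder method on the dataframe itself, but any code that constructs a fresh interchange object has to carry the reference along explicitly.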
