From discussions at EuroSciPy with scikit-learn developers (cc @ogrisel), the following use case came up. Assume you have a method that transforms your data; a workflow could be:
- accept any dataframe library object as input
- using the interchange protocol, robustly access the buffer for a column (e.g. as a NumPy array) and transform the array
- reconstruct a dataframe object (same type as the input) as a return value
  - put the transformed array back into (a copy of) the interchange protocol object, or construct a new protocol object from scratch
  - given an interchange protocol object, create a library dataframe object (so calling `from_dataframe` from the input object's library)
This last step is currently not possible, because you don't (want to) know each possible library that implements `__dataframe__` and where its `from_dataframe` lives.
This is closely related to a possible "namespace" like the one the array API uses (cf. #79).
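With that, this could look like:

    def transform(df):
        df_obj = df.__dataframe__()
        df_obj_transformed = ...  # placeholder: transform the data in df_obj
        # Proposed hook: ask the input object for its library's namespace
        df_ns = df.__dataframe_namespace__()
        return df_ns.from_dataframe(df_obj_transformed)
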
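To make the producer side of this idea concrete, here is a minimal sketch using a toy, made-up library. `ToyFrame`, `ToyInterchangeFrame`, `toy_from_dataframe` and `double_all` are hypothetical names for illustration only, and `__dataframe_namespace__` is the proposed hook, not something the protocol currently defines. The toy interchange object just holds a dict of lists rather than real protocol buffers.

```python
import types


class ToyInterchangeFrame:
    """Stand-in for an interchange protocol object (hypothetical, dict of lists)."""

    def __init__(self, columns):
        self.columns = dict(columns)


class ToyFrame:
    """Stand-in for a library's dataframe class (hypothetical)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe__(self):
        return ToyInterchangeFrame(self.columns)

    def __dataframe_namespace__(self):
        # Proposed hook, assumed here: return an object exposing the
        # library's entry points, similar in spirit to __array_namespace__.
        return types.SimpleNamespace(from_dataframe=toy_from_dataframe)


def toy_from_dataframe(interchange_obj):
    """The toy library's from_dataframe: build a ToyFrame from a protocol object."""
    return ToyFrame(interchange_obj.columns)


def double_all(df):
    """Consumer code: it only uses the two dunder methods of the input object."""
    df_obj = df.__dataframe__()
    # Construct a new protocol object from scratch with the transformed data.
    doubled = ToyInterchangeFrame(
        {name: [2 * v for v in values] for name, values in df_obj.columns.items()}
    )
    df_ns = df.__dataframe_namespace__()
    return df_ns.from_dataframe(doubled)
```

In this sketch `double_all` never refers to `ToyFrame` or `toy_from_dataframe` by name, yet it returns an object of the same type as its input.
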
But we could also think about (shorter-term) alternatives directly tied to the interchange protocol object. For example, we could have a class method or attribute that points to the `from_dataframe` method of the library that created the object.
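A rough sketch of what that alternative might look like, again with a toy, made-up library: the interchange object carries a `from_dataframe` attribute pointing back to its producer. The attribute name and all class and function names here are assumptions for illustration, not part of the protocol, and the toy interchange object holds a dict of lists instead of real buffers.

```python
import copy


class ToyInterchangeFrame:
    """Stand-in for an interchange protocol object (hypothetical)."""

    def __init__(self, columns, constructor):
        self.columns = dict(columns)
        # Hypothetical attribute: the producing library's from_dataframe.
        self.from_dataframe = constructor


class ToyFrame:
    """Stand-in for a library's dataframe class (hypothetical)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe__(self):
        return ToyInterchangeFrame(self.columns, toy_from_dataframe)


def toy_from_dataframe(interchange_obj):
    """The toy library's from_dataframe: build a ToyFrame from a protocol object."""
    return ToyFrame(interchange_obj.columns)


def negate_all(df):
    """Consumer code: finds the constructor on the interchange object itself."""
    df_obj = df.__dataframe__()
    # Put the transformed data into a copy of the interchange object.
    transformed = copy.copy(df_obj)
    transformed.columns = {
        name: [-v for v in values] for name, values in df_obj.columns.items()
    }
    return df_obj.from_dataframe(transformed)
```

Compared with a namespace, this needs no new method on the dataframe class itself, at the cost of adding a member to every interchange object.
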