Description
I think it's useful to think through concrete use cases for how the interchange protocol could be used, to see whether it covers them and whether the desired APIs are available.
One example use case could be matplotlib's plot("x", "y", data=obj), where matplotlib already supports getting the x and y column of any "indexable" object. Currently they require obj["x"] to give the desired data, but in theory this support could be extended to any object that supports the dataframe interchange protocol. At the same time, matplotlib currently also needs that data (AFAIK) as numpy arrays, because the low-level plotting code is implemented that way.
With the current API, matplotlib could do something like:
df = obj.__dataframe__()
x_values = some_utility_func(df.get_column_by_name("x").get_buffers())
where some_utility_func can convert the dict of Buffer objects to a numpy array (once numpy supports dlpack, converting the Buffer objects to numpy will become easy, but the function will then still need to handle potentially multiple buffers returned from get_buffers()).
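To make the amount of work concrete, here is a rough sketch of what such a helper might look like for the simplest case only (a fixed-width numeric column with no missing values). It is not a proposal for the API. It assumes that get_buffers() returns a dict with "data", "validity" and "offsets" keys, where each entry is either None or a (Buffer, dtype) pair with dtype being (kind, bitwidth, format_str, endianness), and that a Buffer exposes ptr and bufsize. FakeBuffer and FakeColumn are hypothetical stand-ins so the snippet runs without a real producer, and column_to_numpy is just an illustrative name for some_utility_func.

```python
import ctypes

import numpy as np

# Assumed DtypeKind codes (INT=0, UINT=1, FLOAT=2) mapped to numpy kind chars.
_KIND_TO_NUMPY = {0: "i", 1: "u", 2: "f"}


def column_to_numpy(col):
    """Copy the data buffer of a fixed-width numeric column into a numpy array."""
    buffers = col.get_buffers()
    # Missing values and variable-length data live in extra buffers. A real
    # helper would have to handle them; this sketch simply refuses.
    if buffers["validity"] is not None or buffers["offsets"] is not None:
        raise NotImplementedError("only a single data buffer is supported")
    buf, (kind, bitwidth, _format_str, _endianness) = buffers["data"]
    if kind not in _KIND_TO_NUMPY:
        raise NotImplementedError(f"unsupported dtype kind: {kind}")
    # Native byte order is assumed; the endianness field is ignored here.
    np_dtype = np.dtype(f"{_KIND_TO_NUMPY[kind]}{bitwidth // 8}")
    # View the raw memory behind the Buffer, then copy so the result does not
    # depend on the producer keeping that memory alive.
    raw = (ctypes.c_char * buf.bufsize).from_address(buf.ptr)
    return np.frombuffer(raw, dtype=np_dtype).copy()


class FakeBuffer:
    """Hypothetical stand-in for a protocol Buffer, backed by a numpy array."""

    def __init__(self, arr):
        self._arr = arr  # keep a reference so the memory stays valid

    @property
    def bufsize(self):
        return self._arr.nbytes

    @property
    def ptr(self):
        return self._arr.ctypes.data


class FakeColumn:
    """Hypothetical stand-in for a protocol Column without missing values."""

    def __init__(self, values):
        self._arr = np.ascontiguousarray(values)

    def get_buffers(self):
        kind = {"i": 0, "u": 1, "f": 2}[self._arr.dtype.kind]
        # format_str is left empty because the helper above does not use it.
        dtype = (kind, self._arr.dtype.itemsize * 8, "", "=")
        return {
            "data": (FakeBuffer(self._arr), dtype),
            "validity": None,
            "offsets": None,
        }


x_values = column_to_numpy(FakeColumn([1.0, 2.5, 4.0]))
```

Even this minimal version has to know about dtype kinds, bit widths and raw pointers, and it still ignores chunks, column offsets, validity masks, strings, categoricals and non-native byte order.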
That doesn't seem ideal: 1) writing some_utility_func to do the conversion to numpy is non-trivial to implement for all the different cases, 2) should an end-user library have to go down to the Buffer objects?
This isn't a pure interchange from one dataframe library to another, so we could also say that this use case is out-of-scope at the moment. But on the other hand, it seems a typical use case example, and could in theory already be supported right now (it only needs the "dataframe api" to get a column, which is one of the few things we already provide).
(disclaimer: I am not a matplotlib developer, and I don't know whether they have, for example, efforts underway to add support for generic array-likes. But I think it's nonetheless a typical example use case.)