Closed
Description
This issue supersedes #1 and #14. As agreed in the 6 Aug call, the first milestone in the definition of the dataframe API will be the part for interchanging data. A sample use case is Matplotlib being able to receive dataframes from different implementations (e.g. pandas, vaex, modin, cudf, etc.).
This work was originally discussed in OSSData, and an initial draft was later proposed here: wesm/dataframe-protocol#1.
The topics to discuss and decide on are:
- Access dataframe properties/metadata:
  - Get number of rows and columns (#20)
  - Get column names (Get and set column names #21)
  - Get column data types
- Selecting a column and accessing the underlying data (I think it'll require a decision on #6, "Separate object for a dataframe column? (is Series needed?)", i.e. whether we want a separate column/Series object)
- Data types: which are part of the standard, how they are represented, etc. (Data types to support #26)
- How downstream libraries will access this API (implemented in the dataframe directly, or as a returned object via `__dataframe__`)
- Should row labels be part of the standard?
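To make the topics above more concrete, here is a rough sketch of what the `__dataframe__` option could look like from both the producer and the consumer side. Nothing here is agreed API: `ToyDataFrame`, `ToyInterchangeFrame`, `ToyColumn`, `summarize` and all method names (`num_rows`, `num_columns`, `column_names`, `get_column`, `to_list`) are hypothetical placeholders, dtypes are plain strings only for illustration, and a separate column object is assumed even though that is still to be decided in #6.

```python
from typing import Any, Dict, List, Sequence


class ToyColumn:
    """Hypothetical column object (whether one is needed is an open question)."""

    def __init__(self, name: str, values: Sequence[Any], dtype: str) -> None:
        self.name = name
        self.dtype = dtype  # dtype as a string is an assumption for this sketch
        self._values = list(values)

    def to_list(self) -> List[Any]:
        # Stand-in for "accessing the underlying data"
        return list(self._values)


class ToyInterchangeFrame:
    """Hypothetical object returned by __dataframe__()."""

    def __init__(self, columns: Dict[str, ToyColumn]) -> None:
        self._columns = columns

    def num_rows(self) -> int:
        first = next(iter(self._columns.values()), None)
        return 0 if first is None else len(first.to_list())

    def num_columns(self) -> int:
        return len(self._columns)

    def column_names(self) -> List[str]:
        return list(self._columns)

    def get_column(self, name: str) -> ToyColumn:
        return self._columns[name]


class ToyDataFrame:
    """Stand-in for a library dataframe (pandas, vaex, modin, cudf, ...)."""

    def __init__(self, data: Dict[str, Sequence[Any]], dtypes: Dict[str, str]) -> None:
        self._data = data
        self._dtypes = dtypes

    def __dataframe__(self) -> ToyInterchangeFrame:
        # The library exposes its data through the interchange object
        return ToyInterchangeFrame(
            {
                name: ToyColumn(name, values, self._dtypes[name])
                for name, values in self._data.items()
            }
        )


def summarize(obj: Any) -> Dict[str, Any]:
    """Consumer side (e.g. a plotting library): only the protocol is used."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not implement __dataframe__")
    frame = obj.__dataframe__()
    names = frame.column_names()
    return {
        "shape": (frame.num_rows(), frame.num_columns()),
        "names": names,
        "dtypes": {name: frame.get_column(name).dtype for name in names},
    }
```

With something like this, the consumer never needs to import or know about the producing library; it only checks for `__dataframe__`. The alternative listed above (implementing the methods on the dataframe directly) would drop the intermediate object but require every library to expose the same method names on its own dataframe class.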
The procedure to include this part in the standard RFC will be as follows:
- Define the goals, requirements, target audience, scope and use cases, and include them in the RFC
- Discuss and build a standalone document specific to the data interchange based on the above topics
- Review internally, and post publicly for additional feedback
- Update the prototype with the agreed API
- Finalize and approve the API and the prototype, and add them to the RFC document