Signature for a standard `from_dataframe` constructor function

One of the "to be decided" items at https://github.com/data-apis/dataframe-api/blob/dataframe-interchange-protocol/protocol/dataframe_protocol_summary.md#to-be-decided is:

_**Should there be a standard from_dataframe constructor function?** This isn't completely necessary, however it's expected that a full dataframe API standard will have such a function. The array API standard also has such a function, namely from_dlpack. Adding at least a recommendation on syntax for this function would make sense, e.g., from_dataframe(df, stream=None). Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685903651 is relevant._

In the announcement blog post draft I tentatively answered that with "yes", and added an example. The question is what the desired signature should be. The Pandas prototype currently has the most basic signature one can think of:

```python
def from_dataframe(df : DataFrameObject) -> pd.DataFrame:
    """
    Construct a pandas DataFrame from ``df`` if it supports ``__dataframe__``
    """
    if isinstance(df, pd.DataFrame):
        return df

    if not hasattr(df, '__dataframe__'):
        raise ValueError("`df` does not support __dataframe__")

    return _from_dataframe(df.__dataframe__())
```

The above just takes any dataframe supporting the protocol, and turns the whole things in the "library-native" dataframe. Now of course, it's possible to add functionality to it, to extract only a subset of the data. Most obviously, named columns:

```python
def from_dataframe(df : DataFrameObject, *, colnames : Optional[Iterable[str]]= None) -> pd.DataFrame:
```

Other things we may or may not want to support: 
- columns by index
- get a subset of chunks

My personal feeling is:
- columns by index: maybe, and if we do then with a separate keyword like `col_indices=None`
- a subset of chunks: probably not. This is more advanced usage, and if one needs it it's likely one wants to get the object returned by `__dataframe__` first, then inspect some metadata, and only then decide what chunks to get.

Thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Signature for a standard `from_dataframe` constructor function #42

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Signature for a standard from_dataframe constructor function #42

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Signature for a standard `from_dataframe` constructor function #42