Skip to content

Interchange Protocol: unused nan_as_null keyword? #125

Open
@jorisvandenbossche

Description

@jorisvandenbossche

The DataFrame Interchange Protocol has a nan_as_null keyword in __dataframe__ that can be specified by the consumer, i.e. the person/library calling this method. The docstring explains its goal:

``nan_as_null`` is a keyword intended for the consumer to tell the
producer to overwrite null values in the data with ``NaN``.
It is intended for cases where the consumer does not support the bit
mask or byte mask that is the producer's native representation.

However, at the moment I think none of the existing implementation we are aware of actually supports this keyword: pandas, vaex, modin, cudf all simply ignore it (and cudf actually uses a wrong default), and pyarrow as well but will raise an error if you pass a non-default value (i.e. True).
The following disclaimer in the docstring seems to have been copied around as some version of this appears in all the listed projects:

        `nan_as_null` currently has no effect; once support for nullable extension
        dtypes is added, this value should be propagated to columns.

I am not fully sure what this note is meant for (I suppose it originates from the pandas implementation where for nullable extension types this keyword could be implemented? Since the default numpy dtypes already use NaN as null, the keyword wouldn't have any effect for that. But for other libraries that explanation doesn't necessarily hold).
And also, if nobody implemented it, do we actually need it?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions