[protocol] Clarify the exact meaning of the buffer's dtype (Dtype tuple in get_buffers()) #273

Open
@jorisvandenbossche

If my assumption is incorrect

Just as a sanity check, I verified that pandas/vaex/cuDF/modin all return the type described in Column.dtype for the get_buffers() values. A TODO I realize for dataframe-interchange-tests is to generalise test_dtype and use it in test_get_buffers.

Note that this is actually not correct, depending on how you interpret it. Yes, the buffer's dtype is the same kind of Dtype tuple, but it should not necessarily be the same tuple that Column.dtype returns, as the buffer can have a different dtype than the column.

It seems that we all interpreted this wrongly and all implementations got this wrong (or the text about "the data buffer's associated dtype" is wrong), see apache/arrow#37598, pandas-dev/pandas#54781, pola-rs/polars#10787 (and the same for StaticFrame mentioned above, from a quick look).

Originally posted by @jorisvandenbossche in #87 (comment)
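
To make the distinction concrete, here is a minimal sketch under the interpretation that the buffer's dtype describes the physical storage of the buffer, not the logical type of the column. It does not call any dataframe library. `DtypeKind` mirrors the enum from the protocol, while the tuple values and the variable names are illustrative assumptions, not output copied from pandas, pyarrow, or polars.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Member values follow the interchange protocol's DtypeKind enum.
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


# A Dtype tuple is (kind, bit width, Arrow C format string, endianness).

# float64 column: logical type and physical storage coincide,
# so both tuples are the same.
float_column_dtype = (DtypeKind.FLOAT, 64, "g", "=")
float_data_buffer_dtype = (DtypeKind.FLOAT, 64, "g", "=")

# String column: the column is logically a UTF-8 string ("u"), but the
# data buffer is a run of raw bytes, i.e. uint8 ("C") under this reading.
string_column_dtype = (DtypeKind.STRING, 8, "u", "=")
string_data_buffer_dtype = (DtypeKind.UINT, 8, "C", "=")

print(float_column_dtype == float_data_buffer_dtype)    # True
print(string_column_dtype == string_data_buffer_dtype)  # False
```

Under this reading, a test that asserts the data buffer's dtype equals Column.dtype would pass for the float column and fail for the string column, which is why reusing test_dtype unchanged in test_get_buffers would encode the interpretation questioned here.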
