DataFrame interchange protocol: should NaT be like NaN or a sentinel? #64

Closed
@jorisvandenbossche

In describe_null we currently list the following options:

  • 0 : non-nullable
  • 1 : NaN/NaT
  • 2 : sentinel value
  • 3 : bit mask
  • 4 : byte mask
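For reference, the option list above could be written down as an enum. This is only a sketch: the class name `NullKind` and its member names are made up here for illustration, and only the integer codes come from the list.

```python
from enum import IntEnum


class NullKind(IntEnum):
    """Hypothetical names for the describe_null codes listed above."""

    NON_NULLABLE = 0
    NAN_OR_NAT = 1   # option 1 currently lumps NaN and NaT together
    SENTINEL = 2
    BIT_MASK = 3
    BYTE_MASK = 4


# Looking up a code gives back the readable name.
kind = NullKind(2)
print(kind.name)
```

The question in this issue is whether a datetime column with NaT should report `NAN_OR_NAT` or `SENTINEL`.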

While looking at the pandas implementation, I was wondering if we shouldn't treat NaT differently from NaN and see it as a sentinel value (option 2 in the list above).

While NaN could also be seen as a kind of sentinel value, there are some clear differences. NaN is a floating-point concept backed by the IEEE 754 standard, while as far as I know "NaT" is quite numpy-specific (e.g. Arrow doesn't support it). NaNs also compare as non-equal (following the standard). For datetime64 with NaT that's also the case in numpy, but if you view the data as int64 it's not. And for dlpack, for example, those values would presumably be regarded as int64? The actual Buffer object might also be agnostic to the dtype.
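A small numpy illustration of the difference, as a sketch: it assumes a reasonably recent numpy, where NaT comparisons return False, and the variable names are just for this example. The last line shows what a sentinel-style description might look like. The `(kind, value)` tuple shape is an assumption made for illustration, not something stated above.

```python
import numpy as np

nan_values = np.array([1.0, np.nan])
nat_values = np.array(["2021-01-01", "NaT"], dtype="datetime64[ns]")

# Both NaN and NaT compare as non-equal to themselves at the numpy level.
nan_self_equal = nan_values == nan_values
nat_self_equal = nat_values == nat_values

# Viewed as raw int64 (what a dtype-agnostic buffer consumer would see),
# NaT is just an ordinary integer: the minimum int64 value.
as_int64 = nat_values.view(np.int64)
int_self_equal = as_int64 == as_int64
nat_sentinel = np.iinfo(np.int64).min
is_missing = as_int64 == nat_sentinel

# Hypothetical sentinel-style description for a datetime column (option 2).
proposed_describe_null = (2, int(nat_sentinel))

print(nan_self_equal, nat_self_equal, int_self_equal, is_missing)
```

On the int64 view the missing value is found with a plain equality check against a known integer, which is how a sentinel behaves, while NaN can never be detected that way.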
