Data types to support #26

Open
@datapythonista

What data types should be part of the standard? For the array API, the types have been discussed here.

A good reference for data types for data frames is the Arrow data types documentation. The page probably contains many more types than the ones we want to support in the standard.

Topics to make decisions on:

  • Which data types should be supported by the standard?
  • Are implementations expected to provide extra data types? Should we have a list of optional types, or treat any type not in the standard as out of scope?
  • Missing data is discussed separately in Missing Data #9

These are IMO the main types (feel free to disagree):

  • boolean
  • int8 / uint8
  • int16 / uint16
  • int32 / uint32
  • int64 / uint64
  • float32
  • float64
  • string (I guess the main use case is variable-length strings, but should we consider fixed-length strings?)
  • categorical (would it make sense to have categorical8, categorical16, ... for different representations of the categories with uint8, uint16, ...?)
  • datetime64 (requires discussion; pandas stores nanoseconds since the epoch as a 64-bit integer, which can only represent dates from year 1677 to year 2262)
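
To make the datetime64 trade-off concrete, here is a rough sketch using only the Python standard library. It assumes the value is a signed 64-bit count of units since 1970-01-01 UTC; the helper names (`ns_to_datetime`, `approx_span_years`) are made up for illustration and are not part of any library.

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Average Gregorian year, used only for a rough estimate of the range.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the epoch to a datetime (hypothetical helper).

    Python's datetime has microsecond resolution, so the value is floored
    to whole microseconds.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


def approx_span_years(unit: str) -> float:
    """Approximate number of years an int64 covers on each side of the epoch."""
    return INT64_MAX / UNITS_PER_SECOND[unit] / SECONDS_PER_YEAR


earliest = ns_to_datetime(INT64_MIN)
latest = ns_to_datetime(INT64_MAX)
print(earliest.year, latest.year)  # 1677 2262

for unit in UNITS_PER_SECOND:
    print(unit, f"about +/- {approx_span_years(unit):,.0f} years")
```

With nanoseconds the range is roughly 292 years on each side of the epoch; every coarser unit multiplies that by a factor of 1000, which is the trade-off between resolution and range that the unit choice has to settle.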

Some other types that could be considered:

  • decimal
  • python object
  • binary
  • date
  • time
  • timedelta
  • period
  • complex

And also types based on other types that could be considered:

  • date + timezone
  • numeric + unit
  • interval
  • struct
  • list
  • mapping
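
As a starting point for discussion, types that are based on other types could be described as small parametrized objects. The sketch below is purely illustrative: the class names (`Primitive`, `ListOf`, `Struct`, `Mapping`) and the `describe` helper are hypothetical and not taken from any existing library or from the standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int64", "float64", "string"


@dataclass(frozen=True)
class ListOf:
    value_type: DType


@dataclass(frozen=True)
class Struct:
    fields: tuple[tuple[str, DType], ...]  # ordered (name, type) pairs


@dataclass(frozen=True)
class Mapping:
    key_type: DType
    value_type: DType


# Hypothetical union of everything that counts as a data type in this sketch.
DType = Union[Primitive, ListOf, Struct, Mapping]


def describe(dtype: DType) -> str:
    """Render a nested type as a readable string."""
    if isinstance(dtype, Primitive):
        return dtype.name
    if isinstance(dtype, ListOf):
        return f"list<{describe(dtype.value_type)}>"
    if isinstance(dtype, Struct):
        inner = ", ".join(f"{name}: {describe(t)}" for name, t in dtype.fields)
        return f"struct<{inner}>"
    if isinstance(dtype, Mapping):
        return f"map<{describe(dtype.key_type)}, {describe(dtype.value_type)}>"
    raise TypeError(f"unsupported dtype: {dtype!r}")


point = Struct((("x", Primitive("float64")), ("y", Primitive("float64"))))
print(describe(ListOf(point)))  # list<struct<x: float64, y: float64>>
```

Because the descriptors are frozen dataclasses, two types with the same parameters compare equal and can be used as dictionary keys, which would make it straightforward to check whether two implementations agree on a column's type.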
