What data types should be part of the standard? For the array API, the types have been discussed here.
A good reference for data types for data frames is the Arrow data types documentation. The page probably contains many more types than the ones we want to support in the standard.
Topics to make decisions on:
- Which data types should be supported by the standard?
- Are implementations expected to provide extra data types? Should we have a list of optional types, or treat any type that is not part of the standard as out of scope?
- Missing data is discussed separately in Missing Data #9
These are IMO the main types (feel free to disagree):
- boolean
- int8 / uint8
- int16 / uint16
- int32 / uint32
- int64 / uint64
- float32
- float64
- string (I guess the main use case is variable-length strings, but should we also consider fixed-length strings?)
- categorical (would it make sense to have categorical8, categorical16, ... so the category codes can be stored as uint8, uint16, ...?)
- datetime64 (requires discussion; pandas stores nanoseconds since the epoch, which can only represent dates from year 1677 to 2262)
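To make the datetime64 range concrete, here is a rough sketch of where the 1677 to 2262 limits come from, assuming the timestamp is a signed 64-bit count of nanoseconds since 1970-01-01 UTC. The helper name `ns_to_datetime` is just for illustration, and the standard-library `datetime` only has microsecond resolution, so the nanosecond count is floored to microseconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are a signed 64-bit count of nanoseconds since the epoch.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the epoch to a datetime (floored to microseconds)."""
    return EPOCH + timedelta(microseconds=ns // 1000)


earliest = ns_to_datetime(INT64_MIN)
latest = ns_to_datetime(INT64_MAX)

print("earliest representable:", earliest.isoformat())
print("latest representable:  ", latest.isoformat())
```

A coarser unit (microseconds, milliseconds, seconds) widens the range by the same factor it loses in resolution, which is probably the main trade-off to discuss for this type.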
Some other types that could be considered:
- decimal
- python object
- binary
- date
- time
- timedelta
- period
- complex
And also types based on other types that could be considered:
- date + timezone
- numeric + unit
- interval
- struct
- list
- mapping
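One way to think about these "types based on other types" is as parameterised types that wrap one or more base types. The sketch below is purely illustrative: the class names (`Primitive`, `ListOf`, `Struct`, `WithUnit`) and the `describe` helper are hypothetical and not a proposal for the actual API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int64", "float64", "string"


@dataclass(frozen=True)
class ListOf:
    item: "DType"  # element type of the list


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DType"], ...]  # (field name, field type) pairs


@dataclass(frozen=True)
class WithUnit:
    base: Primitive  # the underlying numeric type
    unit: str        # e.g. "m", "kg"


DType = Union[Primitive, ListOf, Struct, WithUnit]


def describe(dtype: DType) -> str:
    """Render a (possibly nested) type as a readable string."""
    if isinstance(dtype, Primitive):
        return dtype.name
    if isinstance(dtype, ListOf):
        return f"list<{describe(dtype.item)}>"
    if isinstance(dtype, Struct):
        inner = ", ".join(f"{name}: {describe(t)}" for name, t in dtype.fields)
        return f"struct<{inner}>"
    if isinstance(dtype, WithUnit):
        return f"{describe(dtype.base)}[{dtype.unit}]"
    raise TypeError(f"unknown dtype: {dtype!r}")


point = Struct((
    ("x", WithUnit(Primitive("float64"), "m")),
    ("tags", ListOf(Primitive("string"))),
))
print(describe(point))
```

The point is only that once the set of base types is fixed, the derived types nest arbitrarily, so the standard would need to say which combinations (if any) are in scope.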