Instead of having individual methods to query the dtype, categorical description, null description, and metadata (which I suspect might be duplicated at the DataFrame level?), how about adding a first-class abstraction that ties them together? For example:
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple, TypedDict

class ColumnSchema(TypedDict):
    # the underlying physical representation
    dtype: DType
    # if the column is categorical, describes how to interpret the contents
    categorical_encoding: Optional[CategoricalDescription]
    # if the column supports null values, describes how they are represented
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    # arbitrary metadata attached to the column, possibly empty
    metadata: Dict[str, Any]

class Column(ABC):
    ...

    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...
(IMHO, "encoding" sounds more precise than "description")
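To make the proposal a bit more concrete, here is a rough, self-contained sketch of what an implementation might look like. The definitions of `ColumnNullType`, `DType` and `CategoricalDescription` below are simplified stand-ins I made up so the snippet runs on its own, not the spec's actual definitions, and `Int8CategoricalColumn` is purely hypothetical.

```python
from abc import ABC, abstractmethod
from enum import IntEnum
from typing import Any, Dict, Optional, Tuple, TypedDict


# Simplified stand-ins for the spec's types (assumptions, not the real definitions).
class ColumnNullType(IntEnum):
    NON_NULLABLE = 0
    USE_SENTINEL = 2


DType = Tuple[str, int]  # hypothetical shape: (kind, bit width)


class CategoricalDescription(TypedDict):
    is_ordered: bool
    categories: Tuple[str, ...]


class ColumnSchema(TypedDict):
    dtype: DType
    categorical_encoding: Optional[CategoricalDescription]
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    metadata: Dict[str, Any]


class Column(ABC):
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...


class Int8CategoricalColumn(Column):
    """Hypothetical column storing category codes as int8, with -1 meaning null."""

    def __init__(self, categories: Tuple[str, ...]) -> None:
        self._categories = categories

    @property
    def schema(self) -> ColumnSchema:
        # Everything a consumer needs to interpret the column, in one object.
        return ColumnSchema(
            dtype=("int", 8),
            categorical_encoding=CategoricalDescription(
                is_ordered=False, categories=self._categories
            ),
            null_encoding=(ColumnNullType.USE_SENTINEL, -1),
            metadata={},
        )


schema = Int8CategoricalColumn(("red", "green")).schema
print(schema["dtype"], schema["null_encoding"])
```

A consumer would then read `column.schema` once instead of calling several separate accessors, and a non-categorical or non-nullable column would simply set the corresponding field to `None`.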
I'm also not sure why the spec uses a mix of Tuples and TypedDicts. Is it an attempt to reduce the memory footprint of the Python objects?
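For what it's worth, the footprint difference is easy to check. The snippet below is only a quick illustration: `NullEncodingDict` is a hypothetical type I invented for the comparison, `sys.getsizeof` reports the shallow size of the container only, and the exact numbers depend on the interpreter and version. A `TypedDict` is a plain `dict` at runtime, so it carries the usual dict overhead.

```python
import sys
from typing import Any, Tuple, TypedDict


class NullEncodingDict(TypedDict):
    # Hypothetical dict-shaped alternative to Tuple[ColumnNullType, Any].
    kind: int
    value: Any


as_tuple: Tuple[int, Any] = (2, -1)
as_dict: NullEncodingDict = {"kind": 2, "value": -1}

# Shallow sizes of the two containers (the contained ints are not counted).
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(f"tuple: {tuple_size} bytes, dict: {dict_size} bytes")
```

On CPython the tuple comes out smaller, but for a handful of per-column objects the saving seems unlikely to matter much compared with the readability of named fields.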