Greetings, Pandas devs! cuDF is building out additional dtypes such as cudf.CategoricalDtype and cudf.ListDtype based on pd.ExtensionDtype, and this is one question that came up.
The documentation states:
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
However, I note that pd.CategoricalDtype, for instance, does not adhere to this:
In [47]: import pandas as pd
In [48]: a = pd.Series(['a', 'b'], dtype='category')
In [49]: type(a[0])
Out[49]: str
In [50]: type(a.array[0])
Out[50]: str
In [51]: isinstance(a.array, pd.api.extensions.ExtensionArray)
Out[51]: True
In [52]: isinstance(a.dtype, pd.api.extensions.ExtensionDtype)
Out[52]: True
On the other hand, NumPy defines dtype.type somewhat differently:
The type object used to instantiate a scalar of this data-type.
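To make the comparison concrete, here is a small check that applies the same test, isinstance(array[item], dtype.type), to a NumPy array and to a pandas categorical. It is only a sketch, assuming NumPy and pandas are installed; the variable names (np_arr, cat, scalar, numpy_ok, cat_ok) are illustrative and not part of either library.

```python
import numpy as np
import pandas as pd

# NumPy: dtype.type is the scalar class, and indexing yields an instance of it.
np_arr = np.array([1, 2], dtype="int64")
numpy_ok = isinstance(np_arr[0], np_arr.dtype.type)

# pandas categorical: indexing yields the underlying category value (a str here),
# so the same check is run against whatever CategoricalDtype.type reports.
cat = pd.Series(["a", "b"], dtype="category")
scalar = cat.array[0]
cat_ok = isinstance(scalar, cat.dtype.type)

print("numpy   :", np_arr.dtype.type, numpy_ok)
print("category:", cat.dtype.type, type(scalar), cat_ok)
```

If the two results differ, that is the mismatch between the documented contract and the categorical behaviour that this question is about.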
Would love any insights as to the appropriate return value of .type.