Closed
Description
Now that we are starting to have mask-based dtypes/arrays (integer, boolean), we should also look into making our algos work with such masked arrays. One example where we could explore this is `factorize` / `unique`.
Currently, BooleanArray and IntegerArray need to convert their masked array into a single numpy array that encodes missing values with a specified "NA sentinel", so that the algo can recognize missing values. This happens through `ExtensionArray._values_for_factorize`, which returns a (numpy array, NA sentinel) tuple.
In practice this means that a BooleanArray is converted to an integer array (with NA as -1), and an IntegerArray is converted to a float array (with NA as NaN), so the algos can handle them.
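To make the conversion concrete, here is a rough numpy-only sketch of what it amounts to. The helper names `bool_to_sentinel` and `int_to_sentinel` are made up for illustration and are not pandas functions; the actual logic lives in each array's `_values_for_factorize`.

```python
import numpy as np


def bool_to_sentinel(data, mask):
    """Hypothetical helper: boolean data + mask -> (integer array, -1)."""
    out = data.astype("int8")
    out[mask] = -1  # masked positions become the NA sentinel
    return out, -1


def int_to_sentinel(data, mask):
    """Hypothetical helper: integer data + mask -> (float array, NaN)."""
    out = data.astype("float64")
    out[mask] = np.nan  # masked positions become the NA sentinel
    return out, np.nan


bool_values, bool_na = bool_to_sentinel(
    np.array([True, False, True]), np.array([False, False, True])
)
int_values, int_na = int_to_sentinel(
    np.array([10, 20, 30], dtype="int64"), np.array([False, True, False])
)

# The float round trip is lossy for large integers.
big = np.array([2**53 + 1], dtype="int64")
big_values, _ = int_to_sentinel(big, np.array([False]))
```

Both helpers copy and cast the data before the algo ever sees it. The last lines also show a side effect of going through float64: not every int64 value survives the conversion unchanged.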
We should look into:
- Can we adapt the unique/factorize hashtable class, or make a specific version of it, so that it takes a mask instead of an NA sentinel?
- We could then have a variant of `ExtensionArray._values_for_factorize` that returns (array, mask) instead of (array, NA sentinel).
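As a rough sketch of what a mask-aware factorize could do, here is a pure-Python version that uses a dict in place of the hashtable class. The function name `factorize_masked` and its signature are assumptions for illustration, not an existing pandas API, and a real implementation would presumably live in the Cython hashtable code.

```python
import numpy as np


def factorize_masked(values, mask, na_code=-1):
    """Hypothetical sketch: factorize `values`, skipping positions where
    `mask` is True instead of comparing against an NA sentinel."""
    table = {}  # value -> code, stands in for the hashtable
    uniques = []
    codes = np.empty(len(values), dtype=np.intp)
    for i in range(len(values)):
        if mask[i]:
            # Missing: the underlying value is never looked at or hashed.
            codes[i] = na_code
            continue
        key = values[i].item()
        code = table.get(key)
        if code is None:
            code = len(uniques)
            table[key] = code
            uniques.append(values[i])
        codes[i] = code
    return codes, np.array(uniques, dtype=values.dtype)


values = np.array([3, 1, 3, 0, 1], dtype="int64")
mask = np.array([False, False, False, True, False])
codes, uniques = factorize_masked(values, mask)
```

In this sketch the data keeps its original dtype (`uniques` stays int64), and the placeholder value stored under the mask (the `0` above) never ends up in the uniques.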