Description
Currently, our nullable / masked extension arrays (boolean and integer, for now) use a numpy boolean array as their _mask to keep track of missing values. A potential route for improving memory use and performance would be to use a bitarray (one bit per value) instead of a boolean numpy array (one byte per value).
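To make the potential memory saving concrete, here is a rough sketch that uses `np.packbits` as a stand-in for a bitarray. The array size, the missing-value fraction, and the variable names are assumptions for illustration only, not anything pandas does today:

```python
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)

# Byte-per-value mask, analogous to the current _mask (True marks a missing value).
mask = rng.random(n) < 0.1

# Bit-per-value representation: 8 mask entries packed into each uint8.
packed = np.packbits(mask, bitorder="little")

bool_nbytes = mask.nbytes      # one byte per value
packed_nbytes = packed.nbytes  # one bit per value, rounded up to whole bytes

# Unpacking restores the original boolean mask.
restored = np.unpackbits(packed, count=n, bitorder="little").astype(bool)
```

For one million values this is 1,000,000 bytes versus 125,000 bytes, i.e. an 8x reduction for the mask itself (the data buffer is unaffected).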
This will require some exploration: What are the options for implementing this (existing libraries, a custom implementation)? What is the performance impact? (Some operations, such as masking, will become slower, since we still rely on numpy for those and numpy needs boolean arrays.) Is a custom implementation worth it compared to using pyarrow for this? Etc.
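As a starting point for that exploration, the sketch below illustrates the masking concern: numpy boolean indexing cannot consume a packed mask directly, so it has to be unpacked first, while elementwise combinations of two masks can stay in packed form. The helpers `pack_mask` and `unpack_mask` are hypothetical names, and `np.packbits` / `np.unpackbits` are only one of several possible backends:

```python
import numpy as np


def pack_mask(mask):
    """Hypothetical helper: pack a boolean mask into one bit per value."""
    return np.packbits(mask, bitorder="little")


def unpack_mask(packed, length):
    """Hypothetical helper: expand a packed mask back to a boolean array."""
    return np.unpackbits(packed, count=length, bitorder="little").astype(bool)


values = np.arange(1, 11)
mask = np.array([False, True, False, False, True,
                 False, False, False, False, True])
packed = pack_mask(mask)

# Boolean indexing needs a byte-per-value array, so every such operation
# pays for an unpack step (extra time plus a temporary allocation).
valid = values[~unpack_mask(packed, len(values))]

# Combining two masks, as in a binary operation on two masked arrays,
# can be done on the packed bytes directly (padding bits are zero in both).
other = np.zeros(10, dtype=bool)
other[0] = True
combined_packed = packed | pack_mask(other)
combined = unpack_mask(combined_packed, len(values))
```

Timing the unpack step against the plain boolean-array path on realistic sizes would give a first estimate of the slowdown mentioned above.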