
Description
@TomAugspurger, in your blog post "Moral Philosophy for pandas or: What is .values?" you wrote:
In pandas 0.24, we'll (hopefully) have a good answer for what series.values is: a NumPy array or an ExtensionArray. For regular data types represented by NumPy, you'll get an ndarray. For extension types (implemented in pandas or elsewhere) you'll get an ExtensionArray. If you're using Series.values, you can rely on the set of methods common to each.
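As a quick reference, here is a small check of the behaviour the quoted post describes, using types that ship with pandas. This is a sketch: the specific dtypes (a plain integer Series, "category" and the nullable "Int64") are my choice for illustration and do not come from the post.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# A plain NumPy-backed dtype: .values is an ndarray.
ints = pd.Series([1, 2, 3])
print(type(ints.values))

# Extension types implemented inside pandas: .values is an ExtensionArray.
cats = pd.Series(["a", "b", "a"], dtype="category")
nullable = pd.Series([1, None, 3], dtype="Int64")
print(isinstance(cats.values, ExtensionArray))      # True
print(isinstance(nullable.values, ExtensionArray))  # True
```

A third-party EA such as pint-pandas' PintArray is expected to fall in the second group.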
I just ran into an issue because some pandas code paths don't do this yet: they convert an ExtensionArray to a NumPy array instead of working with it directly.
pint is a popular package for working with physical units, useful for engineering and scientific calculations. Recently they introduced pint-pandas, which provides an ExtensionArray (EA) for better interaction with pandas. It's still a little rough, but very promising.
Here's a failing example:
In [2]: from pandas import Series
...: import pint
...: import pint_pandas  # registers the 'pint[...]' dtype with pandas
...: from pint import UnitRegistry
...:
...: ureg=UnitRegistry()
...: grams=ureg.g
...:
...: s=Series((1.5*grams,2.5*grams),dtype='pint[g]')
...: s.round()
<...>
AttributeError: 'numpy.ndarray' object has no attribute 'rint'
The problem is that Series.round() calls com.values_from_object(self) instead of the EA-aware self.values (lines 2054 to 2085 in cf25c5c).
com.values_from_object(self) percolates down to ExtensionBlock.to_dense() (pandas/pandas/core/internals/blocks.py, lines 1729 to 1730 in cf25c5c), which calls np.asarray(self.values). This converts the underlying PintArray into a NumPy array of object dtype, which is not numeric, so NumPy complains that rint isn't implemented. If Series.round() invoked round() on self.values directly, the underlying EA implementation would get a chance to implement a suitable method, or to delegate to NumPy in an appropriate way.
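To illustrate the dispatch suggested above, here is a minimal sketch that does not touch pandas internals. UnitArray and round_values are hypothetical names, and UnitArray is only a toy stand-in for an EA such as PintArray, not pint-pandas' actual implementation. It contrasts flattening through np.asarray() with letting the array round itself.

```python
import numpy as np


class UnitArray:
    """Hypothetical toy stand-in for an EA such as PintArray:
    float magnitudes plus a single unit label."""

    def __init__(self, magnitudes, units):
        self.magnitudes = np.asarray(magnitudes, dtype=float)
        self.units = units

    def __len__(self):
        return len(self.magnitudes)

    def __array__(self, dtype=None, copy=None):
        # What np.asarray() sees: one opaque Python object per element,
        # similar to an EA densified into an object-dtype ndarray.
        out = np.empty(len(self), dtype=object)
        for i, m in enumerate(self.magnitudes):
            out[i] = f"{m} {self.units}"
        return out

    def round(self, decimals=0):
        # The array rounds its own numeric data and keeps its units.
        return UnitArray(np.round(self.magnitudes, decimals), self.units)


def round_values(values, decimals=0):
    """Hypothetical helper: let non-NumPy arrays round themselves instead
    of converting them with np.asarray() first."""
    if isinstance(values, np.ndarray):
        return np.round(values, decimals)
    return values.round(decimals)


grams = UnitArray([1.5, 2.5], "g")
print(np.asarray(grams).dtype)  # object: the units are now opaque strings
rounded = round_values(grams)
print(rounded.magnitudes, rounded.units)
```

np.round uses round-half-to-even, so both 1.5 and 2.5 round to 2.0 here, and the "g" label survives because the array handled the operation itself.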
You may want to make life easier for EA developers by providing a default implementation for methods like round(). I suppose that would work in some cases, but there still needs to be a way for an EA to provide its own implementation of functions like round. Many Series methods use com.values_from_object(self), and it's the same problem everywhere.
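The trade-off between a default implementation and an EA-specific override can be sketched with plain Python classes. ArrayBase and TaggedArray are hypothetical names for illustration; this is not the pandas ExtensionArray API, only the general pattern of a base class default that subclasses may replace.

```python
class ArrayBase:
    """Hypothetical base class with a generic default round()."""

    def __init__(self, data):
        self.data = list(data)

    def round(self, decimals=0):
        # Default: apply the built-in round() element by element, which
        # works for any element type that implements __round__.
        return type(self)(round(x, decimals) for x in self.data)


class TaggedArray(ArrayBase):
    """Hypothetical subclass that carries a unit label and overrides
    round() so the label is preserved."""

    def __init__(self, data, units=None):
        super().__init__(data)
        self.units = units

    def round(self, decimals=0):
        # The inherited default would rebuild the array via type(self)(...)
        # without passing units, silently dropping them.
        return TaggedArray((round(x, decimals) for x in self.data), self.units)


print(ArrayBase([1.5, 2.5]).round().data)
masses = TaggedArray([1.26, 3.14159], units="g").round(2)
print(masses.data, masses.units)
```

The default covers simple cases for free, while the override is what lets a unit-aware array keep its metadata.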
cc @hgrecco (pint developer)