
Series.round behavior with ExtensionArrays #26730

Closed
@ghost


@TomAugspurger, in your blog post "Moral Philosophy for pandas or: What is .values?" you wrote:

In pandas 0.24, we'll (hopefully) have a good answer for what series.values is: a NumPy array or an ExtensionArray. For regular data types represented by NumPy, you'll get an ndarray. For extension types (implemented in pandas or elsewhere) you'll get an ExtensionArray. If you're using Series.values, you can rely on the set of methods common to each.

I just ran into an issue because some Series methods don't honor this: they bypass the ExtensionArray instead of calling its methods.
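To make the contract in the quote concrete, here is a small check using pandas' own Categorical extension type. It is chosen only because it ships with pandas, so pint-pandas isn't needed. The check shows that .values hands back the ExtensionArray itself, while np.asarray turns it into a plain object-dtype ndarray that no longer carries the extension type or its methods.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

numeric = pd.Series([1.5, 2.5])
categorical = pd.Series(pd.Categorical(["a", "b", "a"]))

# NumPy-backed dtype: .values is a plain ndarray.
print(type(numeric.values))

# Extension dtype: .values is the ExtensionArray itself (here a Categorical),
# so the array's own methods are available to the caller.
print(type(categorical.values), isinstance(categorical.values, ExtensionArray))

# np.asarray densifies the ExtensionArray into an object-dtype ndarray,
# dropping the extension type and its methods.
dense = np.asarray(categorical.values)
print(dense.dtype)
```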

pint is a popular package for working with physical units, useful for engineering and scientific calculations. Recently its developers introduced pint-pandas, which provides an ExtensionArray (EA) for better interaction with pandas. It's still a little rough, but very promising.

Here's a failing example:

In [2]: from pandas import Series
   ...: import pint
   ...: import pint_pandas  # assumption: importing pint-pandas registers the 'pint[...]' dtype
   ...: from pint import UnitRegistry
   ...:
   ...: ureg = UnitRegistry()
   ...: grams = ureg.g
   ...:
   ...: s = Series((1.5 * grams, 2.5 * grams), dtype='pint[g]')
   ...: s.round()
<...>
AttributeError: 'numpy.ndarray' object has no attribute 'rint'

The problem is that Series.round() calls com.values_from_object(self) instead of the EA-aware self.values.

pandas/core/series.py, lines 2054 to 2085 in cf25c5c:

def round(self, decimals=0, *args, **kwargs):
    """
    Round each value in a Series to the given number of decimals.

    Parameters
    ----------
    decimals : int
        Number of decimal places to round to (default: 0).
        If decimals is negative, it specifies the number of
        positions to the left of the decimal point.

    Returns
    -------
    Series
        Rounded values of the Series.

    See Also
    --------
    numpy.around : Round values of an np.array.
    DataFrame.round : Round values of a DataFrame.

    Examples
    --------
    >>> s = pd.Series([0.1, 1.3, 2.7])
    >>> s.round()
    0    0.0
    1    1.0
    2    3.0
    dtype: float64
    """
    nv.validate_round(args, kwargs)
    result = com.values_from_object(self).round(decimals)

com.values_from_object(self) percolates down to ExtensionBlock.to_dense():

def to_dense(self):
    return np.asarray(self.values)

np.asarray(self.values) converts the underlying PintArray into a NumPy array of object dtype. That array is not numeric, so NumPy's rounding falls back to looking up a rint method on the elements, which fails with the AttributeError above. If Series.round() invoked round() on self.values directly, the underlying EA implementation would get a chance to provide a suitable method, or to delegate to NumPy in an appropriate way.
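The failure mode can be reproduced without pint or pandas. In this sketch, Grams is a hypothetical stand-in for a pint Quantity, and the array built from it mimics the object-dtype array that to_dense() returns. Rounding it fails because NumPy looks for a rint method on each element. Depending on the NumPy version the error surfaces as an AttributeError (as in the traceback above) or a TypeError, so both are caught.

```python
import numpy as np


class Grams:
    """Hypothetical stand-in for a pint Quantity: a magnitude in grams."""

    def __init__(self, magnitude):
        self.magnitude = magnitude

    def __repr__(self):
        return f"{self.magnitude} g"


# Roughly what to_dense() hands back: an object-dtype ndarray of scalar objects.
dense = np.array([Grams(1.5), Grams(2.5)], dtype=object)

error = None
try:
    # With decimals=0, NumPy rounds an object array by calling each element's rint().
    dense.round()
except (AttributeError, TypeError) as exc:  # the exception type varies by NumPy version
    error = exc
print(type(error).__name__, error)

# Code that knows the element type can round the numeric magnitudes instead.
rounded = [Grams(m) for m in np.round([q.magnitude for q in dense])]
print(rounded)
```

Rounding the magnitudes works because that step is applied by code that understands the element type, which is exactly the knowledge an EA has and an object ndarray lacks.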

You may want to make life easier for EA developers by providing a default implementation for methods like round(). I suppose that would work in some cases, but there still needs to be a way for an EA to provide its own implementation of methods like round(). Many Series methods use com.values_from_object(self), and it's the same problem everywhere.
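One way to get both a default and an override hook is for Series.round to dispatch to the array's own round method, with a base class supplying a NumPy-based fallback. This is a minimal sketch under that assumption, not pandas' actual design: BaseArray, UnitArray, and series_round are hypothetical names standing in for ExtensionArray, an array like PintArray, and Series.round respectively.

```python
import numpy as np


class BaseArray:
    """Hypothetical stand-in for pandas' ExtensionArray base class."""

    def __init__(self, values):
        self._data = np.asarray(values, dtype=float)

    def round(self, decimals=0):
        # Default implementation: round the underlying numeric data with NumPy.
        return BaseArray(np.round(self._data, decimals))


class UnitArray(BaseArray):
    """Hypothetical unit-aware array, loosely modelled on PintArray."""

    def __init__(self, values, unit):
        super().__init__(values)
        self.unit = unit

    def round(self, decimals=0):
        # Override: round the magnitudes and keep the unit attached.
        return UnitArray(np.round(self._data, decimals), self.unit)


def series_round(values, decimals=0):
    """Hypothetical Series.round that dispatches instead of densifying."""
    if isinstance(values, BaseArray):
        # Extension-style data: let the array type decide how to round.
        return values.round(decimals)
    # Plain NumPy data: round directly.
    return np.round(np.asarray(values), decimals)


rounded = series_round(UnitArray([1.234, 5.678], unit="g"), decimals=1)
print(rounded._data, rounded.unit)
```

Arrays that don't override round() still get the NumPy fallback, while a unit-aware array can keep its unit attached instead of losing it to an object-dtype conversion.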

cc @hgrecco (pint developer)

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions