ExtensionBlock.is_numeric is always False #22290

Closed
@jschendel

Code Sample/Problem Description

Currently, ExtensionBlock.is_numeric always returns False. This is a problem for numeric extension arrays, because pandas uses this property internally to select the numeric columns of a DataFrame. I'll use IntegerArray as the example, but in principle this applies to any numeric extension array, e.g. DecimalArray in the test suite, or an extension array for units or uncertainties.

Setup:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: from pandas.core.arrays import IntegerArray  # assumed imports; In [1] is not shown in the original report

In [2]: df = pd.DataFrame({'group': list('aaabbb'),
   ...:                    'val1': IntegerArray([0, 1, 2, np.nan, 3, 4]),
   ...:                    'val2': np.arange(6)})

In [3]: df
Out[3]: 
  group val1  val2
0     a    0     0
1     a    1     1
2     a    2     2
3     b  NaN     3
4     b    3     4
5     b    4     5

The IntegerArray column is ignored by DataFrame._get_numeric_data():

In [4]: df._get_numeric_data()
Out[4]: 
   val2
0     0
1     1
2     2
3     3
4     4
5     5
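
The filtering step can be pictured with a toy model. This is a simplified sketch, not pandas' actual internals: ToyBlock, ToyExtensionBlock, and get_numeric_columns are hypothetical names used only for illustration, and each block is assumed to hold a single column.

```python
class ToyBlock:
    """Hypothetical stand-in for a block holding one numeric column."""

    is_numeric = True  # e.g. a plain int64 or float64 column

    def __init__(self, name, values):
        self.name = name
        self.values = values


class ToyExtensionBlock(ToyBlock):
    """Hypothetical stand-in for ExtensionBlock: never reports numeric."""

    is_numeric = False


def get_numeric_columns(blocks):
    """Keep only the columns whose block reports is_numeric."""
    return [block.name for block in blocks if block.is_numeric]


blocks = [
    # Integer data, but stored in an extension-backed block.
    ToyExtensionBlock("val1", [0, 1, 2, None, 3, 4]),
    ToyBlock("val2", [0, 1, 2, 3, 4, 5]),
]
numeric_columns = get_numeric_columns(blocks)
print(numeric_columns)
```

In this model val1 is dropped purely because of the hard-coded flag on its block type, regardless of what the column contains.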

This leads some numeric routines, such as DataFrame.corr, to ignore the IntegerArray column:

In [5]: df.corr()
Out[5]: 
      val2
val2   1.0

Likewise, groupby uses ExtensionBlock.is_numeric to filter to numeric columns for some operations, so the IntegerArray column is dropped even when it is explicitly requested:

In [6]: df.groupby('group').sum()
Out[6]: 
       val2
group      
a         3
b        12

In [7]: df.groupby('group')['val1', 'val2'].sum()
Out[7]: 
       val2
group      
a         3
b        12

Expected Output

I'd expect ExtensionBlock.is_numeric to return True when appropriate, and for behavior to be consistent with non-extension numeric dtypes.

My first impression is that this should be an attribute of the ExtensionArray or ExtensionDtype class that defaults to False. Numeric implementations would set the attribute to True, and ExtensionBlock.is_numeric would read the value from there.
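
A minimal sketch of that idea, assuming the flag lives on the dtype. Every class here (SketchExtensionDtype, SketchIntegerDtype, SketchCategoryDtype, SketchExtensionArray, SketchExtensionBlock) and the attribute name _is_numeric are hypothetical and exist only to illustrate the delegation; none of this is taken from the pandas code base.

```python
class SketchExtensionDtype:
    """Hypothetical base dtype: not numeric unless a subclass opts in."""

    _is_numeric = False


class SketchIntegerDtype(SketchExtensionDtype):
    """Hypothetical numeric dtype that opts in by overriding the flag."""

    _is_numeric = True


class SketchCategoryDtype(SketchExtensionDtype):
    """Hypothetical non-numeric dtype that keeps the default."""


class SketchExtensionArray:
    """Hypothetical array that carries its dtype."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype


class SketchExtensionBlock:
    """Hypothetical block that delegates is_numeric to the array's dtype."""

    def __init__(self, array):
        self.array = array

    @property
    def is_numeric(self):
        # Read the flag from the dtype instead of hard-coding False.
        return self.array.dtype._is_numeric


int_block = SketchExtensionBlock(
    SketchExtensionArray([0, 1, 2, None, 3, 4], SketchIntegerDtype())
)
cat_block = SketchExtensionBlock(
    SketchExtensionArray(["a", "b"], SketchCategoryDtype())
)
print(int_block.is_numeric, cat_block.is_numeric)
```

With the default left at False, existing extension types would keep their current behavior, and only types that explicitly opt in would be picked up by the numeric filters.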

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 0370740
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.29-galliumos
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+456.g0370740
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Labels

ExtensionArray: Extending pandas with custom dtypes or arrays.
Numeric Operations: Arithmetic, Comparison, and Logical operations.
