ExtensionArray support for comparisons to fundamental types and lists of those types #20659

Closed
@Dr-Irv

This is probably for @TomAugspurger to look at.

I am trying out an idea that uses the new ExtensionArray capability that is in master. What I would like to do is override the relational operators __le__, __ge__, etc. I'm following the cyberpandas example that @TomAugspurger published.

Using cyberpandas, I ran the following code:

Code Sample, a copy-pastable example if possible

import cyberpandas as ip
import pandas as pd

v = ip.IPArray.from_pyints([1, 2, 3])
s = pd.Series(v)
try:
    res = (s <= pd.Series([4,1,3]))
except Exception as e:
    print("exception raised for comparison to Series ", e)
try:
    res = (s <= [4,1,3])
except Exception as e:
    print("exception raised for comparison to list ", e)

try:
    res = (s <= 2)
except Exception as e:
    print("exception raised for comparison to int ", e)
    
try:
    res = (s <= s[2])
    print("no exception raised for comparison to IPType")
except Exception as e:
    print("exception raised for comparison to IPType ", e)

The result is the following output:

exception raised for comparison to Series  invalid type comparison
exception raised for comparison to list  '<=' not supported between instances of 'IPv4Address' and 'int'
exception raised for comparison to int  '<=' not supported between instances of 'IPv4Address' and 'int'
no exception raised for comparison to IPType

Problem description

The first exception is expected: cyberpandas.ip_array.__le__ returns NotImplemented (a sentinel value, not an exception), which is then handled in pandas.core.ops.na_op() at line 1166, producing the "invalid type comparison" error.
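
To make the NotImplemented mechanics concrete, here is a minimal sketch in plain Python. ToyArray is a hypothetical stand-in, not the cyberpandas class. It shows that a direct method call hands back the sentinel, while the `<=` operator turns it into a TypeError once the reflected comparison also declines.

```python
class ToyArray:
    """Hypothetical stand-in for an array type; not the cyberpandas class."""

    def __init__(self, values):
        self.values = list(values)

    def __le__(self, other):
        if not isinstance(other, ToyArray):
            # Returned, not raised: NotImplemented is a sentinel value.
            return NotImplemented
        return [a <= b for a, b in zip(self.values, other.values)]


left = ToyArray([1, 2, 3])

# Calling the method directly hands back the sentinel unchanged.
direct = left.__le__(7)

# Using the operator lets Python try int's reflected comparison as well;
# when that also declines, the interpreter raises TypeError.
try:
    left <= 7
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

same_type = left <= ToyArray([4, 1, 3])
```

A caller that invokes `__le__` directly, rather than going through the operator, has to check for the sentinel itself and decide which error to raise.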

The other two exceptions are unexpected. I would want IPArray.__le__() to be called so that it can return NotImplemented and produce the same "invalid type comparison" error. Instead, code in pandas.core.ops.wrapper() around line 1248 converts the Series into a numpy array of objects and then calls pandas.core.ops._comp_method_OBJECT_ARRAY(), which does an efficient element-wise comparison and never calls IPArray.__le__().
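
The effect of comparing boxed elements instead of the array can be reproduced with the standard library alone. This is only an illustration: ToyIPArray and elementwise_le are hypothetical names, and the helper merely imitates an object-array comparison, not the pandas implementation.

```python
import ipaddress


class ToyIPArray:
    """Hypothetical container of IPv4Address objects that counts __le__ calls."""

    def __init__(self, ints):
        self.data = [ipaddress.IPv4Address(i) for i in ints]
        self.le_calls = 0

    def __le__(self, other):
        self.le_calls += 1
        return NotImplemented


def elementwise_le(arr, other):
    """Compare each boxed element with `other`; arr.__le__ is never consulted."""
    return [x <= other for x in arr.data]


arr = ToyIPArray([1, 2, 3])
try:
    elementwise_le(arr, 2)
    message = None
except TypeError as exc:
    # The error comes from IPv4Address vs int, not from the container.
    message = str(exc)
```

Here the error message is the element-level one seen in the output above, and arr.le_calls stays at zero because the container's own method is bypassed.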

So the problem is that, for my application, I would like the ExtensionArray.__le__() implementation to be called when other is either a constant or a list. I think the right fix is to add a test in wrapper: if self.dtype is an object, call na_op() directly.

But maybe that's not the best way to do it?
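
A rough sketch of the kind of dispatch I have in mind, written with toy code rather than the pandas internals. ToyExtensionArray and toy_wrapper are hypothetical names, and the isinstance check stands in for whatever test wrapper would actually use.

```python
class ToyExtensionArray:
    """Hypothetical array type that owns its comparison semantics."""

    def __init__(self, values):
        self.values = list(values)

    def __le__(self, other):
        if isinstance(other, ToyExtensionArray):
            return [a <= b for a, b in zip(self.values, other.values)]
        return NotImplemented


def toy_wrapper(values, other, method="__le__"):
    """Dispatch a comparison, letting extension-style arrays decide first."""
    if isinstance(values, ToyExtensionArray):
        # Proposed branch: call the array's own method, whatever `other` is.
        result = getattr(values, method)(other)
        if result is NotImplemented:
            raise TypeError("invalid type comparison")
        return result
    # Fallback for plain sequences: element-wise comparison.
    return [getattr(x, method)(other) for x in values]


def error_text(values, other):
    """Return the error message of a failed comparison, or None on success."""
    try:
        toy_wrapper(values, other)
    except TypeError as exc:
        return str(exc)
    return None


int_error = error_text(ToyExtensionArray([1, 2, 3]), 2)
list_error = error_text(ToyExtensionArray([1, 2, 3]), [4, 1, 3])
plain = toy_wrapper([1, 2, 3], 2)
```

With this branch in place, comparisons to an int and to a list fail with the same message as the comparison to a Series, which is the expected output listed below.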

Expected Output

exception raised for comparison to Series invalid type comparison
exception raised for comparison to list invalid type comparison
exception raised for comparison to int invalid type comparison
no exception raised for comparison to IPType

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+683.g402ad45da
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)
