Description
This is probably for @TomAugspurger to look at.
I am trying out an idea that uses the new ExtensionArray capability that is in master. What I would like to do is override the relational operators __le__, __ge__, etc. I'm following the cyberpandas example that @TomAugspurger published.
Code Sample, a copy-pastable example if possible
Using cyberpandas, I ran the following code:
import cyberpandas as ip
import pandas as pd

v = ip.IPArray.from_pyints([1, 2, 3])
s = pd.Series(v)

# Compare against a Series of ints
try:
    res = (s <= pd.Series([4, 1, 3]))
except Exception as e:
    print("exception raised for comparison to Series ", e)

# Compare against a plain list
try:
    res = (s <= [4, 1, 3])
except Exception as e:
    print("exception raised for comparison to list ", e)

# Compare against a scalar int
try:
    res = (s <= 2)
except Exception as e:
    print("exception raised for comparison to int ", e)

# Compare against a scalar taken from the Series itself
try:
    res = (s <= s[2])
    print("no exception raised for comparison to IPType")
except Exception as e:
    print("exception raised for comparison to IPType ", e)
The result is the following output:
exception raised for comparison to Series invalid type comparison
exception raised for comparison to list '<=' not supported between instances of 'IPv4Address' and 'int'
exception raised for comparison to int '<=' not supported between instances of 'IPv4Address' and 'int'
no exception raised for comparison to IPType
Problem description
The first exception is expected, because cyberpandas.ip_array.__le__ raises NotImplemented, which is then caught in pandas.core.ops.na_op() at line 1166.
The second and third exceptions are unexpected. I would want the IPArray.__le__() method to be called so that the NotImplemented exception is raised. Instead, code in pandas.core.ops.wrapper() around line 1248 converts the Series into a numpy array of objects and then calls pandas.core.ops._comp_method_OBJECT_ARRAY(), which tries to do an efficient comparison and never calls the IPArray.__le__ method.
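To illustrate the difference between the two code paths, here is a minimal sketch that does not use pandas or cyberpandas at all. ToyIPArray and compare_as_objects are hypothetical names made up for this illustration; they only mimic the behaviour described above, under the assumption that the object-array path compares element by element.

```python
import ipaddress


class ToyIPArray:
    """Hypothetical stand-in for an ExtensionArray (not the cyberpandas class)."""

    def __init__(self, ints):
        self.data = [ipaddress.IPv4Address(i) for i in ints]

    def __le__(self, other):
        # The array-level operator rejects anything that is not a ToyIPArray.
        if not isinstance(other, ToyIPArray):
            raise TypeError("invalid type comparison")
        return [a <= b for a, b in zip(self.data, other.data)]


def compare_as_objects(arr, other):
    """Hypothetical imitation of the object-array path.

    Each element is compared on its own, so ToyIPArray.__le__ is never called.
    """
    return [element <= other for element in arr.data]


arr = ToyIPArray([1, 2, 3])

# Path 1: the array's own __le__ runs and reports the error itself.
try:
    arr <= 2
except TypeError as e:
    array_level_error = str(e)

# Path 2: the element-wise comparison fails inside IPv4Address instead.
try:
    compare_as_objects(arr, 2)
except TypeError as e:
    element_level_error = str(e)

print(array_level_error)
print(element_level_error)
```

The second message comes from the scalar IPv4Address comparison, which matches the kind of error I see for the list and int cases.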
So the problem is that, for my application, I would like the ExtensionArray.__le__() implementation to be called when other is either a constant or a list. I think the right fix is to place a test in wrapper that checks whether self.dtype is an object and, if so, calls na_op() directly.
But maybe that's not the best way to do it?
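A rough sketch of the kind of dispatch I have in mind, assuming the intent is to hand the comparison to the array's own operator before any element-wise fallback. This is not the pandas.core.ops code; ToyExtensionArray and toy_wrapper are hypothetical names used only for illustration.

```python
import operator


class ToyExtensionArray:
    """Hypothetical stand-in for an ExtensionArray subclass."""

    def __init__(self, values):
        self.values = list(values)

    def __le__(self, other):
        if not isinstance(other, ToyExtensionArray):
            raise TypeError("invalid type comparison")
        return [a <= b for a, b in zip(self.values, other.values)]


def toy_wrapper(values, other, op=operator.le):
    """Hypothetical dispatch: let the extension array decide first."""
    if isinstance(values, ToyExtensionArray):
        # Delegate to the array's own operator, whatever `other` is.
        return op(values, other)
    # Fallback for plain sequences: element-wise comparison.
    if isinstance(other, (list, tuple)):
        return [op(a, b) for a, b in zip(values, other)]
    return [op(a, other) for a in values]


ext = ToyExtensionArray([1, 2, 3])

# Scalar and list operands now reach ToyExtensionArray.__le__.
errors = []
for operand in (2, [4, 1, 3]):
    try:
        toy_wrapper(ext, operand)
    except TypeError as e:
        errors.append(str(e))

print(errors)
print(toy_wrapper(ext, ToyExtensionArray([4, 1, 3])))
print(toy_wrapper([1, 2, 3], 2))
```

With a check like this, the list and int cases would produce the same error as the Series case, which is the behaviour shown under Expected Output.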
Expected Output
exception raised for comparison to Series invalid type comparison
exception raised for comparison to list invalid type comparison
exception raised for comparison to int invalid type comparison
no exception raised for comparison to IPType
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0.dev0+683.g402ad45da
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None