Description
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data
In [3]: da1 = make_data()
...: da2 = make_data()
...:
In [4]: s1 = pd.Series(DecimalArray(da1))
...: s2 = pd.Series(DecimalArray(da2))
...:
In [5]: s1.head(), s2.head()
Out[5]:
(0 0.57581534881735985109685316274408251047134399...
1 0.05647135567908745379384072293760254979133605...
2 0.41049738961593973396446699553052894771099090...
3 0.13724377491342376611527242857846431434154510...
4 0.24154934068629707599740186196868307888507843...
dtype: decimal, 0 0.40855027024154888515283801098121330142021179...
1 0.21243084028671055385473209753399714827537536...
2 0.15218065149055393092680787958670407533645629...
3 0.87747422249812989658579454044229350984096527...
4 0.53991488184898328572813852588296867907047271...
dtype: decimal)
In [6]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-14abc20f0095> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\series.py in combine(self, other, func, fill_value)
2220 new_index = self.index.union(other.index)
2221 new_name = ops.get_op_result_name(self, other)
-> 2222 new_values = np.empty(len(new_index), dtype=self.dtype)
2223 for i, idx in enumerate(new_index):
2224 lv = self.get(idx, fill_value)
TypeError: data type not understood
Problem description
The Series.combine() method calls numpy.empty with the dtype of the ExtensionArray. numpy cannot interpret an extension dtype, so it raises "TypeError: data type not understood".
Note: This also happens with Categorical in v0.22 and in master:
In [3]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
...: categories=["one","two","three"], ordered=True)
...: cat2 = pd.Categorical(values=["three","two","one","one","two","three"],
...: categories=["one","two","three"], ordered=True)
...: s1 = pd.Series(cat1)
...: s2 = pd.Series(cat2)
...: s1, s2
...:
Out[3]:
(0 one
1 two
2 three
3 three
4 two
5 one
dtype: category
Categories (3, object): [one < two < three], 0 three
1 two
2 one
3 one
4 two
5 three
dtype: category
Categories (3, object): [one < two < three])
In [4]: s1.combine(s2, lambda x1, x2: x1 <= x2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-b597231c2d3c> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 <= x2)
C:\Anaconda3\lib\site-packages\pandas\core\series.py in combine(self, other, func, fill_value)
1768 new_index = self.index.union(other.index)
1769 new_name = _maybe_match_name(self, other)
-> 1770 new_values = np.empty(len(new_index), dtype=self.dtype)
1771 for i, idx in enumerate(new_index):
1772 lv = self.get(idx, fill_value)
TypeError: data type not understood
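Both tracebacks stop at the same line, np.empty(len(new_index), dtype=self.dtype). The following is a minimal sketch that isolates that call, assuming a pandas/numpy install where a CategoricalDtype cannot be converted to a numpy dtype. try_empty is a hypothetical helper written for this illustration, and the exact error message may differ between numpy versions.

```python
import numpy as np
import pandas as pd

# The dtype that Series.combine() would pass to np.empty for a categorical Series.
cat_dtype = pd.Series(pd.Categorical(["one", "two"])).dtype


def try_empty(n, dtype):
    """Hypothetical helper: return the allocated array, or the TypeError raised."""
    try:
        return np.empty(n, dtype=dtype)
    except TypeError as exc:
        return exc


failed = try_empty(2, cat_dtype)  # extension dtype: numpy rejects it
ok = try_empty(2, object)         # plain numpy dtype: allocation succeeds

print(type(failed).__name__)
print(ok.dtype)
```

The same allocation with dtype=object succeeds, which suggests the problem is the dtype handed to numpy rather than the combining function itself.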
NOTE: I will look into fixing this as part of my attempt to get ops() working for ExtensionArray.
Expected Output
A Series of True and False values.
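As a rough illustration of that expected result, the sketch below performs the element-wise combination by hand for the Categorical example. It collects results in a plain list and lets pandas infer the result dtype. combine_via_list is a hypothetical helper, not the fix adopted in pandas. The scalars it compares are plain Python strings, which for these values happens to give the same answers as the category order.

```python
import pandas as pd


def combine_via_list(left, right, func, fill_value=None):
    """Hypothetical helper: element-wise combine without np.empty(dtype=...)."""
    new_index = left.index.union(right.index)
    # Collect results in a list so pandas infers the result dtype
    # instead of reusing the dtype of `left`.
    values = [
        func(left.get(idx, fill_value), right.get(idx, fill_value))
        for idx in new_index
    ]
    return pd.Series(values, index=new_index)


categories = ["one", "two", "three"]
s1 = pd.Series(pd.Categorical(
    ["one", "two", "three", "three", "two", "one"],
    categories=categories, ordered=True))
s2 = pd.Series(pd.Categorical(
    ["three", "two", "one", "one", "two", "three"],
    categories=categories, ordered=True))

result = combine_via_list(s1, s2, lambda x1, x2: x1 <= x2)
print(result)
```

Because the result is built from whatever func returns, its dtype follows the returned values (boolean here) rather than the dtype of the input Series.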
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 60fe82c
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0.dev0+799.g60fe82c8a
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None