BUG: Series.combine() fails with ExtensionArray inside of Series #20825

Closed
@Dr-Irv

Description

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: from pandas.tests.extension.decimal.array import DecimalArray, make_data

In [3]: da1= make_data()
   ...: da2= make_data()
   ...:

In [4]: s1 = pd.Series(DecimalArray(da1))
   ...: s2 = pd.Series(DecimalArray(da2))
   ...:

In [5]: s1.head(), s2.head()
Out[5]:
(0    0.57581534881735985109685316274408251047134399...
 1    0.05647135567908745379384072293760254979133605...
 2    0.41049738961593973396446699553052894771099090...
 3    0.13724377491342376611527242857846431434154510...
 4    0.24154934068629707599740186196868307888507843...
 dtype: decimal, 0    0.40855027024154888515283801098121330142021179...
 1    0.21243084028671055385473209753399714827537536...
 2    0.15218065149055393092680787958670407533645629...
 3    0.87747422249812989658579454044229350984096527...
 4    0.53991488184898328572813852588296867907047271...
 dtype: decimal)

In [6]: s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-14abc20f0095> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 if x1 < x2 else x2)

C:\EclipseWorkspaces\LiClipseWorkspace\pandas-dev\pandas36\pandas\core\series.py in combine(self, other, func, fill_value)
   2220             new_index = self.index.union(other.index)
   2221             new_name = ops.get_op_result_name(self, other)
-> 2222             new_values = np.empty(len(new_index), dtype=self.dtype)
   2223             for i, idx in enumerate(new_index):
   2224                 lv = self.get(idx, fill_value)

TypeError: data type not understood

Problem description

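Series.combine() allocates its result with numpy.empty(len(new_index), dtype=self.dtype). When the Series holds an ExtensionArray, self.dtype is an extension dtype that NumPy cannot interpret, so the call raises TypeError: data type not understood.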
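
The allocation failure can be reproduced without going through Series.combine() at all. The sketch below is only an illustration: can_allocate is a hypothetical helper, not a pandas or NumPy function, and the exact wording of the TypeError message may differ between NumPy versions.

```python
import numpy as np
import pandas as pd

# An extension dtype that NumPy has no equivalent for.
cat_dtype = pd.CategoricalDtype(["one", "two", "three"], ordered=True)


def can_allocate(dtype):
    """Hypothetical helper: True if np.empty accepts `dtype`, False on TypeError."""
    try:
        np.empty(3, dtype=dtype)
    except TypeError:
        return False
    return True


numpy_ok = can_allocate(np.dtype("float64"))  # a plain NumPy dtype
extension_ok = can_allocate(cat_dtype)        # a pandas extension dtype
print(numpy_ok, extension_ok)
```

Any code path that passes self.dtype straight to NumPy would presumably hit the same error for other extension dtypes.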

Note: This also happens with Categorical in v0.22 and in master:

In [3]: cat1 = pd.Categorical(values=["one","two","three","three","two","one"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: cat2 = pd.Categorical(values=["three","two","one","one","two","three"],
   ...:  categories=["one","two","three"], ordered=True)
   ...: s1 = pd.Series(cat1)
   ...: s2 = pd.Series(cat2)
   ...: s1, s2
   ...:
Out[3]:
(0      one
 1      two
 2    three
 3    three
 4      two
 5      one
 dtype: category
 Categories (3, object): [one < two < three], 0    three
 1      two
 2      one
 3      one
 4      two
 5    three
 dtype: category
 Categories (3, object): [one < two < three])

In [4]: s1.combine(s2, lambda x1, x2: x1 <= x2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b597231c2d3c> in <module>()
----> 1 s1.combine(s2, lambda x1, x2: x1 <= x2)

C:\Anaconda3\lib\site-packages\pandas\core\series.py in combine(self, other, func, fill_value)
   1768             new_index = self.index.union(other.index)
   1769             new_name = _maybe_match_name(self, other)
-> 1770             new_values = np.empty(len(new_index), dtype=self.dtype)
   1771             for i, idx in enumerate(new_index):
   1772                 lv = self.get(idx, fill_value)

TypeError: data type not understood

NOTE: I will look into fixing this as part of my attempt to get ops() working for ExtensionArray

Expected Output

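For the Categorical example, a Series of True and False values. For the DecimalArray example, a Series holding the smaller of the two values at each position.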
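
For the Categorical example, the expected result can be obtained by collecting the per-element results in a plain Python list instead of a preallocated NumPy array. This is a minimal sketch under that assumption, not the fix that went into pandas; combine_elementwise is a hypothetical name, and it mirrors only the index-union loop visible in the traceback.

```python
import pandas as pd


def combine_elementwise(left, right, func, fill_value=None):
    """Hypothetical stand-in for Series.combine that never calls np.empty."""
    new_index = left.index.union(right.index)
    # Build the results in a list so no NumPy dtype has to be chosen up front.
    values = [
        func(left.get(idx, fill_value), right.get(idx, fill_value))
        for idx in new_index
    ]
    # Let pandas infer the result dtype from the values themselves.
    return pd.Series(values, index=new_index)


categories = ["one", "two", "three"]
s1 = pd.Series(pd.Categorical(
    ["one", "two", "three", "three", "two", "one"],
    categories=categories, ordered=True))
s2 = pd.Series(pd.Categorical(
    ["three", "two", "one", "one", "two", "three"],
    categories=categories, ordered=True))

result = combine_elementwise(s1, s2, lambda x1, x2: x1 <= x2)
print(result)
```

Note that the lambda receives the underlying scalars (plain strings here), so x1 <= x2 is an ordinary string comparison rather than a comparison by category order; for this particular data both orderings give the same answers.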

Output of pd.show_versions()

INSTALLED VERSIONS

commit: 60fe82c
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+799.g60fe82c8a
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels: ExtensionArray (Extending pandas with custom dtypes or arrays.)
