
BUG: Series.get() on ExtensionArray series (and Categorical) indexed by integer returns incorrect result #20882

Closed
@Dr-Irv

Description


Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: import decimal

In [3]: from pandas.tests.extension.decimal.array import DecimalArray
   ...:
   ...: a = DecimalArray([decimal.Decimal(str(i)) for i in range(5)])
   ...: sa = pd.Series(a, index=[2*i for i in range(5)])
   ...:

In [4]: sa
Out[4]:
0    0
2    1
4    2
6    3
8    4
dtype: decimal

In [5]: sa.get(4)
Out[5]: Decimal('4')

In [6]: sb = pd.Series([i for i in range(5)], index=sa.index)

In [7]: sb
Out[7]:
0    0
2    1
4    2
6    3
8    4
dtype: int64

In [8]: sb.get(4)
Out[8]: 2

In [14]: cat = pd.Categorical(values=["a", "b", "c", "a", "b", "c"],
    ...: categories=["a", "b", "c"], ordered=True)

In [15]: s = pd.Series(cat, index=[2*i for i in range(6)])

In [16]: s
Out[16]:
0     a
2     b
4     c
6     a
8     b
10    c
dtype: category
Categories (3, object): [a < b < c]

In [18]: s.get(2)
Out[18]: 'c'

In [22]: s2 = pd.Series(list(s.values), index=s.index)

In [23]: s2
Out[23]:
0     a
2     b
4     c
6     a
8     b
10    c
dtype: object

In [24]: s2.get(2)
Out[24]: 'b'

Problem description

In the above, sb is a standard Series, and sb.get(4) returns the element whose index label is 4. But for sa, which is backed by an ExtensionArray, sa.get(4) treats 4 as a position and returns the element at position 4 of the array (the fifth element).
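The two behaviors can be contrasted with plain Python lists. This is a minimal sketch, not pandas code: `label_get` and `positional_get` are hypothetical helpers standing in for label-based lookup (what `Series.get` does for sb) and position-based lookup (what was observed for sa), using the same data as the report.

```python
from decimal import Decimal

# Data from the report: the values 0..4 stored under the integer labels 0, 2, 4, 6, 8.
index = [2 * i for i in range(5)]
values = [Decimal(str(i)) for i in range(5)]


def label_get(labels, data, key, default=None):
    """Hypothetical helper: treat `key` as an index label (expected Series.get behavior)."""
    if key not in labels:
        return default
    return data[labels.index(key)]  # label -> position -> value


def positional_get(data, key, default=None):
    """Hypothetical helper: treat `key` as a position (what sa.get(4) did in the report)."""
    try:
        return data[key]
    except IndexError:
        return default


print(repr(label_get(index, values, 4)))  # Decimal('2'): label 4 is at position 2
print(repr(positional_get(values, 4)))    # Decimal('4'): the result reported for sa.get(4)
```

A missing label such as 3 yields the default (None) under label-based lookup, whereas a position-based lookup would silently return whatever sits at position 3.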

For the Series s containing a Categorical, s.get(2) returns the element at position 2 (the third element, 'c') rather than the element with index label 2 (the second element, 'b').
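pandas stores a Categorical as integer codes (`Categorical.codes`) that point into its categories (`Categorical.categories`), so the wrong result amounts to decoding the code at position 2 instead of the code at the position of label 2. The sketch below models this with plain lists; `decode`, `expected_get`, and `observed_get` are hypothetical names, not pandas API.

```python
# Hypothetical model of the Categorical in the report as codes plus categories.
categories = ["a", "b", "c"]
codes = [0, 1, 2, 0, 1, 2]         # decodes to a, b, c, a, b, c
index = [2 * i for i in range(6)]  # labels 0, 2, 4, 6, 8, 10


def decode(pos):
    """Return the category stored at array position `pos`."""
    return categories[codes[pos]]


def expected_get(key, default=None):
    """Label-based lookup: label -> position -> code -> category."""
    if key not in index:
        return default
    return decode(index.index(key))


def observed_get(key, default=None):
    """Position-based lookup, matching what s.get(2) returned in the report."""
    try:
        return decode(key)
    except IndexError:
        return default


print(expected_get(2))  # b: label 2 is at position 1, which holds code 1
print(observed_get(2))  # c: position 2 holds code 2
```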

Expected Output

sa.get(4) should be Decimal('2'), the value at index label 4, matching sb.get(4).

For the Categorical example, s.get(2) should return 'b', similar to the expression s2.get(2).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0.dev0+811.g4afc75638
pytest: 3.4.0
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.25.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata


Assignees

No one assigned

Labels

ExtensionArray: Extending pandas with custom dtypes or arrays.
Indexing: Related to indexing on series/frames, not to indexes themselves
