pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError #25927

Closed
@allComputableThings

Code Sample, a copy-pastable example if possible

For:


result = a2b.loc[vals]       # pd.Series()[ np.array ]

If a2b is a Series that maps int64 to int64 and vals is an int64 array, the result should also be a Series that maps int64 to int64; otherwise a KeyError should be raised.

Pasteable repro:

       import pandas as pd
       import numpy as np
       a2b = pd.Series(
           index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                     9725301000001109,  9725601000001103,  9725801000001104,
                     9730701000001104, 10049011000001109, 10328511000001105]),
           data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104,
                        999000011000001104, 999000011000001104, 999000011000001104])
       )
       assert a2b.dtype==np.int64
       assert a2b.index.dtype==np.int64
       key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
                             9725301000001109,  9725601000001103,  9725801000001104,
                             9730701000001104,
                             10047311000001102, # Missing from a2b.index
                             10049011000001109,
                             10328511000001105])
       result = a2b.loc[key]
       result
       assert result.dtype==np.int64
       assert result.index.dtype==np.int64

What happens:

In [2]:         import pandas as pd
   ...:         import numpy as np
   ...:         a2b = pd.Series(
   ...:             index = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                       9725301000001109,  9725601000001103,  9725801000001104,
   ...:                       9730701000001104, 10049011000001109, 10328511000001105]),
   ...:             data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104,
   ...:                          999000011000001104, 999000011000001104, 999000011000001104])
   ...:         )
   ...:         assert a2b.dtype==np.int64
   ...:         assert a2b.index.dtype==np.int64
   ...:         key = np.array([ 9724501000001103,  9724701000001109,  9725101000001107,
   ...:                               9725301000001109,  9725601000001103,  9725801000001104,
   ...:                               9730701000001104,
   ...:                               10047311000001102, # Missing from a2b.index
   ...:                               10049011000001109,
   ...:                               10328511000001105])
   ...:         result = a2b.loc[key]
   ...:         result
   ...: 
Out[2]: 
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
NaN             NaN
 9.990000e+17   NaN
 9.990000e+17   NaN
dtype: float64

In [3]:         assert result.dtype==np.int64
   ...:         assert result.index.dtype==np.int64
   ...: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-be86ec17a393> in <module>()
----> 1 assert result.dtype==np.int64
      2 assert result.index.dtype==np.int64

AssertionError: 

Problem description

I don't like this behavior because:

  • I have silently lost all my data because of the cast to float64.
  • Other calls to __getitem__ raise a KeyError if a value is not found in the index.
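To make the first point concrete, here is a minimal sketch of why the promotion is lossy. It assumes nothing beyond NumPy and two numbers copied from the example above: float64 has a 53-bit significand, so integers above 2**53 are not all representable, and a round trip through float64 can return a different integer.

```python
import numpy as np

# Both numbers are copied from the report above.
value = np.int64(999000011000001104)   # an entry of a2b's data
label = np.int64(9724501000001103)     # an entry of a2b's index

# Cast to float64 and back, mimicking the promotion that a NaN forces.
value_roundtrip = np.int64(np.float64(value))
label_roundtrip = np.int64(np.float64(label))

# Both comparisons are False: the original integers are not recoverable.
print(value, value_roundtrip, value == value_roundtrip)
print(label, label_roundtrip, label == label_roundtrip)
```

Since these numbers are identifiers rather than measurements, even an off-by-one change after the round trip makes the result unusable.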

Expected Output

Asserts should not fail.
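Until the lookup itself behaves this way, one possible workaround is to check the keys before calling .loc. The sketch below is only an illustration: `strict_loc` is a hypothetical helper name, not part of pandas, and it uses a shortened two-row version of a2b. It relies only on `Index.isin`, so it should not depend on how .loc treats missing labels in a given pandas version.

```python
import numpy as np
import pandas as pd


def strict_loc(series, keys):
    """Hypothetical helper: look up keys, raising KeyError if any is absent."""
    keys = pd.Index(keys)
    missing = keys[~keys.isin(series.index)]
    if len(missing) > 0:
        raise KeyError("labels not in index: {}".format(list(missing)))
    # Every key is present, so no NaN is introduced and the dtypes are kept.
    return series.loc[keys]


# Shortened version of a2b from the report (two rows instead of nine).
a2b = pd.Series(
    data=np.array([999000011000001104, 999000011000001104], dtype=np.int64),
    index=np.array([9724501000001103, 10049011000001109], dtype=np.int64),
)

# All keys present: the result stays int64 -> int64.
ok = strict_loc(a2b, np.array([10049011000001109, 9724501000001103], dtype=np.int64))
assert ok.dtype == np.int64
assert ok.index.dtype == np.int64

# One key missing: a KeyError is raised instead of a silent float64 promotion.
try:
    strict_loc(a2b, np.array([9724501000001103, 10047311000001102], dtype=np.int64))
except KeyError as err:
    print("KeyError:", err)
```

The extra `isin` pass costs one additional scan over the keys, which seems an acceptable price for failing loudly when the values are identifiers.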

Output of pd.show_versions()

In [4]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug, Indexing
