Description
Code Sample, a copy-pastable example if possible
For:
result = a2b.loc[vals] # pd.Series()[ np.array ]
If a2b is a Series that maps {int64: int64} and vals is an int64 array, the result should also be a Series that maps {int64: int64}, or a KeyError should be raised.
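A minimal sketch of the stricter behaviour asked for above, written as a hypothetical helper `strict_loc` (not a pandas API). It checks every requested label against the index first and raises KeyError if any is missing, so a lookup that succeeds never has to insert NaN and the int64 dtype survives. It assumes `np.isin` (NumPy 1.13+) and uses a small made-up Series rather than the data from this report.

```python
import numpy as np
import pandas as pd


def strict_loc(series, labels):
    """Hypothetical helper: label lookup that raises KeyError instead of inserting NaN."""
    labels = np.asarray(labels)
    # Requested labels that do not exist in the index.
    missing = labels[~np.isin(labels, series.index.values)]
    if missing.size:
        raise KeyError("labels not in index: {}".format(missing.tolist()))
    # Every label exists, so no NaN is needed and int64 is preserved.
    return series.loc[labels]


s = pd.Series(index=np.array([10, 20, 30], dtype=np.int64),
              data=np.array([100, 200, 300], dtype=np.int64))

# All labels present: int64 values and index are kept.
ok = strict_loc(s, np.array([30, 10], dtype=np.int64))

# One label (99) missing: KeyError instead of a float64 result full of NaN.
try:
    strict_loc(s, np.array([10, 99], dtype=np.int64))
    raised = False
except KeyError:
    raised = True
```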
Pasteable repro:
import pandas as pd
import numpy as np
a2b = pd.Series(
index = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
9725301000001109, 9725601000001103, 9725801000001104,
9730701000001104, 10049011000001109, 10328511000001105]),
data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
999000011000001104, 999000011000001104, 999000011000001104,
999000011000001104, 999000011000001104, 999000011000001104])
)
assert a2b.dtype==np.int64
assert a2b.index.dtype==np.int64
key = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
9725301000001109, 9725601000001103, 9725801000001104,
9730701000001104,
10047311000001102, # Missing from a2b.index
10049011000001109,
10328511000001105])
result = a2b.loc[key]
result
assert result.dtype==np.int64
assert result.index.dtype==np.int64
What happens:
In [2]: import pandas as pd
...: import numpy as np
...: a2b = pd.Series(
...: index = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
...: 9725301000001109, 9725601000001103, 9725801000001104,
...: 9730701000001104, 10049011000001109, 10328511000001105]),
...: data = np.array([999000011000001104, 999000011000001104, 999000011000001104,
...: 999000011000001104, 999000011000001104, 999000011000001104,
...: 999000011000001104, 999000011000001104, 999000011000001104])
...: )
...: assert a2b.dtype==np.int64
...: assert a2b.index.dtype==np.int64
...: key = np.array([ 9724501000001103, 9724701000001109, 9725101000001107,
...: 9725301000001109, 9725601000001103, 9725801000001104,
...: 9730701000001104,
...: 10047311000001102, # Missing from a2b.index
...: 10049011000001109,
...: 10328511000001105])
...: result = a2b.loc[key]
...: result
...:
Out[2]:
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
9.990000e+17 NaN
NaN NaN
9.990000e+17 NaN
9.990000e+17 NaN
dtype: float64
In [3]: assert result.dtype==np.int64
...: assert result.index.dtype==np.int64
...:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-3-be86ec17a393> in <module>()
----> 1 assert result.dtype==np.int64
2 assert result.index.dtype==np.int64
AssertionError:
Problem description
I don't like this behavior because:
- I have silently lost all my data: the result is cast to float64 and every value comes back as NaN, including for keys that are present in a2b.index.
- Other calls to __getitem__ raise a KeyError if a label is not found in the index.
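Even for keys that are found, a float64 result cannot hold these numbers exactly: float64 has a 53-bit significand, so not every integer above 2**53 (about 9.0e15) is representable, and both the index labels (about 9.7e15) and the data values (about 1.0e18) in this example are above that limit. A NumPy-only check of the int64 to float64 round trip, using two values taken from the repro:

```python
import numpy as np

# One data value and one index label from the repro above.
values = np.array([999000011000001104, 9724501000001103], dtype=np.int64)

# Upcast to float64 (as happens when NaN has to be stored) and cast back.
round_trip = values.astype(np.float64).astype(np.int64)

# Beyond this bound float64 can no longer represent every integer exactly.
exact_limit = 2 ** 53

changed = round_trip != values
print(values.tolist())
print(round_trip.tolist())
print(changed.tolist())
```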
Expected Output
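The asserts should not fail: result and result.index should both keep dtype int64 (or, alternatively, a KeyError should be raised for the missing key).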
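If missing keys should be tolerated rather than raise, one int64-preserving alternative is to request an explicit sentinel instead of NaN. A sketch using `Series.reindex` with `fill_value`, on a small made-up Series; the sentinel `-1` is an arbitrary assumption and must be a value that cannot occur in the real data.

```python
import numpy as np
import pandas as pd

s = pd.Series(index=np.array([10, 20, 30], dtype=np.int64),
              data=np.array([100, 200, 300], dtype=np.int64))
key = np.array([10, 99, 30], dtype=np.int64)  # 99 is not in s.index

# Hypothetical sentinel, assumed never to appear in the real data.
MISSING = -1

# Missing labels receive the sentinel instead of NaN, so no float64 upcast is needed.
result = s.reindex(key, fill_value=MISSING)
```

pandas 0.24 and later also provide the nullable `Int64` extension dtype, which can represent missing values without casting to float64.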
Output of pd.show_versions()
In [4]: pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None