.loc[iterator] treats missing keys differently than .loc[list] #20748

Closed
@toobaz

Code Sample, a copy-pastable example if possible

In [2]: pd.Series([1,2,3]).loc[[i for i in (4,5)]]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-aba9eb9eea33> in <module>()
----> 1 pd.Series([1,2,3]).loc[[i for i in (4,5)]]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1370 
   1371             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1372             return self._getitem_axis(maybe_callable, axis=axis)
   1373 
   1374     def _is_scalar_access(self, key):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1829                     raise ValueError('Cannot index with multidimensional key')
   1830 
-> 1831                 return self._getitem_iterable(key, axis=axis)
   1832 
   1833             # nested tuple slicing

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1109 
   1110         if self._should_validate_iterable(axis):
-> 1111             self._has_valid_type(key, axis)
   1112 
   1113         labels = self.obj._get_axis(axis)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1683                         raise KeyError(
   1684                             u"None of [{key}] are in the [{axis}]".format(
-> 1685                                 key=key, axis=self.obj._get_axis_name(axis)))
   1686                     else:
   1687 

KeyError: 'None of [[4, 5]] are in the [index]'

In [3]: pd.Series([1,2,3]).loc[(i for i in (4,5))]
Out[3]: 
4   NaN
5   NaN
dtype: float64

Problem description

Since we convert iterators to lists anyway...

indexer, keyarr = labels._convert_listlike_indexer(

... we might as well do the conversion as early as possible (i.e., in __getitem__), and simplify the code by handling only list-likes that have a length. I would also consider changing is_list_like to return False for iterators, or giving it a has_len=False argument.
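To make the proposal concrete, here is a minimal pure-Python sketch of the two ideas. Both `materialize_key` and `is_list_like_sketch` are hypothetical names used only for illustration; they are not pandas functions, and the `has_len` parameter is the one suggested in this issue, not an existing pandas argument.

```python
from collections.abc import Iterable, Iterator


def materialize_key(key):
    """Hypothetical helper: turn a one-shot iterator into a list.

    Anything that is not an iterator (list, tuple, scalar, ...) is
    returned unchanged, so downstream code only sees sized list-likes.
    """
    if isinstance(key, Iterator):
        return list(key)
    return key


def is_list_like_sketch(obj, has_len=False):
    """Hypothetical variant of a list-like check with a has_len switch.

    Strings and bytes are iterable but are not treated as list-like here.
    With has_len=True, objects without __len__ (e.g. generators) are rejected.
    """
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        return False
    return hasattr(obj, "__len__") if has_len else True


# A generator key becomes an ordinary list before any validation happens.
key = materialize_key(i for i in (4, 5))
print(key, is_list_like_sketch(key, has_len=True))
```

If something like `materialize_key` ran at the top of __getitem__, the list and generator forms would reach the same validation path, which is the point of the proposal.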

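It would also resolve this other, less important difference: when only some labels are missing, the list form emits a FutureWarning, while the generator form returns NaN without any warning: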

In [2]: pd.Series([1,2,3]).loc[[i for i in (2,5)]]
/usr/bin/ipython3:1: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  #! /bin/sh
Out[2]: 
2    3.0
5    NaN
dtype: float64

In [3]: pd.Series([1,2,3]).loc[(i for i in (2,5))]
Out[3]: 
2    3.0
5    NaN
dtype: float64

... and probably others.

Expected Output

Exactly the same for lists and iterators.
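One way to check this expectation is to compare the two forms side by side. The sketch below assumes pandas is installed; `loc_outcome` is a hypothetical helper written for this comparison, and the result depends on the pandas version in use.

```python
import pandas as pd


def loc_outcome(series, make_key):
    """Summarize series.loc[key] without comparing NaN values directly.

    make_key is a zero-argument factory, because a generator can only be
    consumed once and each call needs a fresh key.
    """
    try:
        result = series.loc[make_key()]
    except KeyError:
        return ("KeyError", None, None)
    # Record the returned labels and which of them came back as missing.
    return ("ok", result.index.tolist(), result.isna().tolist())


s = pd.Series([1, 2, 3])

# All labels present, some missing, all missing.
for labels in [(0, 1), (2, 5), (4, 5)]:
    from_list = loc_outcome(s, lambda: [i for i in labels])
    from_gen = loc_outcome(s, lambda: (i for i in labels))
    print(labels, "same" if from_list == from_gen else "DIFFERENT",
          from_list, from_gen)
```

On the build reported here, the (4, 5) case would be expected to print DIFFERENT, since the list raises KeyError and the generator returns NaN values.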

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d04b746
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.23.0.dev0+754.gd04b7464d.dirty
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.25.2
numpy: 1.14.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

Labels: API Design, Indexing
