ENH: specify missing labels in loc calls GH34272 #34912

Merged

9 changes: 9 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -13,6 +13,15 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

.. _whatsnew_110.specify_missing_labels:

KeyErrors raised by loc specify missing labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, if labels were missing for a loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See :issue:`34272`.

Contributor

can you add the issue number here?

Contributor Author

done

Contributor

can you do it like other issue references, e.g. :issue:`34272`

Contributor Author

@timhunderwood, Jun 26, 2020

@jreback - sure, this is now done.


.. _whatsnew_110.astype_string:

All dtypes can now be converted to ``StringDtype``

19 changes: 13 additions & 6 deletions pandas/core/indexing.py
@@ -2,6 +2,8 @@

 import numpy as np

+from pandas._config.config import option_context
+
 from pandas._libs.indexing import _NDFrameIndexerBase
 from pandas._libs.lib import item_from_zerodim
 from pandas.errors import AbstractMethodError, InvalidIndexError
@@ -1283,7 +1285,8 @@ def _validate_read_indexer(
             return

         # Count missing values:
-        missing = (indexer < 0).sum()
+        missing_mask = indexer < 0
+        missing = (missing_mask).sum()

         if missing:
             if missing == len(indexer):
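
The mask introduced in this hunk can be illustrated without pandas. The sketch below is a plain-Python stand-in, not the pandas implementation: it assumes, as `Index.get_indexer` does, that a position of -1 marks a label that was not found, and the helper name `find_missing_labels` is hypothetical.

```python
from typing import Hashable, List, Sequence


def find_missing_labels(
    index_labels: Sequence[Hashable], key: Sequence[Hashable]
) -> List[Hashable]:
    """Return the entries of ``key`` that are absent from ``index_labels``.

    Hypothetical helper: mirrors the ``indexer < 0`` mask from the diff
    using plain lists instead of NumPy arrays.
    """
    positions = {label: pos for pos, label in enumerate(index_labels)}
    # -1 marks "not found", the same sentinel get_indexer uses.
    indexer = [positions.get(label, -1) for label in key]
    missing_mask = [pos < 0 for pos in indexer]
    # Equivalent of ``key[missing_mask]`` for a plain list.
    return [label for label, is_missing in zip(key, missing_mask) if is_missing]


missing = find_missing_labels(
    ["a", "b", "c"], ["a", "b", "missing_0", "c", "missing_1", "missing_2"]
)
print(missing)
```

Keeping the mask as its own variable is what lets the later hunk reuse it to select the offending labels instead of only counting them.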
@@ -1302,11 +1305,15 @@ def _validate_read_indexer(
             # code, so we want to avoid warning & then
             # just raising
             if not ax.is_categorical():
-                raise KeyError(
-                    "Passing list-likes to .loc or [] with any missing labels "
-                    "is no longer supported, see "
-                    "https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"  # noqa:E501
-                )
+                not_found = key[missing_mask]
+
+                with option_context("display.max_seq_items", 10, "display.width", 80):
+                    raise KeyError(
+                        "Passing list-likes to .loc or [] with any missing labels "
+                        "is no longer supported. "
+                        f"The following labels were missing: {not_found}. "
+                        "See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"  # noqa:E501
+                    )
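
The `option_context` call caps how much of `not_found` ends up in the message. As a rough, pandas-free illustration of that idea, the sketch below keeps the first and last few labels and replaces the middle with an ellipsis. The helper name `summarize_labels` and the even head/tail split are assumptions made for illustration; pandas' own formatter decides the real layout, including line wrapping at the display width.

```python
from typing import List, Sequence


def summarize_labels(labels: Sequence[str], max_items: int = 10) -> str:
    """Render labels as a list, eliding the middle when there are too many.

    Hypothetical stand-in for the effect of ``display.max_seq_items``.
    """
    if len(labels) <= max_items:
        shown: List[str] = [repr(label) for label in labels]
    else:
        # Always keep at least one label on each side of the ellipsis.
        half = max(1, max_items // 2)
        head = [repr(label) for label in labels[:half]]
        tail = [repr(label) for label in labels[-half:]]
        shown = head + ["..."] + tail
    return "[" + ", ".join(shown) + "]"


labels = [f"missing_{i}" for i in range(10000)]
message = f"The following labels were missing: {summarize_labels(labels)}."
print(message)
```

With 10,000 labels this sketch prints `missing_0` through `missing_4`, an ellipsis, and `missing_9995` through `missing_9999`, which is the shape the second new test below checks for.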


@doc(IndexingMixin.iloc)
29 changes: 29 additions & 0 deletions pandas/tests/indexing/test_indexing.py
@@ -1075,3 +1075,32 @@ def test_setitem_with_bool_mask_and_values_matching_n_trues_in_length():
    result = ser
    expected = pd.Series([None] * 3 + list(range(5)) + [None] * 2).astype("object")
    tm.assert_series_equal(result, expected)


def test_missing_labels_inside_loc_matched_in_error_message():
    # GH34272
    s = pd.Series({"a": 1, "b": 2, "c": 3})
    error_message_regex = "missing_0.*missing_1.*missing_2"
    with pytest.raises(KeyError, match=error_message_regex):
        s.loc[["a", "b", "missing_0", "c", "missing_1", "missing_2"]]


def test_many_missing_labels_inside_loc_error_message_limited():
    # GH34272
    n = 10000
    missing_labels = [f"missing_{label}" for label in range(n)]
    s = pd.Series({"a": 1, "b": 2, "c": 3})
    # regex checks labels between 4 and 9995 are replaced with ellipses
    error_message_regex = "missing_4.*\\.\\.\\..*missing_9995"
    with pytest.raises(KeyError, match=error_message_regex):
        s.loc[["a", "c"] + missing_labels]


def test_long_text_missing_labels_inside_loc_error_message_limited():
    # GH34272
    s = pd.Series({"a": 1, "b": 2, "c": 3})
    missing_labels = [f"long_missing_label_text_{i}" * 5 for i in range(3)]
    # regex checks for very long labels there are new lines between each
    error_message_regex = "long_missing_label_text_0.*\\\\n.*long_missing_label_text_1"
    with pytest.raises(KeyError, match=error_message_regex):
        s.loc[["a", "c"] + missing_labels]
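
The four backslashes in the last regex are easy to misread. `pytest.raises(..., match=...)` searches the string form of the exception, and `str()` of a `KeyError` with a single argument is the `repr` of that argument, so a real newline in the message shows up as the two characters `\` and `n`. The snippet below is a small standard-library check of that behaviour; the message text is made up for illustration and is not the exact pandas output.

```python
import re

# Made-up message containing a real newline between two labels.
error = KeyError("labels were missing: ['label_0',\n 'label_1']")
text = str(error)

# str(KeyError(msg)) is repr(msg): the newline is escaped, not literal.
has_literal_newline = "\n" in text
has_escaped_newline = "\\n" in text

# Same shape as the regex in the test: "\\\\n" in Python source is the
# regex \\n, which matches a literal backslash followed by "n".
pattern = "label_0.*\\\\n.*label_1"
matched = re.search(pattern, text) is not None

print(has_literal_newline, has_escaped_newline, matched)
```

A pattern written with a plain `\n` would look for a real newline and would not match here, which is presumably why the test escapes the backslash.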