
BUG: Index.drop raising Error when Index has duplicates #38070

Merged
merged 13 commits into from
Dec 2, 2020
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -641,6 +641,7 @@ MultiIndex
- Bug in :meth:`DataFrame.reset_index` with ``NaT`` values in index raises ``ValueError`` with message ``"cannot convert float NaN to integer"`` (:issue:`36541`)
- Bug in :meth:`DataFrame.combine_first` when used with :class:`MultiIndex` containing string and ``NaN`` values raises ``TypeError`` (:issue:`36562`)
- Bug in :meth:`MultiIndex.drop` dropped ``NaN`` values when non existing key was given as input (:issue:`18853`)
- Bug in :meth:`MultiIndex.drop` dropping more values than expected when index has duplicates and is not sorted (:issue:`33494`)

I/O
^^^
@@ -764,6 +765,7 @@ Other
- Bug in :meth:`Index.union` behaving differently depending on whether operand is an :class:`Index` or other list-like (:issue:`36384`)
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError`` rather than a bare ``Exception`` (:issue:`35744`)
- Bug in ``dir`` where ``dir(obj)`` wouldn't show attributes defined on the instance for pandas objects (:issue:`37173`)
- Bug in :meth:`Index.drop` raising ``InvalidIndexError`` when index has duplicates (:issue:`38051`)

.. ---------------------------------------------------------------------------

2 changes: 1 addition & 1 deletion pandas/core/indexes/base.py
@@ -5508,7 +5508,7 @@ def drop(self, labels, errors: str_t = "raise"):
"""
arr_dtype = "object" if self.dtype == "object" else None
labels = com.index_labels_to_array(labels, dtype=arr_dtype)
- indexer = self.get_indexer(labels)
+ indexer = self.get_indexer_for(labels)
mask = indexer == -1
if mask.any():
if errors != "ignore":
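
For context on the one-line change above, here is a minimal sketch of how the two lookup methods differ on an index with duplicates. The index values are made up for illustration, and the behavior shown assumes a pandas version that includes this fix.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# Hypothetical index with a duplicated label.
idx = pd.Index(["a", "a", "b"])

# get_indexer requires a unique index, so it raises on duplicates.
try:
    idx.get_indexer(["a"])
    raised = False
except InvalidIndexError:
    raised = True

# get_indexer_for also handles non-unique indexes and returns
# every matching position.
positions = idx.get_indexer_for(["a"])

# With the fix, drop removes all occurrences of the label.
result = idx.drop("a")

print(raised, positions.tolist(), list(result))
```

A label that is absent still yields `-1` from `get_indexer_for`, so the `mask = indexer == -1` check that follows keeps working unchanged.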
3 changes: 2 additions & 1 deletion pandas/core/indexes/multi.py
@@ -2169,7 +2169,8 @@ def drop(self, codes, level=None, errors="raise"):
if isinstance(loc, int):
inds.append(loc)
elif isinstance(loc, slice):
- inds.extend(range(loc.start, loc.stop))
+ step = loc.step if loc.step is not None else 1
+ inds.extend(range(loc.start, loc.stop, step))
elif com.is_bool_indexer(loc):
if self.lexsort_depth == 0:
warnings.warn(
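
The slice branch in this hunk can be illustrated in plain Python. This is only a sketch: the slice values are hypothetical, and it does not show which lookups actually return a stepped slice.

```python
# A hypothetical slice whose step is not 1.
loc = slice(0, 6, 2)

# Before the change: the step is ignored, so extra positions are collected.
old_positions = list(range(loc.start, loc.stop))

# After the change: a missing step defaults to 1, otherwise it is honored.
step = loc.step if loc.step is not None else 1
new_positions = list(range(loc.start, loc.stop, step))

print(old_positions)  # [0, 1, 2, 3, 4, 5]
print(new_positions)  # [0, 2, 4]
```

Ignoring the step is how more positions than intended could end up in `inds`, which matches the "dropping more values than expected" wording in the whatsnew entry.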
12 changes: 12 additions & 0 deletions pandas/tests/indexes/multi/test_drop.py
@@ -1,3 +1,5 @@
import warnings

import numpy as np
import pytest

@@ -147,3 +149,13 @@ def test_drop_with_nan_in_index(nulls_fixture):
msg = r"labels \[Timestamp\('2001-01-01 00:00:00'\)\] not found in level"
with pytest.raises(KeyError, match=msg):
mi.drop(pd.Timestamp("2001"), level="date")


def test_drop_with_non_monotonic_duplicates():
# GH#33494
mi = MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)])
with warnings.catch_warnings():
Member

could maybe use @pytest.mark.filterwarnings as a follow-up

warnings.simplefilter("ignore", PerformanceWarning)
result = mi.drop((1, 2))
expected = MultiIndex.from_tuples([(2, 3)])
tm.assert_index_equal(result, expected)
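
The follow-up suggested in the review comment might look roughly like the sketch below. It is not part of this PR; it assumes the test is run under pytest and uses the `pd.` prefix so the snippet is self-contained.

```python
import pandas as pd
import pandas._testing as tm
import pytest


# The mark replaces the warnings.catch_warnings() block; it only takes
# effect when the test is collected and run by pytest.
@pytest.mark.filterwarnings("ignore::pandas.errors.PerformanceWarning")
def test_drop_with_non_monotonic_duplicates():
    # GH#33494
    mi = pd.MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)])
    result = mi.drop((1, 2))
    expected = pd.MultiIndex.from_tuples([(2, 3)])
    tm.assert_index_equal(result, expected)
```

With the mark in place, the `import warnings` line added at the top of the test module would no longer be needed for this test.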
9 changes: 9 additions & 0 deletions pandas/tests/indexes/test_base.py
@@ -1494,6 +1494,15 @@ def test_drop_tuple(self, values, to_drop):
with pytest.raises(KeyError, match=msg):
removed.drop(drop_me)

def test_drop_with_duplicates_in_index(self, index):
# GH38051
if len(index) == 0:
return
expected = index.drop(index[0]).repeat(2)
Member

this works, but ideally we'd form expected without using drop. could do

index = index.unique()
index = index.repeat(2)
expected = index[2:]
result = index.drop(index[0])

Member Author

Had a similar idea, but the fixture contains indexes with duplicates. If drop is bad, we could use unique and index[1:] afterwards?

Member

Can you write your idea out explicitly? I'm not clear on how it is different from what I wrote.

Member Author

@phofl phofl Nov 26, 2020

Sorry, forget what I have said. I missed your first line

index = index.repeat(2)
result = index.drop(index[0])
tm.assert_index_equal(result, expected)
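
The reviewer's suggestion in the thread above, building `expected` without calling `drop`, can be tried outside the test suite. The sketch below uses a made-up index in place of the `index` fixture.

```python
import pandas as pd

# Hypothetical stand-in for the fixture; it already contains a duplicate.
index = pd.Index(["b", "a", "a", "c"])

index = index.unique()         # remove duplicates the fixture may carry
index = index.repeat(2)        # every label now appears exactly twice
expected = index[2:]           # skip both copies of the first label
result = index.drop(index[0])  # the call under test

print(list(expected))
print(list(result))
```

Calling `unique()` first is what makes `index[2:]` a safe expectation: without it, a fixture with existing duplicates could hold more than two copies of the first label after `repeat(2)`.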

@pytest.mark.parametrize(
"attr",
[