Commit a07d5fc

author
tp
committed
Add __contains__ to CategoricalIndex
1 parent 0c65c57 commit a07d5fc

3 files changed, +24 -7 lines changed

asv_bench/benchmarks/categoricals.py

Lines changed: 13 additions & 0 deletions
@@ -193,3 +193,16 @@ def time_categorical_series_is_monotonic_increasing(self):
 
     def time_categorical_series_is_monotonic_decreasing(self):
         self.s.is_monotonic_decreasing
+
+
+class Contains(object):
+
+    goal_time = 0.2
+
+    def setup(self):
+        N = 10**5
+        self.ci = tm.makeCategoricalIndex(N)
+        self.cat = self.ci.categories[0]
+
+    def time_contains(self):
+        self.cat in self.ci

doc/source/whatsnew/v0.23.1.txt

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ Performance Improvements
 
 - Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`)
 - Improved performance of :meth:`CategoricalIndex.is_unique` (:issue:`21107`)
+- Improved performance of membership checks in :class:`CategoricalIndex`
+  (i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains` is likewise much faster (:issue:`21107`)
 -
 -

pandas/core/indexes/category.py

Lines changed: 9 additions & 7 deletions
@@ -328,16 +328,18 @@ def __contains__(self, key):
         if self.categories._defer_to_indexing:
             return key in self.categories
 
-        return key in self.values
+        try:
+            code_value = self.categories.get_loc(key)
+        except KeyError:
+            if isna(key):
+                code_value = -1
+            else:
+                return False
+        return code_value in self._engine
 
     @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)
     def contains(self, key):
-        hash(key)
-
-        if self.categories._defer_to_indexing:
-            return self.categories.contains(key)
-
-        return key in self.values
+        return key in self
 
     def __array__(self, dtype=None):
         """ the array interface, return my values """
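
Note: the new `__contains__` in this diff first resolves the key to its integer category code and then tests that code for membership, instead of scanning `self.values`. The sketch below imitates that two-step lookup in plain Python so the control flow can be tried without pandas. It is an illustration under stated assumptions, not the pandas implementation: `ToyCategoricalIndex`, `_cat_to_code`, and `_present_codes` are hypothetical names, a dict stands in for `self.categories.get_loc`, a set stands in for `self._engine`, and the missing-value check is a simplified substitute for `isna`.

```python
import math


class ToyCategoricalIndex:
    """Hypothetical stand-in for CategoricalIndex; not the pandas class."""

    def __init__(self, values, categories):
        self.categories = list(categories)
        # Plays the role of categories.get_loc: category -> integer code.
        self._cat_to_code = {cat: i for i, cat in enumerate(self.categories)}
        # Values that are not categories (including NaN/None) get code -1.
        self.codes = [self._cat_to_code.get(v, -1) for v in values]
        # Plays the role of self._engine: the codes actually present.
        self._present_codes = set(self.codes)

    @staticmethod
    def _is_missing(key):
        # Simplified substitute for pandas' isna (assumption: None or float NaN).
        return key is None or (isinstance(key, float) and math.isnan(key))

    def __contains__(self, key):
        try:
            code_value = self._cat_to_code[key]
        except KeyError:
            if self._is_missing(key):
                code_value = -1  # missing values are stored as code -1
            else:
                return False  # not a category, so it cannot be a value
        return code_value in self._present_codes


ci = ToyCategoricalIndex(["a", "b", "a", float("nan")], categories=["a", "b", "c"])
print("a" in ci)           # True: category "a" is used by some value
print("c" in ci)           # False: "c" is a category but no value uses it
print("z" in ci)           # False: "z" is not a category at all
print(float("nan") in ci)  # True: a missing value is present (code -1)
```

In this sketch each lookup is a hash probe rather than a pass over all values, which is the kind of saving the whatsnew entry describes. How `self._engine` performs its membership test inside pandas is not shown in this diff.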
