Index.difference performance #12044

I need to append several large Series to a large categorical Series.
While trying to update the categories quickly, I found that `Index.difference` uses Python's `set`, which is slow to build for large inputs (I have up to 500k categories and 1.3M values).
NumPy's `setdiff1d` is more than an order of magnitude faster (measured on a datetime64 Categorical):

```
import numpy as np  # `pd.np` is deprecated and was removed in pandas 2.0

tmp_unique = tmp.unique()
new_cats = pd.Index(np.setdiff1d(tmp_unique[~pd.isnull(tmp_unique)], to.cat.categories))
```
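For context, here is a minimal, self-contained sketch of the whole append workflow built around the `setdiff1d` step. The data is made up: `to` and `tmp` are small hypothetical stand-ins for the real Series, and the use of `cat.add_categories` plus `pd.concat` to do the append is my assumption about the surrounding code, not something shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real `to` (target) and `tmp` (incoming) Series.
to = pd.Series(pd.date_range("2016-01-01", periods=3, freq="D")).astype("category")
tmp = pd.Series(pd.date_range("2016-01-02", periods=4, freq="D"))
tmp.iloc[0] = pd.NaT  # include a missing value, which must not become a category

# Unique non-null incoming values as a plain NumPy array.
incoming = np.asarray(tmp.dropna().unique())

# Values in `incoming` that are not yet categories of `to` (sorted, unique).
new_cats = pd.Index(np.setdiff1d(incoming, to.cat.categories.to_numpy()))

# Register the new categories, then append using the shared category set.
to = to.cat.add_categories(new_cats)
tmp_cat = pd.Series(pd.Categorical(tmp, categories=to.cat.categories))
result = pd.concat([to, tmp_cat], ignore_index=True)

print(new_cats)
print(result.dtype)
```

Because both pieces share the same categories before `pd.concat`, the result should stay categorical instead of falling back to a plain datetime64 or object Series.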

The equivalent `Index.difference` call is not nearly as fast:

```
new_cats = pd.Index(tmp_unique[~pd.isnull(tmp_unique)]).difference(to.cat.categories)
```
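A rough way to reproduce the comparison is sketched below. The synthetic data, the sizes (smaller than the 500k/1.3M above), and the helper names `with_setdiff1d` and `with_difference` are all assumptions made for illustration. Relative timings depend on the installed pandas and NumPy versions, so treat this as something to measure rather than a guaranteed result.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic datetime64 data; sizes are illustrative only.
rng = np.random.default_rng(0)
base = np.datetime64("2016-01-01", "ns")
existing = pd.Index(
    base + rng.choice(1_000_000, size=50_000, replace=False).astype("timedelta64[s]")
)
incoming = base + rng.choice(1_000_000, size=100_000, replace=False).astype("timedelta64[s]")


def with_setdiff1d():
    # NumPy set difference on the raw arrays, wrapped back into an Index.
    return pd.Index(np.setdiff1d(incoming, existing.to_numpy()))


def with_difference():
    # pandas set difference on Index objects.
    return pd.Index(incoming).difference(existing)


# Both approaches should yield the same sorted, unique values.
assert with_setdiff1d().equals(with_difference())

for fn in (with_setdiff1d, with_difference):
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s")
```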

