BUG/API: Unordered Categorical should ignore order in comparisons? #16014

Closed
@TomAugspurger

Description

I think that two unordered categorical-dtyped Series whose categories differ only in their order should compare equal.

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: c1 = pd.Series(pd.Categorical(['a', 'b']))

In [3]: c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))

In [4]: c1
Out[4]:
0    a
1    b
dtype: category
Categories (2, object): [a, b]

In [5]: c2
Out[5]:
0    a
1    b
dtype: category
Categories (2, object): [b, a]

In [6]: c1 == c2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d8d43a43a02a> in <module>()
----> 1 c1 == c2

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
    811                 msg = 'Can only compare identically-labeled Series objects'
    812                 raise ValueError(msg)
--> 813             return self._constructor(na_op(self.values, other.values),
    814                                      index=self.index, name=name)
    815         elif isinstance(other, pd.DataFrame):  # pragma: no cover

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
    752         # in either operand
    753         if is_categorical_dtype(x):
--> 754             return op(x, y)
    755         elif is_categorical_dtype(y) and not isscalar(y):
    756             return op(y, x)

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/categorical.py in f(self, other)
     55             if ((len(self.categories) != len(other.categories)) or
     56                     not ((self.categories == other.categories).all())):
---> 57                 raise TypeError("Categoricals can only be compared if "
     58                                 "'categories' are the same")
     59             if not (self.ordered == other.ordered):

TypeError: Categoricals can only be compared if 'categories' are the same
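
The check at lines 55-56 of the traceback compares the two category indexes position by position, so the same labels in a different order fail it. Here is a minimal sketch of the difference between that positional check and a set-based one. It uses plain `pd.Index` objects as stand-ins for the categories, and the variable names are illustrative, not pandas internals:

```python
import pandas as pd

# Stand-ins for c1's and c2's categories (illustrative names only).
cats1 = pd.Index(['a', 'b'])
cats2 = pd.Index(['b', 'a'])

# Positional check, mirroring the condition shown in the traceback:
# same length and every label equal at the same position.
positional = len(cats1) == len(cats2) and bool((cats1 == cats2).all())

# Order-insensitive check: the same labels, in any order.
as_sets = set(cats1) == set(cats2)

print(positional)  # False
print(as_sets)     # True
```

The positional check is the right one for ordered categoricals, where the order of the categories carries meaning. For unordered ones the set-based check seems like the more natural definition of "the same categories".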

Expected Output

I think this should return True for both elements. Unordered categoricals shouldn't care about the order of their categories :)
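
In the meantime, one possible workaround is to reorder one side's categories to match the other before comparing. The helper below is only a sketch: `unordered_equal` is a hypothetical name, not a pandas function. It assumes both Series are categorical, unordered, and hold the same set of labels:

```python
import pandas as pd


def unordered_equal(left, right):
    """Element-wise == for two unordered categorical Series (hypothetical helper)."""
    if left.cat.ordered or right.cat.ordered:
        raise TypeError("this sketch only handles unordered categoricals")
    if set(left.cat.categories) != set(right.cat.categories):
        raise TypeError("categories must contain the same labels")
    # Give `right` the same category order as `left`; the values are unchanged.
    aligned = right.cat.set_categories(left.cat.categories)
    return left == aligned


c1 = pd.Series(pd.Categorical(['a', 'b']))
c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))

result = unordered_equal(c1, c2)
print(result.tolist())  # [True, True]
```

Because the two Series have identical categories after the reordering, the comparison passes the existing check and returns a boolean Series.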

cc @JanSchulz thoughts?

Labels: Categorical (Categorical Data Type), Dtype Conversions (Unexpected or buggy dtype conversions)
