Closed
Description
I think that two unordered categorical-dtyped Series whose categories differ only in the order they are listed should compare equal.
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: c1 = pd.Series(pd.Categorical(['a', 'b']))
In [3]: c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))
In [4]: c1
Out[4]:
0 a
1 b
dtype: category
Categories (2, object): [a, b]
In [5]: c2
Out[5]:
0 a
1 b
dtype: category
Categories (2, object): [b, a]
In [6]: c1 == c2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-d8d43a43a02a> in <module>()
----> 1 c1 == c2
/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
811 msg = 'Can only compare identically-labeled Series objects'
812 raise ValueError(msg)
--> 813 return self._constructor(na_op(self.values, other.values),
814 index=self.index, name=name)
815 elif isinstance(other, pd.DataFrame): # pragma: no cover
/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
752 # in either operand
753 if is_categorical_dtype(x):
--> 754 return op(x, y)
755 elif is_categorical_dtype(y) and not isscalar(y):
756 return op(y, x)
/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/categorical.py in f(self, other)
55 if ((len(self.categories) != len(other.categories)) or
56 not ((self.categories == other.categories).all())):
---> 57 raise TypeError("Categoricals can only be compared if "
58 "'categories' are the same")
59 if not (self.ordered == other.ordered):
TypeError: Categoricals can only be compared if 'categories' are the same
Expected Output
I think this should return True for both elements. Unordered categoricals shouldn't care about the order of their categories :)
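In the meantime, one possible workaround is to align the category order before comparing. This is only a sketch, and it assumes both Series are unordered and hold exactly the same set of categories; the name `c2_aligned` is just for illustration. It uses `Series.cat.reorder_categories`, which changes the listed order of the categories but not the values.

```python
import pandas as pd

c1 = pd.Series(pd.Categorical(['a', 'b']))
c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))

# Assumption: both sides are unordered and share the same set of categories.
# reorder_categories raises if the sets differ, so check up front.
assert set(c1.cat.categories) == set(c2.cat.categories)

# Re-express c2 with c1's category order; the underlying values stay the same.
c2_aligned = c2.cat.reorder_categories(list(c1.cat.categories))

# With identical category lists the element-wise comparison is allowed.
result = c1 == c2_aligned
print(result)
```

If the categories may genuinely differ between the two Series, comparing `c1.astype(object) == c2.astype(object)` sidesteps the categorical check entirely, at the cost of losing the categorical dtype during the comparison.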
cc @JanSchulz thoughts?