Skip to content

API: Set ops for CategoricalIndex #10186

Open
@sinhrks

Description

@sinhrks

Derived from #10157. Would like to clarify what these results should be. Basically, I think:

  • The result must be CategoricalIndex
  • The result should have the same values as the result of normal Index which has the same original values.
  • Result's category should only include categories which the result actually has.

Followings are current results.

intersection

# for reference
pd.Index([1, 2, 3, 1, 2, 3]).intersection(pd.Index([2, 3, 4, 2, 3, 4]))
# Int64Index([2, 2, 3, 3], dtype='int64')

pd.CategoricalIndex([1, 2, 3, 1, 2, 3]).intersection(pd.CategoricalIndex([2, 3, 4, 2, 3, 4]))
# CategoricalIndex([2, 2, 3, 3], categories=[1, 2, 3], ordered=False, dtype='category')
# -> Is this OK or it should have categories=[2, 3]?

union

Doc says "Form the union of two Index objects and sorts if possible". I'm not sure whether the last sentence says "raise error if sort is impossible" or "not sort if impossible"?

pd.Index([1, 2, 4]).union(pd.Index([2, 3, 4]))
# Int64Index([1, 2, 3, 4], dtype='int64')

pd.CategoricalIndex([1, 2, 4]).union(pd.CategoricalIndex([2, 3, 4]))
# CategoricalIndex([1, 2, 4, 3], categories=[1, 2, 3, 4], ordered=False, dtype='category')
# -> Should be sorted?
pd.Index([1, 2, 3, 1, 2, 3]).union(pd.Index([2, 3, 4, 2, 3, 4]))
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
-> This should results Index([1, 2, 3, 1, 2, 3, 4, 4])?

pd.CategoricalIndex([1, 2, 3, 1, 2, 3]).union(pd.CategoricalIndex([2, 3, 4, 2, 3, 4]))
# TypeError: type() takes 1 or 3 arguments
# -> should raise understandable error, or Int64Index shouldn't raise (and return unsorted result?)

difference

pd.CategoricalIndex([1, 2, 4, 5]).difference(pd.CategoricalIndex([2, 3, 4]))
# Int64Index([1, 5], dtype='int64')
# -> should be CategoricalIndex?

sym_diff

pd.CategoricalIndex([1, 2, 4, 5]).sym_diff(pd.CategoricalIndex([2, 4]))
# Int64Index([1, 5], dtype='int64')
# -> should be CategoricalIndex?

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugCategoricalCategorical Data Typesetopsunion, intersection, difference, symmetric_difference

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions