
API: Index.append behaviour with categoricals #14586

Closed
@jorisvandenbossche


Follow-up of #14545.

We had a long discussion on what the behaviour of concat should be for categorical data: #13767. In the end, for 0.19.0, we changed the behaviour when categories don't match from raising an error to returning object-dtyped data (only data with identical categories and an identical ordered attribute gives a categorical result). The table below summarizes the changes between 0.18.1 and 0.19.0:

For categorical Series:

| left | right | append/concat 0.18 | append/concat 0.19.0 |
|---|---|---|---|
| category | category (identical categories) | category | category |
| category | category (different categories) | error | object |
| category | not category | category | object |
| category | not category (different categories) | category with NaNs | object |
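As a rough illustration of the 0.19.0 column for Series, the sketch below concatenates three combinations and reports the resulting dtype. It uses pd.concat rather than Series.append, because Series.append was removed in pandas 2.0. The helper `is_categorical` and the variable names are made up for this example, and the sketch assumes a pandas version that still follows the 0.19.0 rules.

```python
import pandas as pd


def is_categorical(obj):
    """Return True if a Series (or Index) has a categorical dtype."""
    return isinstance(obj.dtype, pd.CategoricalDtype)


# Two categorical Series whose inferred categories are both ['a', 'b'].
cat_ab = pd.Series(["a", "b"], dtype="category")
cat_ab_again = pd.Series(["b", "a"], dtype="category")
# A categorical Series with different categories, and a non-categorical one.
cat_ac = pd.Series(["a", "c"], dtype="category")
plain = pd.Series(["a", "b"])

identical = pd.concat([cat_ab, cat_ab_again], ignore_index=True)
different = pd.concat([cat_ab, cat_ac], ignore_index=True)
mixed = pd.concat([cat_ab, plain], ignore_index=True)

# Print the dtype of each result; only the first is expected to stay categorical.
for label, result in [("identical", identical), ("different", different), ("mixed", mixed)]:
    print(label, result.dtype, is_categorical(result))
```

Only the first combination keeps the category dtype; the other two fall back to a non-categorical dtype instead of raising.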

However, we didn't change the behaviour of append for Indexes (the append above is for Series):

For CategoricalIndex:

| left | right | append 0.18 | append 0.19.0 | append 0.19.1 |
|---|---|---|---|---|
| category | category (identical categories) | category | category | category |
| category | category (different categories) | error | error | error |
| category | not category | category | category | category |
| category | not category (with other values) | error | error | error |
| not category | category (with other values) | object | error | object |

The last line, i.e. the case where the calling Index is not a CategoricalIndex, changed by accident in 0.19.0, and that is what I corrected in PR #14545 for 0.19.1.
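To check which row of the Index table a given installation follows, something like the sketch below could be run. `append_outcome` is a hypothetical helper written for this illustration, not a pandas function, and the example values are assumptions chosen to match the five rows. The results depend on the installed pandas version, so no expected output is listed.

```python
import pandas as pd


def append_outcome(left, right):
    """Return the dtype name of left.append(right), or 'error' if it raises."""
    try:
        return str(left.append(right).dtype)
    except (TypeError, ValueError):
        return "error"


cat_abc = pd.CategoricalIndex(["a", "b", "c"])

# One (left, right) pair per row of the table above.
cases = [
    ("category + category (identical categories)", cat_abc, pd.CategoricalIndex(["a", "b", "c"])),
    ("category + category (different categories)", cat_abc, pd.CategoricalIndex(["x", "y"])),
    ("category + not category", cat_abc, pd.Index(["a"])),
    ("category + not category (with other values)", cat_abc, pd.Index(["d"])),
    ("not category + category (with other values)", pd.Index(["a", "d"]), cat_abc),
]

outcomes = {label: append_outcome(left, right) for label, left, right in cases}

for label, outcome in outcomes.items():
    print(f"{label}: {outcome}")
```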

Questions:

  • Do we want the same behaviour for Index.append as we now have for Series.append with categorical data? This would mean that every result in the table above becomes 'object', apart from the first row.
  • Do we want to make an exception for the case where the values in the 'right' correspond with the categories? (so that pd.CategoricalIndex(['a', 'b', 'c']).append(pd.Index(['a'])) keeps working)
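The two questions can be made concrete with a small sketch. `proposed_append` and its `allow_values_in_categories` flag are hypothetical names invented here to illustrate the rule being discussed; this is not how pandas implements Index.append. The flag switches the exception from the second question on or off.

```python
import pandas as pd


def proposed_append(left, right, allow_values_in_categories=False):
    """Hypothetical rule: keep category dtype only for identical categoricals."""
    left_cat = isinstance(left.dtype, pd.CategoricalDtype)
    right_cat = isinstance(right.dtype, pd.CategoricalDtype)
    values = list(left) + list(right)

    # Question 1: identical categories and ordered flag give a categorical result.
    if left_cat and right_cat:
        same = left.categories.equals(right.categories) and left.ordered == right.ordered
        if same:
            return pd.CategoricalIndex(values, categories=left.categories, ordered=left.ordered)

    # Question 2 (optional exception): a plain right-hand Index whose values
    # all occur in the categories of the left-hand CategoricalIndex.
    if allow_values_in_categories and left_cat and not right_cat:
        if right.isin(left.categories).all():
            return pd.CategoricalIndex(values, categories=left.categories, ordered=left.ordered)

    # Everything else falls back to object dtype instead of raising.
    return pd.Index(values, dtype=object)


ci = pd.CategoricalIndex(["a", "b", "c"])
print(proposed_append(ci, pd.CategoricalIndex(["a", "b", "c"])).dtype)
print(proposed_append(ci, pd.CategoricalIndex(["x"])).dtype)
print(proposed_append(ci, pd.Index(["a"])).dtype)
print(proposed_append(ci, pd.Index(["a"]), allow_values_in_categories=True).dtype)
```

With the flag off, only the first call keeps the category dtype; with the flag on, the example from the second question keeps it as well.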

Changing this to always return object dtype, except for categoricals with identical categories, is easy, but gives a few failures in our test suite. Namely, in some indexing tests (indexing a DataFrame with a CategoricalIndex) the behaviour changes, because indexing with a value that does not exist in the index was performed using CategoricalIndex.append(). But we can of course work around this in the indexing code.

cc @JanSchulz @sinhrks

Labels: API Design; Categorical (Categorical Data Type); Enhancement; Index (Related to the Index class or subclasses); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)