API: Index.append behaviour with categoricals

Follow-up of https://github.com/pandas-dev/pandas/pull/14545. 

We had a long discussion on what the behaviour of `concat` should be when you have categorical data: https://github.com/pandas-dev/pandas/pull/13767. In the end, for 0.19.0, we changed the behaviour from raising an error when the categories don't match to returning object-dtyped data (only data with identical categories and an identical `ordered` attribute gives a categorical as result). The table below summarises the changes between 0.18.1 and 0.19.0:

**For categorical Series:**

| left         | right        | append/concat 0.18 | append/concat 0.19.0    |
|---------|-------------------------------------------|---------|------------------------------|
| category | category (identical categories) | category | category |
| category | category (different categories) | error | object |
| category | not category | category | object  |
| category | not category (different categories) | category with NaNs | object |
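A minimal sketch of the 0.19.0 rules for Series, assuming pandas 0.19.0 or later. It uses `pd.concat` only, because `Series.append` was removed in pandas 2.0; the variable names and the `is_categorical` helper are illustrative. In the versions discussed here the non-categorical results are object dtype; newer releases may report a different non-categorical dtype, so the sketch only checks whether the result is still categorical.

```python
import pandas as pd

# Two categorical Series with identical categories (and the same `ordered` flag).
same_a = pd.Series(pd.Categorical(["a", "b"], categories=["a", "b"]))
same_b = pd.Series(pd.Categorical(["b", "a"], categories=["a", "b"]))

# A categorical Series whose categories differ from the ones above.
other_cats = pd.Series(pd.Categorical(["a", "c"], categories=["a", "c"]))

# A plain, non-categorical Series.
plain = pd.Series(["a", "b"])


def is_categorical(series):
    """Illustrative helper: True if the Series has category dtype."""
    return str(series.dtype) == "category"


identical = pd.concat([same_a, same_b], ignore_index=True)
different = pd.concat([same_a, other_cats], ignore_index=True)
mixed = pd.concat([same_a, plain], ignore_index=True)

print("identical categories ->", identical.dtype)
print("different categories ->", different.dtype)
print("category + not category ->", mixed.dtype)
```

Only the first result keeps the category dtype; the other two fall back to a non-categorical dtype.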

However, we didn't change the behaviour of `append` for Indexes (the `append` in the table above is for Series):

**For `CategoricalIndex`:**

| left         | right        | append 0.18 | append 0.19.0    | append 0.19.1 |
|---------|-------------------------------------------|---------|------------------------------|----|
| category | category (identical categories) | category | category | category |
| category | category (different categories) | error | error  | error  |
| category | not category | category | category |  category | 
| category | not category (with other values) | error | error  | error  |
| not category | category (with other values) | object | error | object |

The last row, i.e. the case where the calling Index is not a CategoricalIndex, changed by accident in 0.19.0, and that is what I corrected in PR #14545 for 0.19.1.

Questions:

* Do we want the same behaviour for `Index.append` as we now have for `Series.append` with categorical data? This would mean that every result in the `CategoricalIndex` table above becomes 'object', apart from the first row.
* Do we want to make an exception for the case where the values in the 'right' correspond with the categories? (so that `pd.CategoricalIndex(['a', 'b', 'c']).append(pd.Index(['a']))` keeps working)
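To make the second question concrete, here is a small sketch of the two `CategoricalIndex.append` cases it distinguishes. `describe_append` is a hypothetical helper written for this illustration, not a pandas function. What the second call reports depends on the installed pandas version (the tables above say it raises in 0.18 through 0.19.1), so no output is shown.

```python
import pandas as pd


def describe_append(left, right):
    """Hypothetical helper: name the result type of left.append(right),
    or report the exception type if the append raises."""
    try:
        return type(left.append(right)).__name__
    except (TypeError, ValueError) as exc:
        return "error: " + type(exc).__name__


cat_idx = pd.CategoricalIndex(["a", "b", "c"])

# Right-hand values are all existing categories: the case the exception would cover.
print(describe_append(cat_idx, pd.Index(["a"])))

# Right-hand values include something outside the categories.
print(describe_append(cat_idx, pd.Index(["d"])))
```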

Changing this to always return object dtype, except for categoricals with identical categories, is easy, but gives a few failures in our test suite. Namely, in some indexing tests (indexing a DataFrame with a CategoricalIndex) the behaviour changes, because indexing with a value that does not exist in the index was performed using `CategoricalIndex.append()`. But we can of course work around this in the indexing code.

cc @janschulz @sinhrks 
