BUG: merge with categoricals does not preserve categories dtype #10409

Closed
@amelio-vazquez-reina

xref #14351

None of the following merge operations retain the category dtype of the columns. Is this expected? How can I keep it?

Merging on a category type:

Consider the following:

import numpy as np
import pandas as pd

# X is the merge key and is categorical in both frames
A = pd.DataFrame({'X': np.random.choice(['foo', 'bar'], size=(10,)),
                  'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
A['X'] = A['X'].astype('category')

B = pd.DataFrame({'X': np.random.choice(['foo', 'bar'], size=(10,)),
                  'Z': np.random.choice(['jjj', 'kkk', 'sss'], size=(10,))})
B['X'] = B['X'].astype('category')

If I do the merge, we end up with:

> pd.merge(A, B, on='X').dtypes
X    object
Y    object
Z    object
dtype: object
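
One possible workaround for the key column, as a minimal sketch rather than a fix in pandas itself: re-cast the key after the merge, using the union of the categories from both inputs. The small fixed frames below are stand-ins for the random ones above, and the name `key_categories` is introduced here only for illustration.

```python
import pandas as pd

# Small fixed frames standing in for the random ones above.
A = pd.DataFrame({'X': ['foo', 'bar', 'foo'], 'Y': ['one', 'two', 'three']})
B = pd.DataFrame({'X': ['bar', 'foo', 'bar'], 'Z': ['jjj', 'kkk', 'sss']})
A['X'] = A['X'].astype('category')
B['X'] = B['X'].astype('category')

merged = pd.merge(A, B, on='X')

# Rebuild the key as a categorical over the union of both sides' categories,
# so the result does not depend on which dtype the merge itself returned.
key_categories = A['X'].cat.categories.union(B['X'].cat.categories)
merged['X'] = pd.Categorical(merged['X'], categories=key_categories)

print(merged.dtypes)
```

This costs an extra pass over the key column after the merge, so it restores the dtype but not the memory savings during the merge.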

Merging on a non-category type:

A = pd.DataFrame({'X': np.random.choice(['foo', 'bar'],size=(10,)), 
                  'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
A['Y'] = A['Y'].astype('category')

B = pd.DataFrame({'X': np.random.choice(['foo', 'bar'],size=(10,)), 
                  'Z': np.random.choice(['jjj', 'kkk', 'sss'], size=(10,))})
B['Z'] = B['Z'].astype('category')

If I do the merge, we end up with:

> pd.merge(A, B, on='X').dtypes
X    object
Y    object
Z    object
dtype: object
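
For the non-key columns, a similar workaround is to cast each column back to the dtype it had in its source frame. This is a sketch under assumptions: `restore_categories` is a hypothetical helper, not a pandas function, and it only handles columns whose names survive the merge unchanged (columns renamed by suffixes are skipped).

```python
import pandas as pd

def restore_categories(merged, *sources):
    """Hypothetical helper: cast columns of `merged` back to the
    categorical dtype they had in any of the `sources` frames."""
    out = merged.copy()
    for src in sources:
        for col in src.columns:
            # Only touch columns that were categorical in the source
            # and still exist under the same name after the merge.
            if col in out.columns and str(src[col].dtype) == 'category':
                out[col] = out[col].astype(src[col].dtype)
    return out

# Small fixed frames standing in for the random ones above.
A = pd.DataFrame({'X': ['foo', 'bar', 'foo'], 'Y': ['one', 'two', 'three']})
B = pd.DataFrame({'X': ['bar', 'foo', 'foo'], 'Z': ['jjj', 'kkk', 'sss']})
A['Y'] = A['Y'].astype('category')
B['Z'] = B['Z'].astype('category')

fixed = restore_categories(pd.merge(A, B, on='X'), A, B)
print(fixed.dtypes)
```

If a column is categorical in several sources with different categories, the last source wins and values outside its categories become NaN, so this is best limited to columns that come from a single frame.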


Labels: API Design, Categorical (Categorical Data Type), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
