API: type conversions on merges #15332

Open
@chris-b1

Currently any type conversions on merge are silent, e.g.

In [24]: a = pd.DataFrame({'cat_key': pd.Categorical(['a', 'b', 'c']), 'int_key': [1, 2, 3]})

In [25]: b = pd.DataFrame({'cat_key': pd.Categorical(['b', 'a', 'c']), 'values': [1, 2, 3]})

In [26]: a.merge(b).dtypes
Out[26]: 
cat_key    object
int_key     int64
values      int64
dtype: object

In [29]: b2 = pd.DataFrame({'int_key': [2.0, 1.0, 3.0], 'values': [1, 2, 3]})

In [30]: a.merge(b2)
Out[30]: 
  cat_key  int_key  values
0       a        1       2
1       b        2       1
2       c        3       3

In [31]: a.merge(b2).dtypes
Out[31]: 
cat_key    object
int_key     int64
values      int64
dtype: object

#15321 will make [26] preserve a categorical dtype, but if the categories don't overlap, it will be converted to object.
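One way to see a conversion coming before it happens is to compare the key dtypes of the two frames up front. The sketch below is only an illustration: `mismatched_key_dtypes` is a hypothetical helper, not part of pandas, and it assumes the keys are either passed via `on` or default to the columns common to both frames, as `merge` does.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def mismatched_key_dtypes(left, right, on=None):
    """Return {key: (left_dtype, right_dtype)} for merge keys whose dtypes differ.

    Hypothetical helper for illustration; not a pandas API.
    """
    if on is None:
        # Mirror merge's default: use the columns present in both frames.
        keys = [col for col in left.columns if col in right.columns]
    elif isinstance(on, str):
        keys = [on]
    else:
        keys = list(on)

    return {
        key: (left[key].dtype, right[key].dtype)
        for key in keys
        if not is_dtype_equal(left[key].dtype, right[key].dtype)
    }


a = pd.DataFrame({'cat_key': pd.Categorical(['a', 'b', 'c']), 'int_key': [1, 2, 3]})
b2 = pd.DataFrame({'int_key': [2.0, 1.0, 3.0], 'values': [1, 2, 3]})

# a and b2 share only int_key, which is integer on one side and float on the other.
print(mismatched_key_dtypes(a, b2))
```

This only flags keys whose input dtypes already disagree. It would not catch a case where both sides are categorical with different categories unless the two categorical dtypes compare unequal.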

So, should there be something like a conversions='ignore'|'warn'|'error' option?
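To make the proposal concrete, here is a rough user-level sketch of how such an option might behave. Everything in it is an assumption for discussion: `merge_checked` is a hypothetical wrapper, the `conversions` keyword does not exist in `merge`, and the choice of `TypeError` and `UserWarning` is arbitrary. It compares each result column against the same-named input column, so columns renamed by suffixes are not checked.

```python
import warnings

import pandas as pd
from pandas.api.types import is_dtype_equal


def merge_checked(left, right, conversions="warn", **merge_kwargs):
    """Merge two frames and report columns whose dtype changed.

    Hypothetical wrapper for illustration; `conversions` is not a pandas option.
    """
    if conversions not in ("ignore", "warn", "error"):
        raise ValueError("conversions must be 'ignore', 'warn' or 'error'")

    result = left.merge(right, **merge_kwargs)
    if conversions == "ignore":
        return result

    # Compare every result column with the same-named column of each input.
    changed = []
    for col in result.columns:
        for side, frame in (("left", left), ("right", right)):
            if col in frame.columns and not is_dtype_equal(
                frame[col].dtype, result[col].dtype
            ):
                changed.append(
                    f"{col}: {side} {frame[col].dtype} -> {result[col].dtype}"
                )

    if changed:
        message = "dtype changed during merge: " + "; ".join(changed)
        if conversions == "error":
            raise TypeError(message)
        warnings.warn(message, UserWarning, stacklevel=2)
    return result
```

With the frames from the example above, `merge_checked(a, b2, conversions='error')` would be expected to raise, because `int_key` is integer in `a` and float in `b2`, so the merged key cannot match both inputs. A check after the fact like this is cheaper to prototype than one built into the merge internals, but it cannot say why a conversion happened.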

Labels: Bug, Categorical, Dtype Conversions, Error Reporting, Needs Discussion