When merging on boolean index in right dataframe, merge key becomes an object #23884

Closed
@mattwigway

Code Sample, a copy-pastable example if possible

import pandas as pd

data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

print('data dtypes:')
print(data.dtypes)

print('\ndata2 dtypes')
print(data2.dtypes)

merged = data.merge(data2, left_on='b', right_index=True)

print('\nmerged dtypes')
print(merged.dtypes)

Output:

data dtypes:
a    int64
b     bool
dtype: object

data2 dtypes
c    int64
dtype: object

merged dtypes
a     int64
b    object
c     int64
dtype: object

Problem description

I have a dataframe that has a column of type bool, and another dataframe whose index is of type bool. When I merge the first dataframe with the second (left_on='b', right_index=True), the key column is promoted to object dtype rather than keeping its boolean type. This causes issues because the ~ operator then no longer inverts True and False; I have to manually convert the column back to a boolean. I'm guessing this is because there's no BooleanIndex type in pandas, so the index dtype is object. But it's quite confusing when columns unexpectedly change types during a merge.
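The manual conversion mentioned above can be sketched as follows. This is only a minimal illustration of the workaround, not a fix for the underlying behaviour; the variable name `inverted` is introduced here for illustration, and the cast assumes the key column contains no missing values, as is the case for this inner merge.

```python
import pandas as pd

data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

merged = data.merge(data2, left_on='b', right_index=True)

# Workaround: cast the merge key back to bool. This is a no-op if the
# column is already bool. Caution: astype(bool) would turn NaN into True,
# so it is only safe when the key has no missing values.
merged['b'] = merged['b'].astype(bool)

# With a real bool dtype, ~ performs element-wise logical negation again.
inverted = ~merged['b']
print(merged.dtypes)
print(inverted.tolist())
```

The cast has to be repeated after every merge of this shape, which is why it is a workaround rather than a solution.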

The boolean index is part of a MultiIndex in my actual project, but the issue occurs even with a single index as shown in the MWE above.
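For the MultiIndex case, one possible way to sidestep the problem is to move the index levels back into columns with reset_index() and merge column-to-column. The sketch below is an assumption-laden illustration: the extra key column `k` and the frame name `right` are hypothetical and not taken from my project, and I have only reasoned that the key keeps its bool dtype this way rather than verified it on every pandas version.

```python
import pandas as pd

# Hypothetical data: 'k' is an extra key invented for this illustration.
data = pd.DataFrame(dict(a=[1, 2], b=[True, False], k=['x', 'y']))
right = pd.DataFrame(
    dict(b=[False, True], k=['y', 'x'], c=[4, 3])
).set_index(['b', 'k'])

# Turn the index levels back into ordinary columns, then merge on columns
# instead of using right_index=True.
merged = data.merge(right.reset_index(), on=['b', 'k'])

print(merged.dtypes)
```

The trade-off is that the right-hand frame loses its index during the merge, so it has to be set again afterwards if it is needed.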

Expected Output

Column b should remain a boolean (dtype bool) after the merge.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.11.0
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels

Bug
Dtype Conversions: Unexpected or buggy dtype conversions
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode