Description
Code Sample, a copy-pastable example if possible
import pandas as pd
data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')
print('data dtypes:')
print(data.dtypes)
print('\ndata2 dtypes')
print(data2.dtypes)
merged = data.merge(data2, left_on='b', right_index=True)
print('\nmerged dtypes')
print(merged.dtypes)
Output:
data dtypes:
a int64
b bool
dtype: object
data2 dtypes
c int64
dtype: object
merged dtypes
a int64
b object
c int64
dtype: object
Problem description
I have a dataframe that has a column of type bool, and another dataframe whose index is of type bool. When I merge the first dataframe with the second, the column is promoted to object rather than keeping its boolean type. This causes issues because the ~ operator then no longer inverts True and False; I have to manually convert the column back to a boolean. I'm guessing this is because there's no BooleanIndex type in pandas, so the index type is object. But it's quite confusing when columns unexpectedly change types during a merge.
The boolean index is part of a MultiIndex in my actual project, but the issue occurs even with a single index as shown in the MWE above.
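For reference, the manual conversion mentioned above could look roughly like the sketch below. It reuses the frames from the code sample and casts the key column back to the dtype it had in the left dataframe; the variable name `inverted` is only illustrative. This is a workaround under the assumption that the column holds only True/False values after the merge, not a fix for the underlying behaviour.

```python
import pandas as pd

data = pd.DataFrame(dict(a=[1, 2], b=[True, False]))
data2 = pd.DataFrame(dict(b=[False, True], c=[4, 3])).set_index('b')

merged = data.merge(data2, left_on='b', right_index=True)

# Workaround: cast the key column back to the dtype it had before the merge.
# If the installed pandas already keeps the column as bool, this changes nothing.
merged['b'] = merged['b'].astype(data['b'].dtype)

# With a real bool column, ~ performs logical negation again.
inverted = ~merged['b']

print(merged.dtypes)
print(inverted)
```

The cast is written against `data['b'].dtype` rather than a hard-coded `bool` so the same line can be reused for other key columns.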
Expected Output
Column b should remain a boolean.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.11.0
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None