Code Sample, a copy-pastable example if possible
In [2]: df1 = pd.DataFrame({'key': [1, 2], 'value': [0, 1]})
In [3]: df1.dtypes
Out[3]:
key int64
value int64
dtype: object
In [4]: df2 = pd.DataFrame({'key': [1.0, 2.0], 'other_value': ['A', 'B']})
In [5]: df2.dtypes
Out[5]:
key float64
other_value object
dtype: object
In [6]: print(pd.merge(df1, df2, how='left', on='key').dtypes)
key object
value int64
other_value object
dtype: object
Problem description
I was expecting that in the merged DataFrame's "key" column pandas would either upcast int to float (as it does, e.g., when missing values occur in an int column) or leave the column dtype as int.
I checked and confirmed that the latter was the behaviour in pandas 0.19.2.
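For reference, here is a minimal sketch of the int-to-float upcast mentioned above. The frames `left` and `right` and the column `n` are made up for illustration and are not part of the original report. A left merge leaves one key unmatched, so the int column from the right frame gets a missing value and is upcast:

```python
import pandas as pd

# Hypothetical frames: key 3 has no match in `right`.
left = pd.DataFrame({'key': [1, 2, 3]})
right = pd.DataFrame({'key': [1, 2], 'n': [10, 20]})

# Both key columns are int64, so the key dtype is not in question here.
merged = pd.merge(left, right, how='left', on='key')

# 'n' was int64 in `right`; the unmatched row introduces NaN,
# which forces the column to float64.
n_dtype = merged['n'].dtype
print(merged.dtypes)
```

This is the kind of upcast I would have found reasonable for the key column as well.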
Expected Output
Merging two DataFrames where the key column is of type int in one frame and of type float in the other produces a key column of type int or float in the resulting frame (not object).
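A possible workaround, sketched under the assumption that every float key is a whole number and none is missing, is to align the key dtypes explicitly before merging. The name `df2_int` is introduced here for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2], 'value': [0, 1]})
df2 = pd.DataFrame({'key': [1.0, 2.0], 'other_value': ['A', 'B']})

# Cast the float key to int64 so both frames agree on the key dtype.
# Assumption: the float keys hold whole numbers and no NaN; otherwise
# astype would truncate values or raise.
df2_int = df2.assign(key=df2['key'].astype('int64'))

merged = pd.merge(df1, df2_int, how='left', on='key')
print(merged.dtypes)
```

With matching int64 keys on both sides, the merged key column stays int64. Casting `df1['key']` to float64 instead would be the equivalent fix in the other direction.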
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None