
combine_first returns unexpected results for timestamp dataframes #28481

Closed
@leo4183

Problem Description

combine_first returns unexpected results when the two timestamp DataFrames have columns that do not match: the column present only in the second frame comes back as floats instead of datetimes. This issue is probably related to #24357 (dtypes not being retained).

import pandas as pd

x = pd.DataFrame([pd.NaT, pd.NaT], columns=['a']).T
y = pd.DataFrame(
    [[pd.NaT, pd.NaT], pd.to_datetime(['20190101', '20190102'])],
    index=['a', 'b'],
    columns=[1, 2],
)

x
     0   1
a  NaT NaT

y
            1          2
a         NaT        NaT
b  2019-01-01 2019-01-02

Output of x.combine_first(y) (current)
     0          1             2
a  NaT        NaT -9.223372e+18
b  NaT 2019-01-01  1.546387e+18

Output of x.reindex(columns=x.columns|y.columns).combine_first(y) (expected)
     0          1          2
a  NaT        NaT        NaT
b  NaT 2019-01-01 2019-01-02
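
The floats in column 2 of the current output look like the raw int64 nanosecond values behind the datetimes, cast to float. The sketch below checks that reading. It is only an interpretation of the numbers, not a trace through the combine_first internals, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# NaT is stored as the smallest int64 value.
nat_as_int = int(np.datetime64("NaT", "ns").astype(np.int64))

# A timestamp is stored as nanoseconds since the Unix epoch.
ts_as_int = pd.Timestamp("2019-01-02").value

# Casting both integers to float reproduces the values seen in column 2.
print(float(nat_as_int))  # -9.223372036854776e+18
print(float(ts_as_int))   # 1.5463872e+18
```

If that reading is right, column 2 lost its datetime dtype rather than its data: NaT shows up as -9.223372e+18 and 2019-01-02 as 1.546387e+18.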

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-80.7.2.el8_0.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.2

Labels: Bug; Datetime (Datetime data dtype); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
