BUG: Concatenating DataFrames with NaT TZ Timestamps results in incorrect dtype #23037

Closed
@tonytao2012

Code Sample, a copy-pastable example if possible

import pandas as pd

ts1 = pd.Timestamp(pd.NaT, tz='UTC')
ts2 = pd.Timestamp('2015-01-01', tz='UTC')

df1 = pd.DataFrame([[ts1]])
df2 = pd.DataFrame([[ts2]])

result = pd.concat([df1, df2])

result[0]
Out[6]: 
0                          NaT
0    2015-01-01 00:00:00+00:00
Name: 0, dtype: object

expected = pd.DataFrame([[ts1], [ts2]])

expected[0]
Out[8]: 
0                         NaT
1   2015-01-01 00:00:00+00:00
Name: 0, dtype: datetime64[ns, UTC]

Problem description

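Concatenating a DataFrame that contains a NaT created with tz='UTC' with another DataFrame that contains an actual timestamp with tz='UTC' produces a column of object dtype instead of the expected datetime64[ns, UTC] dtype. Building a single DataFrame from the same two values, as in `expected` above, gives the correct dtype.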
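As a possible workaround (not a fix for the underlying behaviour), the column dtype can be stated explicitly before concatenating, so that `pd.concat` does not have to infer a timezone from an all-NaT column. This is only a sketch: `utc_frame` and `UTC_DTYPE` are hypothetical names introduced here for illustration, not part of pandas.

```python
import pandas as pd

# Dtype string for tz-aware UTC datetimes (illustrative constant, not a pandas name).
UTC_DTYPE = "datetime64[ns, UTC]"


def utc_frame(values):
    """Hypothetical helper: one-column frame whose column 0 is explicitly tz-aware."""
    return pd.DataFrame({0: pd.Series(values, dtype=UTC_DTYPE)})


# The all-NaT frame now carries the UTC timezone in its dtype
# instead of relying on inference from the values.
df1 = utc_frame([pd.NaT])
df2 = utc_frame([pd.Timestamp("2015-01-01", tz="UTC")])

# Both inputs share the same tz-aware dtype, so no upcast to object is needed.
result = pd.concat([df1, df2], ignore_index=True)
print(result[0].dtype)
```

Passing `ignore_index=True` also avoids the duplicated index label `0` that appears in the output of the original report.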

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.0.dev0+708.gce1f81f8b
pytest: 3.8.2
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: 0.10.9
IPython: 6.5.0
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.1
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.2

Labels: Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Timezones (Timezone data dtype)