Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df1 = pd.DataFrame({
'a': [pd.Timestamp('2024-05-13 12:00:00', tz='America/New_York')],
})
df2 = pd.DataFrame({
'a': [pd.NaT],
})
df_concat = pd.concat([df1, df2])
df_concat['a'].max()
Issue Description
The above code raises the exception:
TypeError: '>=' not supported between instances of 'Timestamp' and 'float'
This code is a simplified version of some prod code that caused issues at my work.
I believe what's happening is df1 has dtype datetime[ns, America/New_York]
for the column and df2 has dtype datetime[ns]
, and when you concat them, the resulting dtype is object
. Then, .max()
coerces pd.NaT
to NaN, and you get a comparison between a Timestamp and a float.
Expected Behavior
I expect the code to return pd.Timestamp('2024-05-13 12:00:00', tz='America/New_York')
.
I think Pandas should be robust and handle this case, even though the dtypes aren't perfectly "correct". I think the right place to fix this is in the .max()
function: the code
max([pd.Timestamp('2024-05-13 12:00:00', tz='America/New_York'), pd.NaT])
(where max
is builtins.max
) works fine, so you would also expect the Pandas equivalent to work.
You could maybe also make an argument that pd.concat
should special-case this and return a column with dtype datetime64[ns, America/New_York]
, but I'm less sure about that.
Longer-term, I feel like Pandas should support datetime columns with heterogenous timezones; the requirement that timezones be the same for a whole column feels like an artificial constraint and many real-world datasets will naturally have heterogenous timezones.
Installed Versions
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.1.1
pip : 24.0
Cython : None
pytest : 8.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None