#### Description

It appears that `pandas-gbq` does not appropriately handle TIMESTAMP values that are outside the range that `pd.Timestamp` can support. As a result, any query that returns a TIMESTAMP column with a value outside of the supported range will raise an `AttributeError`.
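For context, a minimal standalone sketch (not pandas-gbq specific) of the `pd.Timestamp` bounds involved: `pd.Timestamp` is backed by a 64-bit nanosecond count, so dates before `pd.Timestamp.min` cannot be stored in a `datetime64[ns]` column.

```python
import pandas as pd

# pd.Timestamp is a 64-bit count of nanoseconds since the Unix epoch,
# which bounds the representable range on both sides.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# Forcing an earlier date into the nanosecond dtype fails:
try:
    pd.Series(["1677-09-21"]).astype("datetime64[ns]")
except pd.errors.OutOfBoundsDatetime as exc:
    print("out of bounds:", exc)
```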
#### Environment details

- OS type and version: Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:20 PDT 2021; root:xnu-7195.141.6~3/RELEASE_ARM64_T8101
- Python version: Python 3.7.11
- pip version: pip 21.2.4 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
- `pandas-gbq` version:

```
Name: pandas-gbq
Version: 0.16.0
Summary: Google BigQuery connector for pandas
Home-page: https://github.com/googleapis/python-bigquery-pandas
Author: pandas-gbq authors
Author-email: [email protected]
License: BSD-3-Clause
Location: /usr/local/lib/python3.7/site-packages
Requires: pandas, google-auth, setuptools, google-auth-oauthlib, pyarrow, numpy, google-cloud-bigquery, pydata-google-auth
Required-by:
```
#### Steps to reproduce

1. Construct a query that returns a TIMESTAMP with a date earlier than `pd.Timestamp.min`.
2. Call `pandas_gbq.read_gbq()` with that SQL query.
#### Code example

```python
import pandas_gbq

sql = 'SELECT TIMESTAMP("1677-09-21");'
pandas_gbq.read_gbq(sql)  # raises AttributeError
```
#### Stack trace

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 837, in read_gbq
    dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 471, in run_query
    user_dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 527, in _download_results
    df = pandas_gbq.timestamp.localize_df(df, schema_fields)
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/timestamp.py", line 41, in localize_df
    if field["type"].upper() == "TIMESTAMP" and df[column].dt.tz is None:
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5478, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/accessor.py", line 181, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 506, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
```
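The failing check in `localize_df` assumes every TIMESTAMP column arrived as `datetime64[ns]`. A possible guard (my sketch, not pandas-gbq code; `localize_df_safe` is a hypothetical name) would test the column's dtype before touching the `.dt` accessor:

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype

def localize_df_safe(df, schema_fields):
    # Hypothetical variant of pandas_gbq.timestamp.localize_df: only localize
    # columns that pandas actually parsed as tz-naive datetime64. Out-of-bounds
    # TIMESTAMP values come back as object dtype and are left untouched.
    for field in schema_fields:
        column = field["name"]
        if field["type"].upper() == "TIMESTAMP" and is_datetime64_dtype(df[column]):
            df[column] = df[column].dt.tz_localize("UTC")
    return df
```

With a guard like this, an object-dtype column would simply skip localization instead of raising the `AttributeError` above.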
#### Expected result

`pd.read_sql()` handles out-of-bounds timestamps gracefully by reverting to the `object` dtype:

```python
>>> import pandas as pd
>>> import sqlalchemy as sa
>>>
>>> username = ...
>>> password = ...
>>> host = ...
>>> port = ...
>>> database = ...
>>>
>>> conn_string = sa.engine.url.URL("postgresql", username=username, password=password, host=host, port=port, database=database)
>>> engine = sa.create_engine(conn_string)
>>> conn = engine.connect()
>>>
>>> sql = "SELECT timestamp '1677-09-22';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    datetime64[ns]
dtype: object
>>>
>>> sql = "SELECT timestamp '1677-09-21';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    object
dtype: object
```

I would expect `pandas-gbq` to have the same behaviour.
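The fallback shown above can be sketched in plain pandas (my illustration, not the pandas-gbq code path): values outside the `datetime64[ns]` range are still usable as plain `datetime.datetime` objects in an `object`-dtype column.

```python
import datetime
import pandas as pd

# In-range values fit the native datetime64 dtype...
in_range = pd.Series([datetime.datetime(1677, 9, 22)])

# ...while out-of-range values can still be carried as plain
# datetime.datetime objects in an object-dtype column.
out_of_range = pd.Series([datetime.datetime(1677, 9, 21)], dtype=object)

print(in_range.dtype)
print(out_of_range.dtype)    # object
print(out_of_range[0].year)  # 1677
```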