
read_gbq() raises AttributeError for TIMESTAMP values out of bounds for pd.Timestamp #468

Closed
@RevolutionTech

Description

It appears that pandas-gbq does not properly handle TIMESTAMP values outside the range that pd.Timestamp can represent. As a result, any query that returns a TIMESTAMP column with an out-of-range value raises an AttributeError.

Environment details

  • OS type and version: Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:20 PDT 2021; root:xnu-7195.141.6~3/RELEASE_ARM64_T8101
  • Python version: Python 3.7.11
  • pip version: pip 21.2.4 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
  • pandas-gbq version:
Name: pandas-gbq
Version: 0.16.0
Summary: Google BigQuery connector for pandas
Home-page: https://github.com/googleapis/python-bigquery-pandas
Author: pandas-gbq authors
Author-email: [email protected]
License: BSD-3-Clause
Location: /usr/local/lib/python3.7/site-packages
Requires: pandas, google-auth, setuptools, google-auth-oauthlib, pyarrow, numpy, google-cloud-bigquery, pydata-google-auth
Required-by: 

Steps to reproduce

  1. Construct a query that returns a TIMESTAMP with a date earlier than pd.Timestamp.min.
  2. Call pandas_gbq.read_gbq() with that SQL query.

Code example

import pandas_gbq

sql = 'SELECT TIMESTAMP("1677-09-21");'
pandas_gbq.read_gbq(sql)  # raises AttributeError

Stack trace

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 837, in read_gbq
    dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 471, in run_query
    user_dtypes=dtypes,
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 527, in _download_results
    df = pandas_gbq.timestamp.localize_df(df, schema_fields)
  File "/usr/local/lib/python3.7/site-packages/pandas_gbq/timestamp.py", line 41, in localize_df
    if field["type"].upper() == "TIMESTAMP" and df[column].dt.tz is None:
  File "/usr/local/lib/python3.7/site-packages/pandas/core/generic.py", line 5478, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/accessor.py", line 181, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 506, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
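
The failure can be reproduced with pandas alone, without BigQuery: values outside the datetime64[ns] range (pd.Timestamp.min is 1677-09-21 00:12:43.145224193) leave the column as object dtype, and object-dtype columns have no .dt accessor. A minimal sketch (the object dtype is forced here to mimic the out-of-bounds fallback):

```python
import datetime
import pandas as pd

# Mimic what pandas-gbq ends up with: a column left as object dtype
# because 1677-09-21 falls below pd.Timestamp.min.
df = pd.DataFrame(
    {"ts": pd.Series([datetime.datetime(1677, 9, 21)], dtype=object)}
)

print(df["ts"].dtype)  # object, not datetime64[ns]

try:
    df["ts"].dt.tz  # the same access that fails in pandas_gbq/timestamp.py
except AttributeError as exc:
    print(exc)  # Can only use .dt accessor with datetimelike values
```

This is the line from the traceback above (`df[column].dt.tz is None`) hitting an object-dtype column.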

Expected result

pd.read_sql() handles out-of-bounds timestamps gracefully by reverting to the object dtype:

>>> import pandas as pd
>>> import sqlalchemy as sa
>>>
>>> username = ...
>>> password = ...
>>> host = ...
>>> port = ...
>>> database = ...
>>>
>>> conn_string = sa.engine.url.URL("postgresql", username=username, password=password, host=host, port=port, database=database)
>>> engine = sa.create_engine(conn_string)
>>> conn = engine.connect()
>>>
>>> sql = "SELECT timestamp '1677-09-22';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    datetime64[ns]
dtype: object
>>>
>>> sql = "SELECT timestamp '1677-09-21';"
>>> df = pd.read_sql(con=conn, sql=sql)
>>> df.dtypes
timestamp    object
dtype: object

I would expect pandas-gbq to have the same behaviour.

Labels

  • api: bigquery — Issues related to the googleapis/python-bigquery-pandas API.
  • priority: p2 — Moderately-important priority. Fix may not be included in next release.
  • type: bug — Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
