Description
There is a substantial difference in execution time between pandas_gbq.read_gbq and google.cloud.bigquery.job.query.QueryJob.to_dataframe. The reason is that pandas_gbq.read_gbq calls rows_iter.to_dataframe here with dtypes=conversion_dtypes, which causes the columns to be converted inside RowIterator.to_dataframe.
conversion_dtypes is created this way:

    conversion_dtypes = _bqschema_to_nullsafe_dtypes(schema_fields)
Why is this the default behavior? In our case we don't need these costly column conversions.
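For reference, a minimal sketch of the two code paths being compared. PROJECT_ID and QUERY are placeholders for your own project and query, and the absolute timings will depend on the result size:

```python
import time

import pandas_gbq
from google.cloud import bigquery

PROJECT_ID = "my-project"  # placeholder
QUERY = "SELECT * FROM `my-project.my_dataset.my_table`"  # placeholder

# Path 1: pandas_gbq.read_gbq -- internally passes dtypes=conversion_dtypes
# to RowIterator.to_dataframe, which triggers the column conversions.
start = time.perf_counter()
df_gbq = pandas_gbq.read_gbq(QUERY, project_id=PROJECT_ID)
print(f"pandas_gbq.read_gbq:   {time.perf_counter() - start:.1f}s")

# Path 2: QueryJob.to_dataframe -- no dtypes override, so no extra
# per-column conversion pass over the result.
client = bigquery.Client(project=PROJECT_ID)
start = time.perf_counter()
df_bq = client.query(QUERY).to_dataframe()
print(f"QueryJob.to_dataframe: {time.perf_counter() - start:.1f}s")
```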
Profiling results
117.538 RowIterator.to_dataframe google/cloud/bigquery/table.py:2261
65.551 DataFrame.__setitem__ pandas/core/frame.py:3630
48.537 RowIterator.to_arrow google/cloud/bigquery/table.py:2058
2.302 table_to_dataframe
Column transformations (the DataFrame.__setitem__ calls above) take about 55% of RowIterator.to_dataframe's total time.
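For context, RowIterator.to_dataframe applies the dtype overrides in a per-column assignment loop roughly like the following (a paraphrase of the logic in google/cloud/bigquery/table.py, not the exact library code), which is why DataFrame.__setitem__ dominates the profile:

```python
import pandas

def apply_dtype_overrides(df: pandas.DataFrame, dtypes: dict) -> pandas.DataFrame:
    """Paraphrase of the dtype-override step in RowIterator.to_dataframe.

    Each assignment rebuilds the column with the requested dtype; these
    rebuilds are what show up as DataFrame.__setitem__ in the profile above.
    """
    for column, dtype in dtypes.items():
        df[column] = pandas.Series(df[column], dtype=dtype)
    return df
```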
Environment details
- Python version: 3.10.12
- pandas-gbq version: 0.28.0