
Client.load_table_from_dataframe() sometimes chooses invalid column type #1650

Closed
@willsthompson

Description

Environment details

  • OS type and version: macOS 13
  • Python version: Python 3.10.7
  • pip version: pip 23.2
  • google-cloud-bigquery version: 3.11.4

Steps to reproduce

  1. Create a table with a BIGNUMERIC type column
  2. Use Client.load_table_from_dataframe() to load a column with the value Decimal("0.12345678901234560000000000000000000000")

Code example

from decimal import Decimal
import pandas as pd
from google.cloud import bigquery

df = pd.DataFrame({"TEST_COL": [Decimal("0.12345678901234560000000000000000000000")]})
with bigquery.Client(project=project, credentials=credentials) as client:
    load_job = client.load_table_from_dataframe(df, table_name)
    result = load_job.result()
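
For what it's worth, I believe the failure also reproduces with pyarrow alone, using the decimal128(38, 9) type the loader hardcodes for NUMERIC (a minimal sketch, independent of BigQuery; see the details below):

from decimal import Decimal
import pandas as pd
import pyarrow

series = pd.Series([Decimal("0.12345678901234560000000000000000000000")])
# Coercing to the NUMERIC-shaped type (scale 9) would drop nonzero digits,
# so pyarrow raises ArrowInvalid instead of silently rescaling.
pyarrow.Array.from_pandas(series, type=pyarrow.decimal128(38, 9))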

Stack trace

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Rescaling Decimal128 value would cause data loss

Details

I poked around a bit, and I think I see the problem. To determine the column type, you create an Arrow array

arrow_table = pyarrow.array(dataframe.reset_index()[field.name])

and then look up the BigQuery type by Arrow type ID:

detected_type = ARROW_SCALAR_IDS_TO_BQ.get(arrow_table.type.id)

My test decimal value can be represented by an Arrow 128-bit decimal type:

>>> pyarrow.array(pd.DataFrame({"x": [Decimal('0.12345678901234560000000000000000000000')]})["x"]).type
Decimal128Type(decimal128(38, 38))

But AFAICT that has the same type ID as every other Decimal128Type, so it maps to "NUMERIC" in your lookup table:

pyarrow.decimal128(38, scale=9).id: "NUMERIC",
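
The ID collision is easy to confirm (a quick check; the precision/scale values here are just examples):

import pyarrow

# Every Decimal128Type shares a single Arrow type ID regardless of its
# precision and scale, so an ID-keyed lookup can't tell
# decimal128(38, 9) apart from decimal128(38, 38).
assert pyarrow.decimal128(38, 9).id == pyarrow.decimal128(38, 38).id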

So when you convert to pyarrow before uploading, it tries to coerce using your hardcoded type for "NUMERIC":

return pyarrow.Array.from_pandas(series, type=arrow_type)

and fails, because the value needs decimal128(38, 38) to be represented but is handed decimal128(38, 9). I think you will need to detect when a 128-bit decimal type can't fit in your NUMERIC subtype and upgrade those columns to BIGNUMERIC, as sketched below.
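
Here is a rough sketch of what that upgrade logic could look like (a hypothetical helper, not the library's actual code; BigQuery's NUMERIC tops out at scale 9 with 29 integer digits, while BIGNUMERIC allows scale 38):

import pyarrow

def suggested_bq_decimal_type(arrow_type):
    # Hypothetical helper: choose NUMERIC vs. BIGNUMERIC from the detected
    # precision/scale instead of from the type ID alone.
    if pyarrow.types.is_decimal128(arrow_type):
        # NUMERIC fits scale <= 9 and at most 29 integer digits.
        if arrow_type.scale <= 9 and arrow_type.precision - arrow_type.scale <= 29:
            return "NUMERIC"
        return "BIGNUMERIC"
    return None

With my test value this returns "BIGNUMERIC" for decimal128(38, 38), which would avoid the forced cast to decimal128(38, 9).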

Hope that helps!
