Writing to CSV buffer with default float format causes numerical overflow in BigQuery #192

Closed
@anthonydelage

Description

I'm using to_gbq() to load a local DataFrame into BigQuery. Along the way, floating-point numbers are gaining significant figures, which causes numerical overflow errors when the data is loaded into BigQuery.

The encode_chunk() function in the load.py module writes the DataFrame to a local CSV buffer with pandas' to_csv(), which has a known issue on some operating systems where extra significant figures are added to floating-point values.

In my case, 0.208 was transformed to 0.20800000000000002.
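For reference, here is a minimal sketch (not taken from pandas-gbq) of how the behaviour can be reproduced with a plain DataFrame and an in-memory buffer; the 'value' column name is just for illustration, and whether the extra digits appear depends on platform and pandas version:

import io

import pandas as pd

df = pd.DataFrame({'value': [0.208]})

# Default float formatting: may emit extra significant figures on some systems.
default_buffer = io.StringIO()
df.to_csv(default_buffer, index=False, header=False)
print(default_buffer.getvalue())  # may print 0.20800000000000002

# With float_format='%g' the value round-trips as written.
g_buffer = io.StringIO()
df.to_csv(g_buffer, index=False, header=False, float_format='%g')
print(g_buffer.getvalue())  # prints 0.208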

I've been able to solve the issue locally by passing float_format='%g' to the to_csv() call in encode_chunk():

dataframe.to_csv(
    csv_buffer, index=False, header=False, encoding='utf-8',
    float_format='%g', date_format='%Y-%m-%d %H:%M:%S.%f')

Can this be safely applied as a default?

Versions:

pandas==0.22.0
pandas-gbq==0.5.0

OS details:

macOS 10.13.4
