feat: allow loading table from dataframe with extra fields, #1812 #2165

Open: wants to merge 2 commits into base: main

lkhagvadorj-amp
Contributor

@lkhagvadorj-amp lkhagvadorj-amp commented Apr 17, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #1812 🦕

Overview

This pull request introduces a change to the behavior of the dataframe_to_bq_schema function in google.cloud.bigquery._pandas_helpers, allowing extra fields in the bq_schema that are not present in the DataFrame. Instead of raising an error, a warning is issued, and the extra fields are included in the resulting schema. Additionally, a new test is added to validate this behavior.

Changes to dataframe_to_bq_schema behavior:

  • Updated the function to issue a UserWarning instead of raising a ValueError when bq_schema contains fields not found in the DataFrame. These extra fields are now appended to the resulting schema. (google/cloud/bigquery/_pandas_helpers.py, L540-R551)
  • Added a note to the function's docstring explaining the new behavior regarding extra fields in bq_schema. (google/cloud/bigquery/_pandas_helpers.py, R487-R490)
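
For readers who want to see the shape of the change, here is a minimal, standalone sketch of the warn-and-append behavior described above. It is not the code from this PR: `Field` is a hypothetical stand-in for `bigquery.SchemaField`, `merge_schema_sketch` is an illustrative name, and the DataFrame is reduced to a list of `(column_name, detected_type)` pairs so the example runs without pandas or google-cloud-bigquery installed.

```python
import warnings
from typing import NamedTuple, Sequence, Tuple


class Field(NamedTuple):
    """Hypothetical stand-in for bigquery.SchemaField (name and type only)."""

    name: str
    field_type: str


def merge_schema_sketch(
    columns: Sequence[Tuple[str, str]],
    bq_schema: Sequence[Field],
) -> Tuple[Field, ...]:
    """Combine detected column types with a user-supplied schema.

    Fields in bq_schema override the detected type of a matching column.
    Fields in bq_schema with no matching column trigger a UserWarning
    (rather than a ValueError) and are appended to the result.
    """
    # Track which user-supplied fields have not been matched to a column yet.
    unused = {field.name: field for field in bq_schema}

    result = []
    for name, detected_type in columns:
        override = unused.pop(name, None)
        result.append(override if override is not None else Field(name, detected_type))

    if unused:
        warnings.warn(
            "bq_schema contains fields not present in dataframe: "
            f"{sorted(unused)}",
            UserWarning,
            stacklevel=2,
        )
        # Keep the extra fields, in the order they were supplied.
        result.extend(unused.values())

    return tuple(result)
```

A caller that wants to check for the warning can record it with the standard library:

```python
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    schema = merge_schema_sketch(
        [("id", "INTEGER")],
        [Field("id", "INTEGER"), Field("created_at", "TIMESTAMP")],
    )
```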

Testing updates:

  • Added a new test, test_dataframe_to_bq_schema_allows_extra_fields, to verify that the function correctly handles extra fields in bq_schema by issuing a warning and including the fields in the output schema. (tests/unit/test__pandas_helpers.py, R1388-R1421)

@lkhagvadorj-amp lkhagvadorj-amp requested review from a team as code owners April 17, 2025 15:57
@product-auto-label product-auto-label bot added the size: s Pull request size is small. label Apr 17, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Apr 17, 2025
@chalmerlowe chalmerlowe assigned chalmerlowe and unassigned Neenu1995 Apr 17, 2025
@chalmerlowe chalmerlowe requested review from chalmerlowe and removed request for mrfaizal April 17, 2025 18:43
@chalmerlowe
Collaborator

@tswast

This PR updates the dataframe_to_bq_schema(dataframe, bq_schema) function in google/cloud/bigquery/_pandas_helpers.py and swaps out an Error for a Warning if there is a schema mismatch between the dataframe and the bigquery schema.

The docstring for this function has the following note:

DEPRECATED: Use pandas_gbq.schema.pandas_to_bigquery.dataframe_to_bigquery_fields(), instead.

The code we are modifying here is duplicated in pandas-gbq. I think adding this change to pandas-gbq is reasonable and I can issue a PR there.

Is the expectation that at some point we will remove this deprecated function entirely from this repo? If so, what time frame are we considering? Does it make sense to incorporate this PR in this repo?

@tswast
Contributor

tswast commented May 30, 2025

> The code we are modifying here is duplicated in pandas-gbq. I think adding this change to pandas-gbq is reasonable and I can issue a PR there.

Thanks @chalmerlowe Yes, please contribute this logic to pandas-gbq.

Note: only half of this module is currently pulled in from pandas-gbq. We have pandas -> BigQuery implemented at https://github.com/googleapis/python-bigquery-pandas/tree/main/pandas_gbq/schema but not the inverse. That said, this PR falls in the covered half.

> Is the expectation that at some point we will remove this deprecated function entirely from this repo?

Yes. It is already issuing a deprecation warning as of this PR: #2095

> If so, what time frame are we considering?

Safest would be whenever we do a 3.0 release. I think there are a few deprecations queued up, right? That said, this is kind of a gray area, since the only "breaking" change is the addition of pandas-gbq as a dependency for the pandas-related functionality.

> Does it make sense to incorporate this PR in this repo?

I would not. This PR is currently in the covered half of https://github.com/googleapis/python-bigquery-pandas/tree/main/pandas_gbq/schema.

@lkhagvadorj-amp
Contributor Author

Thank you guys for the clarification, @chalmerlowe and @tswast!

I understand now that this logic belongs in the python-bigquery-pandas repo, specifically under the pandas_gbq.schema module, which is the maintained source for this functionality. Given the deprecation status of dataframe_to_bq_schema in _pandas_helpers, it makes sense not to expand it further here.

Appreciate the guidance!

Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: s Pull request size is small.
4 participants