feat: add blob.transcribe function #1773

Open: wants to merge 11 commits into base: main

shuoweil
Contributor

@shuoweil shuoweil commented May 26, 2025

add blob.transcribe function

blob.transcribe takes a GCS link as input and transcribes the audio file using ML.GENERATE_TEXT.
The test audio comes from the LJ Speech Dataset, whose license states: "This dataset is in the public domain in the US (and most likely other countries as well). There are no restrictions on its use." The audio file was downloaded from this link.

b/415046758

@shuoweil shuoweil requested a review from GarrettWu May 26, 2025 18:34
@shuoweil shuoweil self-assigned this May 26, 2025
@shuoweil shuoweil requested review from a team as code owners May 26, 2025 18:34
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels May 26, 2025
@GarrettWu
Contributor

This is a feat, not a chore.

@shuoweil shuoweil changed the title chore: add blob.transcribe function feat: add blob.transcribe function May 27, 2025
@shuoweil
Contributor Author

Blocked by b/421006257.

@GarrettWu GarrettWu added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 29, 2025
@shuoweil shuoweil added do not merge Indicates a pull request not ready for merge, due to either quality or timing. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. status: not ready for review labels May 29, 2025
@GarrettWu
Contributor

This should not be merged until the model quality issue is fixed.

@shuoweil shuoweil force-pushed the shuowei-transcription branch from 709edb6 to e2ce3ec Compare May 29, 2025 23:00
else:
    actual_text = actual[0]
    actual_len = len(actual_text)
    print(actual_text)
Contributor

Remove the print.
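
One way to drop the print without losing the debugging value is to put the transcript into the assertion message, so it only appears when the test fails. The sketch below is an assumption about how the check could look, not the code in this PR: the helper name `assert_transcript_close`, the `expected_text` argument, and the length tolerance `rel_tol` are all hypothetical.

```python
def assert_transcript_close(actual, expected_text, rel_tol=0.2):
    """Hypothetical test helper: validate a transcript without printing it.

    `actual` is assumed to be an indexable sequence of transcript strings,
    as in the snippet under review (`actual[0]`).
    """
    assert len(actual) > 0, "transcription returned no rows"

    actual_text = actual[0]
    actual_len = len(actual_text)
    expected_len = len(expected_text)

    # Model output can vary between runs, so compare lengths within a
    # relative tolerance instead of requiring an exact string match.
    assert abs(actual_len - expected_len) <= rel_tol * expected_len, (
        f"transcript length {actual_len} is not within {rel_tol:.0%} "
        f"of expected length {expected_len}: {actual_text!r}"
    )
    return actual_text
```

On failure, pytest shows the assertion message, so the transcript is still visible when it matters and the passing test output stays clean.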

@shuoweil shuoweil force-pushed the shuowei-transcription branch from e2ce3ec to 03c3fa6 Compare June 3, 2025 00:58
@shuoweil shuoweil removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jun 3, 2025
2 participants