Transcribing almost 2000 audio files on a laptop iGPU #2806

toazd · 2025-02-10T21:14:21Z

toazd
Feb 10, 2025

This post is mainly a thank you to you @ggerganov for making this project. I'm using a Lenovo Yoga6 (AMD Ryzen 7 7730U 2GBVRAM/16GB total RAM) to transcribe almost 2,000 audio files (the best machine available to me atm) and thanks to your project supporting vulkan I can transcribe the files almost twice as fast verses using only the CPU (openai-whisper). The CPU on this machine is very respectable but the GPU is significantly faster for LLM/inference work. Not to mention that since the CPU is barely being used, I can still use the machine to do other things, including running lm-studio (in cpu only mode, even though it perfectly supports vulkan as well) with 15k context to test different 7B/8B models to see which provides the best summaries of the transcriptions :)

The script I'm using for the transcription is here:
https://github.com/toazd/LLM_playground/blob/main/transcribe_audio_whisper.cpp.sh

I'm still testing LLMs for transcription summaries but so far Deepseek-r1-8B and Qwen2.5-7B-instruct (both Q4) have done very well.

toazd · 2025-02-11T20:15:22Z

toazd
Feb 11, 2025
Author

Have to do some more testing because with the mostly default settings the larger model "ggml-large-v3-turbo.bin" is outputting nonsense, repeated lines when it encounters "noise" (a medium crowd all talking at the same time). Interestingly, the model "ggml-tiny.en.bin" just outputs "[INAUDIBLE]" which is better. Need to do some more testing.

So far I've adjusted the command line options to include (wish there was more documentation on what these do):
--temperature-inc 0
--no-fallback
--suppress-nst

0 replies

toazd · 2025-02-11T21:06:35Z

toazd
Feb 11, 2025
Author

So, for my use cases the following adjustments look very promising:

Change the model to small.en (it's far superior at recognizing croud noise, people speaking off microphone, inaudible/un-intelligable, and blank audio (instead of hallucinating like large-v3-turbo does).

I can't stress how much better small.en is. It was a welcome surprise.

Command line options added:
--temperature-inc 0
--no-fallback
--max-context 0

Script has been updated.

0 replies

toazd · 2025-02-13T12:44:34Z

toazd
Feb 13, 2025
Author

Some more screenshots of amdgpu_top --gui using small.en and the new settings. It's happily chugging away ~ 1/6th of the way done with the transcribing part.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Transcribing almost 2000 audio files on a laptop iGPU #2806

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 3 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Transcribing almost 2000 audio files on a laptop iGPU #2806

Uh oh!

Uh oh!

toazd Feb 10, 2025

Replies: 3 comments

Uh oh!

Uh oh!

toazd Feb 11, 2025 Author

Uh oh!

Uh oh!

toazd Feb 11, 2025 Author

Uh oh!

toazd Feb 13, 2025 Author

toazd
Feb 10, 2025

toazd
Feb 11, 2025
Author

toazd
Feb 11, 2025
Author

toazd
Feb 13, 2025
Author