Windowing heuristic #161

tom-huntington · 2022-11-20T22:02:11Z

tom-huntington
Nov 20, 2022

Seems like you just stride by the window length to produce the segments:
https://github.com/ggerganov/whisper.cpp/blob/2065572a11fca8c31adbb7d00c6518b290099445/whisper.cpp#L2917-L2918

Seems like this won't handle words split across segments very well.

ggerganov · 2022-11-21T15:54:49Z

ggerganov
Nov 21, 2022
Maintainer

The code you quoted belongs to what I call "processors". This functionality was requested by someone: it splits the audio into chunks and processes the chunks separately using a single model in memory. The hope was that this approach would bring a benefit on multi-core server machines. See the following PR for more info: #110
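For readers following along, here is a rough sketch of what that kind of fixed-stride split looks like. It is not the code from the PR: `split_into_chunks` and its parameters are hypothetical names, and the only assumption is that the audio is divided evenly by sample count, with the last chunk absorbing any remainder. The point is that chunk boundaries land at fixed sample positions regardless of content, which is why a word can end up cut in two.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, n_samples) into n_processors contiguous
// half-open ranges [begin, end). Boundaries depend only on the sample count,
// not on what is being said in the audio.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_chunks(std::size_t n_samples, int n_processors) {
    assert(n_processors > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t per_chunk = n_samples / static_cast<std::size_t>(n_processors);
    for (int i = 0; i < n_processors; ++i) {
        const std::size_t begin = static_cast<std::size_t>(i) * per_chunk;
        // The last chunk takes whatever is left after the even division.
        const std::size_t end = (i == n_processors - 1) ? n_samples : begin + per_chunk;
        chunks.emplace_back(begin, end);
    }
    return chunks;
}
```

Each range could then be handed to its own worker; nothing in this split looks at the decoded text, so it cannot avoid cutting through speech.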

The actual sliding window logic that you are referring to is implemented here:

https://github.com/ggerganov/whisper.cpp/blob/eab36eb63c658ee58e159e79fbf5da9ccabdcfc8/whisper.cpp#L2670-L2723

Basically, we sample the best token, and when the token is a timestamp, we remember it in seek_delta in order to slide the window by that amount.
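As a minimal sketch of that idea (not the actual whisper.cpp code), the window advance can be derived from the last timestamp token seen while decoding. The `Token` struct, `compute_seek_delta`, and all parameter names below are made up for illustration; it assumes timestamp tokens are the ids above some `timestamp_begin` id and that each timestamp step corresponds to a fixed number of audio frames. If no timestamp is sampled, the sketch falls back to sliding by the full window.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token type: ids above `timestamp_begin` are treated as
// timestamps, everything else as ordinary text tokens.
struct Token {
    int id;
};

// Returns how many frames to slide the window after decoding one chunk.
// Assumption: one timestamp step equals `frames_per_step` audio frames.
int compute_seek_delta(const std::vector<Token>& sampled,
                       int timestamp_begin,
                       int frames_per_step,
                       int window_frames) {
    assert(frames_per_step > 0 && window_frames > 0);
    int seek_delta = window_frames;  // fallback when no timestamp is seen
    for (const Token& t : sampled) {
        if (t.id > timestamp_begin) {
            // Remember the most recent timestamp; the next window starts there.
            seek_delta = (t.id - timestamp_begin) * frames_per_step;
        }
    }
    return seek_delta;
}
```

Because the next window starts at a position the model itself predicted as a segment boundary, the window tends to end between words rather than at an arbitrary offset.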

tom-huntington Nov 21, 2022
Author

Wow, so your ggml library actually uses multiple cores within the same forward pass, rather than just running multiple forward passes in parallel.

At 563 sec to transcribe 1h 30m of audio, I just thought you must be doing segments in parallel.

openai/whisper#208 (comment)
