Windowing heuristic #161
tom-huntington started this conversation in General
-
The code that you have quoted is related to what I call "processors". This functionality was requested by someone who wanted to split the audio into chunks and process the chunks separately, using a single model in memory. The hope was that this approach would benefit multi-core server machines. See the following PR for more info: #110

The actual sliding window logic that you are referring to is implemented here: Basically, we sample the best token, and when the token is a timestamp, we remember it in …
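As a rough illustration of the "processors" idea, here is a minimal sketch of splitting the input into one contiguous sample range per processor. This is not the whisper.cpp code: the function name `split_into_chunks` is hypothetical, and the sketch assumes 16 kHz mono samples and an even split where the last chunk absorbs any remainder. Because each range is decoded on its own, the chunk boundaries are hard cuts, so a word that straddles one can end up split in two.

```python
SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def split_into_chunks(n_samples: int, n_processors: int) -> list[tuple[int, int]]:
    """Split [0, n_samples) into n_processors contiguous (start, end) ranges.

    Hypothetical helper for illustration. Each range would be decoded
    independently, so the boundaries between ranges are hard cuts.
    """
    if n_processors < 1:
        raise ValueError("n_processors must be >= 1")
    per_chunk = n_samples // n_processors
    chunks = []
    for i in range(n_processors):
        start = i * per_chunk
        # The last chunk absorbs any samples left over by the integer division.
        end = n_samples if i == n_processors - 1 else start + per_chunk
        chunks.append((start, end))
    return chunks


# One minute of audio split across 4 processors.
print(split_into_chunks(60 * SAMPLE_RATE, 4))
```

An even split keeps the per-processor work balanced, at the cost of cutting the audio at arbitrary points rather than at pauses in speech.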
-
Seems like you just stride by the window length to produce the segments:
https://github.com/ggerganov/whisper.cpp/blob/2065572a11fca8c31adbb7d00c6518b290099445/whisper.cpp#L2917-L2918
Seems like this won't handle words split across segments very well.
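The reply in this thread is cut off before it finishes describing the window logic. As a hedged sketch of the general idea it starts to describe (remembering timestamp tokens as they are sampled), the snippet below contrasts a fixed stride by the window length with advancing the window only up to the last timestamp seen, so that any trailing text after that timestamp is decoded again at the start of the next window. The `Timestamp` token representation, the function names, and the full-stride fallback are all assumptions for illustration, not whisper.cpp's actual implementation.

```python
from dataclasses import dataclass
from typing import Union

WINDOW_SEC = 30.0  # Whisper decodes audio in 30-second windows


@dataclass
class Timestamp:
    # Hypothetical token type: offset in seconds from the start of the window.
    t: float


Token = Union[str, Timestamp]  # text tokens are plain strings in this sketch


def next_seek_fixed(seek: float) -> float:
    """Naive strategy: always advance by the full window length."""
    return seek + WINDOW_SEC


def next_seek_timestamp(seek: float, tokens: list[Token]) -> float:
    """Advance only up to the last timestamp token seen in this window.

    Text after the last timestamp (e.g. a word cut off by the window edge)
    is not committed here; it falls inside the next window and is decoded again.
    """
    last_ts = None
    for tok in tokens:
        if isinstance(tok, Timestamp):
            last_ts = tok.t  # remember the most recent timestamp
    if last_ts is None or last_ts <= 0.0:
        # No usable timestamp: fall back to a full stride so we never stall.
        return seek + WINDOW_SEC
    return seek + last_ts


# "inter" is a partial word cut off at the end of the window.
tokens = [Timestamp(0.0), "Hello", "world", Timestamp(27.4), Timestamp(27.4), "inter"]
print(next_seek_fixed(0.0))              # 30.0
print(next_seek_timestamp(0.0, tokens))  # 27.4
```

With the fixed stride, the next window starts at 30.0 s and the fragment "inter" is lost or split; with the timestamp-based seek, the next window starts at 27.4 s, so the whole word falls inside it. The fallback matters because advancing by zero would decode the same window forever.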