whisper : add support for backends with multiple ggml_backend_buffer_type #2863
Conversation
Signed-off-by: Dan Johansson <[email protected]>
Is anything additional needed to run this? Do you have any performance comparisons?
If you want to run with Arm® KleidiAI™, add -DGGML_CPU_KLEIDIAI=ON to the CMake command-line options. You must also quantize the model to Q4_0, as this is the format supported by the aarch64 and KleidiAI kernels. On a Pixel 8 device, this patch gives a 1.44-1.7x performance increase for whisper-bench using the medium.en model. Below you can see the output from whisper-bench for 1-4 threads running on a Pixel 8, without and with this patch.

Output from whisper-bench running on Pixel 8, main branch (fc7b1ee):

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 1
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 2
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 3
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 4

PR #2863 enabled (running with KleidiAI):

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 1
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 2
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 3
LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 4
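The build-quantize-benchmark sequence described above can be sketched as follows. This is a hedged example, not part of the PR itself: the quantize tool invocation and the models/ggml-medium.en.bin path are assumptions based on a typical whisper.cpp checkout, and the exact binary locations may differ depending on your CMake setup and toolchain (e.g. when cross-compiling for Android).

```shell
# Configure and build whisper.cpp with the KleidiAI CPU kernels enabled
# (the -DGGML_CPU_KLEIDIAI=ON flag is the one named in this PR discussion).
cmake -B build -DGGML_CPU_KLEIDIAI=ON
cmake --build build --config Release

# The aarch64/KleidiAI kernels only accelerate Q4_0, so quantize the
# medium.en model first (tool name and model paths are illustrative).
./build/bin/quantize models/ggml-medium.en.bin models/ggml-medium.en-q4_0.bin q4_0

# Benchmark with 1-4 threads, matching the runs quoted above.
for t in 1 2 3 4; do
    ./build/bin/whisper-bench -m models/ggml-medium.en-q4_0.bin -t "$t"
done
```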
Looks like a good addition, but we need to do some testing and make sure everything works correctly. Testing is a bit tedious at the moment because we don't have good CI, so any feedback from the community on whether this branch works as expected is very welcome.
Just want to add that I've tested the patch using whisper-cli and whisper-bench in the following environments: Linux x86 (Ubuntu 20.04.6) – CPU (aarch64) backend
Thanks for the update. I'll do some testing soon on my devices and if everything looks OK, will merge. |
I have done some testing on Mac and my Linux box and things appear to be functional. So I think we can proceed to merge this.
src/whisper-arch.h
Outdated
// SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliates <[email protected]>
// SPDX-License-Identifier: MIT
//
Please remove this copyright notice.
Signed-off-by: Dan Johansson <[email protected]>
This patch adds support for backends with multiple ggml_backend_buffer_type to whisper.cpp. When running on Arm devices, this enables the use of the aarch64 and KleidiAI kernels to accelerate matmul operations.