whisper : add support for backends with multiple ggml_backend_buffer_type #2863

Merged
merged 3 commits into from
Mar 26, 2025

eddnjjn
Contributor

@eddnjjn eddnjjn commented Mar 4, 2025

This patch adds support for backends with multiple ggml_backend_buffer_type to Whisper.cpp. When running on Arm devices, this patch enables the use of the aarch64 and KleidiAI kernels to accelerate matmul operations.

@eddnjjn eddnjjn changed the title whisper : add support for ggml_backend_buffer_type whisper : add support for backends with multiple ggml_backend_buffer_type Mar 10, 2025
@bradmurray-dt
Contributor

Is anything additional needed to run this? Do you have any performance comparisons?

@eddnjjn
Contributor Author

eddnjjn commented Mar 12, 2025

> Is anything additional needed to run this? Do you have any performance comparisons?

Note that this patch mainly targets Arm devices. If you compile and run on the same machine (-march=native), you shouldn't have to add any compiler flags, as CMake takes care of this. If you cross-compile, you need to specify the target CPU architecture (e.g. -DCMAKE_C_FLAGS=-march=armv8.2a+dotprod+i8mm+sve -DCMAKE_CXX_FLAGS=-march=armv8.2a+dotprod+i8mm+sve).

If you want to run with Arm® KleidiAI™, add -DGGML_CPU_KLEIDIAI=ON to the cmake command line options.

Also, you must quantize the model to Q4_0 as this is the format supported by aarch64 and KleidiAI.

On a Pixel 8 device, this patch gives a 1.44-1.7x speedup in total whisper-bench time with the medium.en model. Below is the whisper-bench output for 1-4 threads on the Pixel 8, without and with this patch.

Output from whisper-bench running on Pixel 8

main branch (fc7b1ee)

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 1
whisper_print_timings: load time = 311.55 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 34401.29 ms / 1 runs (34401.29 ms per run)
whisper_print_timings: decode time = 11087.40 ms / 256 runs ( 43.31 ms per run)
whisper_print_timings: batchd time = 13467.27 ms / 320 runs ( 42.09 ms per run)
whisper_print_timings: prompt time = 115861.49 ms / 4096 runs ( 28.29 ms per run)
whisper_print_timings: total time = 174820.05 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 2
whisper_print_timings: load time = 278.75 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 23314.24 ms / 1 runs (23314.24 ms per run)
whisper_print_timings: decode time = 7970.02 ms / 256 runs ( 31.13 ms per run)
whisper_print_timings: batchd time = 8333.07 ms / 320 runs ( 26.04 ms per run)
whisper_print_timings: prompt time = 62768.42 ms / 4096 runs ( 15.32 ms per run)
whisper_print_timings: total time = 102388.12 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 3
whisper_print_timings: load time = 279.81 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 15143.52 ms / 1 runs (15143.52 ms per run)
whisper_print_timings: decode time = 6830.06 ms / 256 runs ( 26.68 ms per run)
whisper_print_timings: batchd time = 6372.15 ms / 320 runs ( 19.91 ms per run)
whisper_print_timings: prompt time = 46688.25 ms / 4096 runs ( 11.40 ms per run)
whisper_print_timings: total time = 75036.06 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 4
whisper_print_timings: load time = 275.25 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 11744.57 ms / 1 runs (11744.57 ms per run)
whisper_print_timings: decode time = 5819.78 ms / 256 runs ( 22.73 ms per run)
whisper_print_timings: batchd time = 5133.26 ms / 320 runs ( 16.04 ms per run)
whisper_print_timings: prompt time = 37920.15 ms / 4096 runs ( 9.26 ms per run)
whisper_print_timings: total time = 60619.95 ms

PR#2863 enabled (running with KleidiAI)

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 1
whisper_print_timings: load time = 397.04 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 19907.53 ms / 1 runs (19907.53 ms per run)
whisper_print_timings: decode time = 7629.50 ms / 256 runs ( 29.80 ms per run)
whisper_print_timings: batchd time = 7538.61 ms / 320 runs ( 23.56 ms per run)
whisper_print_timings: prompt time = 66405.38 ms / 4096 runs ( 16.21 ms per run)
whisper_print_timings: total time = 101483.28 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 2
whisper_print_timings: load time = 393.19 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 14440.07 ms / 1 runs (14440.07 ms per run)
whisper_print_timings: decode time = 5991.29 ms / 256 runs ( 23.40 ms per run)
whisper_print_timings: batchd time = 5250.08 ms / 320 runs ( 16.41 ms per run)
whisper_print_timings: prompt time = 45144.96 ms / 4096 runs ( 11.02 ms per run)
whisper_print_timings: total time = 70828.62 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 3
whisper_print_timings: load time = 400.55 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 9718.67 ms / 1 runs ( 9718.67 ms per run)
whisper_print_timings: decode time = 5209.60 ms / 256 runs ( 20.35 ms per run)
whisper_print_timings: batchd time = 3819.17 ms / 320 runs ( 11.93 ms per run)
whisper_print_timings: prompt time = 32099.34 ms / 4096 runs ( 7.84 ms per run)
whisper_print_timings: total time = 50848.71 ms

LD_LIBRARY_PATH=. ./whisper-bench -m medium-q4_0.bin -t 4
whisper_print_timings: load time = 378.32 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 7698.60 ms / 1 runs ( 7698.60 ms per run)
whisper_print_timings: decode time = 4945.30 ms / 256 runs ( 19.32 ms per run)
whisper_print_timings: batchd time = 3219.06 ms / 320 runs ( 10.06 ms per run)
whisper_print_timings: prompt time = 26154.78 ms / 4096 runs ( 6.39 ms per run)
whisper_print_timings: total time = 42019.84 ms
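
The speedup range quoted above can be checked against the "total time" lines of the two runs. The following is a minimal sketch, not part of the PR: the names `baseline_ms`, `patched_ms` and `speedups` are made up for illustration, and the numbers are copied by hand from the whisper-bench output above.

```python
# Total times in ms, copied from the "total time" lines above, keyed by thread count (-t).
# The variable and function names are illustrative and do not come from whisper.cpp.
baseline_ms = {1: 174820.05, 2: 102388.12, 3: 75036.06, 4: 60619.95}  # main branch (fc7b1ee)
patched_ms = {1: 101483.28, 2: 70828.62, 3: 50848.71, 4: 42019.84}    # PR#2863 with KleidiAI


def speedups(before, after):
    """Return the before/after ratio per thread count (values > 1 mean the patch is faster)."""
    return {threads: before[threads] / after[threads] for threads in sorted(before)}


total_speedup = speedups(baseline_ms, patched_ms)

if __name__ == "__main__":
    for threads, ratio in total_speedup.items():
        print(f"-t {threads}: {ratio:.2f}x")
```

With these totals the ratios come out at roughly 1.72x, 1.45x, 1.48x and 1.44x for 1 to 4 threads, which is consistent with the 1.44-1.7x range stated in the comment. The same function can be applied to the encode, decode, batchd or prompt lines to compare individual stages.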

@ggerganov
Member

ggerganov commented Mar 13, 2025

Looks like a good addition, but we need to do some testing and make sure everything works correctly. Testing is a bit tedious atm, because we don't have good CI, so any feedback from the community on whether this branch works as expected is very welcome.

@eddnjjn
Contributor Author

eddnjjn commented Mar 19, 2025

Just want to add that I’ve tested the patch using whisper-cli and whisper-bench in the following environments:

Linux x86 (Ubuntu 20.04.6) – CPU (aarch64) backend
Android Pixel 8 – CPU (aarch64, KleidiAI) backend
macOS – Metal and CPU (with/without BLAS, aarch64, KleidiAI) backends

@ggerganov
Member

Thanks for the update. I'll do some testing soon on my devices and if everything looks OK, will merge.

Member

@ggerganov ggerganov left a comment

I have done some testing on Mac and my Linux box and things appear to be functional. So I think we can proceed to merge this.

Comment on lines 1 to 4
// SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliates <[email protected]>
// SPDX-License-Identifier: MIT
//

Member

Please remove this copyright notice.

@ggerganov ggerganov merged commit 21d890d into ggml-org:master Mar 26, 2025
48 checks passed