Add Moore Threads GPU support and update GitHub workflow for MUSA build #3069

Merged: 6 commits into ggml-org:master on Apr 28, 2025

Conversation

@yeahdongcn (Contributor) commented Apr 24, 2025

This PR includes the following updates:

  • Adds Moore Threads GPU support to README.md and replaces usage of ./main with whisper-cli.
  • Enables building of ghcr.io/ggml-org/whisper.cpp:main-musa via GitHub workflow for MUSA support.
  • Updates the PATH environment variable in both main and main-cuda Dockerfiles for improved CLI usability.
  • Enhances the Makefile to forward GGML_CUDA and GGML_MUSA environment variables to cmake.

    (Previously, setting export GGML_CUDA=1 on the host had no effect during the build. This change ensures these flags are now respected; see the usage sketch after this list.)
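
For illustration, a minimal sketch of the intended flow after this change (the make target comes from the Testing section below; the whisper-cli path assumes a default cmake build layout and is not part of this PR):

# Previously these exports were silently ignored by make; with this PR
# they are forwarded to the cmake configure step.
export GGML_MUSA=1        # or: export GGML_CUDA=1
make base.en              # now configures the build with -DGGML_MUSA=1
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav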

Testing Done

# host
## cmake
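# MUSA_ARCHITECTURES="21" targets the MTT S80 (compute capability 2.1)
# reported in the device log below.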
cmake -B build -DGGML_MUSA=1 -DMUSA_ARCHITECTURES="21"
cmake --build build -j --config Release

## make
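# CMAKE_ARGS passes the flag explicitly; with this PR, exporting
# GGML_MUSA=1 before running make works as well.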
make base.en CMAKE_ARGS="-DGGML_MUSA=1"

# container
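# Build the MUSA image from the in-tree Dockerfile, then run whisper-cli inside it.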
docker build -t whisper.cpp:main-musa -f ./.devops/main-musa.Dockerfile .
docker run -it --rm -v $PWD/samples:/audios -v $PWD/models/:/models whisper.cpp:main-musa "whisper-cli -m /models/ggml-base.en.bin -f ./samples/jfk.wav"
whisper_init_from_file_with_params_no_state: loading model from '/models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 MUSA devices:
  Device 0: MTT S80, compute capability 2.1, VMM: yes
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:        MUSA0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: using MUSA0 backend
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   17.22 MB
whisper_init_state: compute buffer (encode) =   85.86 MB
whisper_init_state: compute buffer (cross)  =    4.65 MB
whisper_init_state: compute buffer (decode) =   97.27 MB

system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | MUSA : PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     load time =   100.09 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     7.57 ms
whisper_print_timings:   sample time =    47.42 ms /   133 runs (     0.36 ms per run)
whisper_print_timings:   encode time =   121.28 ms /     1 runs (   121.28 ms per run)
whisper_print_timings:   decode time =    68.62 ms /     3 runs (    22.87 ms per run)
whisper_print_timings:   batchd time =  1870.63 ms /   126 runs (    14.85 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (     0.00 ms per run)
whisper_print_timings:    total time =  2222.29 ms
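
Once the workflow publishes the image, the same run should work straight from the registry (a sketch assuming the ghcr.io tag named in the description; the flags mirror the local test above):

docker pull ghcr.io/ggml-org/whisper.cpp:main-musa
docker run -it --rm -v $PWD/samples:/audios -v $PWD/models/:/models ghcr.io/ggml-org/whisper.cpp:main-musa "whisper-cli -m /models/ggml-base.en.bin -f ./samples/jfk.wav"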

Signed-off-by: Xiaodong Ye <[email protected]>
@yeahdongcn requested a review from ggerganov on April 24, 2025, 10:32
@yeahdongcn (Contributor, Author)

@ggerganov @danbev Do you have any other concerns? Can we merge this?

@danbev (Collaborator) commented Apr 28, 2025

> @ggerganov @danbev Do you have any other concerns? Can we merge this?

All good from my side 👍

@ggerganov merged commit 50218b9 into ggml-org:master on Apr 28, 2025
51 checks passed