ci : add coreml job that converts base.en to coreml [no ci] #2981

Merged 3 commits into ggml-org:master on Apr 1, 2025

Conversation

danbev (Collaborator) commented Apr 1, 2025

This commit adds a new job to the CI pipeline that downloads the base.en model and converts it to CoreML format. The CoreML model is then packed into a zip file and uploaded as an artifact.

This will only be done for pushes to master, releases, or pre-releases.

Refs: #2783


I've run this on my fork and it produces the following release:
https://github.com/danbev/whisper.cpp/releases/tag/b2368

I've only included one model, but this will exercise the model conversion scripts and should give early feedback if anything breaks. More models could be added for publishing later.
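For reference, the conversion steps the job performs can be sketched roughly as follows. The script names are the helper scripts shipped in the whisper.cpp repo; the exact paths and zip layout used by the actual workflow may differ.

```shell
# Sketch of the CI job's steps (assumed paths; see the workflow for the real ones).
./models/download-ggml-model.sh base.en       # fetch the ggml base.en model
./models/generate-coreml-model.sh base.en     # convert to a CoreML .mlmodelc bundle
zip -r ggml-base.en-encoder.mlmodelc.zip models/ggml-base.en-encoder.mlmodelc
```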

ggerganov (Member) left a comment

For now one model is OK to test the conversion.

Not sure if there is a need to create an artifact - it would result in a lot of data (80 MB) that is almost always the same.

So I think it's better to remove the artifact from this CI. We can add an additional ggml-ci that generates a CoreML model and even runs a transcription with it. It can run on the ggml-100-mac-m4 node.

danbev (Collaborator, Author) commented Apr 1, 2025

> So I think it's better to remove the artifact from this CI.

Just wanted to make sure that I'm not misunderstanding anything. Should we not include this job at all or should we just skip the publishing of the artifact?

ggerganov (Member):

Just skip the publishing of the artifact. The job is good to have.

danbev (Collaborator, Author) commented Apr 1, 2025

Running the CI for this as I've modified build.yml. I think the build is taking a little longer than before because the tests have been enabled. I did not consider this when I enabled them. I'll take a closer look though.

Only the tests labeled with gh will run in CI which currently are only test-whisper-cli-tiny and test-whisper-cli-tiny.en:

set(TEST_TARGET test-whisper-cli-tiny)
add_test(NAME ${TEST_TARGET}
    COMMAND $<TARGET_FILE:whisper-cli>
    -m ${PROJECT_SOURCE_DIR}/models/for-tests-ggml-tiny.bin -l fr
    -f ${PROJECT_SOURCE_DIR}/samples/jfk.wav)
set_tests_properties(${TEST_TARGET} PROPERTIES LABELS "tiny;gh")

set(TEST_TARGET test-whisper-cli-tiny.en)
add_test(NAME ${TEST_TARGET}
    COMMAND $<TARGET_FILE:whisper-cli>
    -m ${PROJECT_SOURCE_DIR}/models/for-tests-ggml-tiny.en.bin
    -f ${PROJECT_SOURCE_DIR}/samples/jfk.wav)
set_tests_properties(${TEST_TARGET} PROPERTIES LABELS "tiny;en;gh")
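The label-based selection above can be exercised with ctest's `-L` flag, which matches test labels against a regex. A minimal invocation, assuming the build directory is `./build` (the actual CI may differ):

```shell
# Run only tests whose LABELS property matches "gh"
ctest --test-dir build -L gh --output-on-failure
```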

But this can still take a long time, for example ubuntu-22-gcc (Debug, linux/ppc64le):

Test project /workspace
    Start 1: test-whisper-cli-tiny
1/2 Test #1: test-whisper-cli-tiny ............   Passed  784.02 sec
    Start 2: test-whisper-cli-tiny.en
2/2 Test #2: test-whisper-cli-tiny.en .........   Passed  763.74 sec
100% tests passed, 0 tests failed out of 2
Label Time Summary:
en      = 763.74 sec*proc (1 test)
gh      = 1547.75 sec*proc (2 tests)
tiny    = 1547.75 sec*proc (2 tests)
Total Test time (real) = 1547.80 sec

ggerganov (Member) commented Apr 1, 2025

Btw, to fix the thread sanitizer warnings here: https://github.com/ggerganov/whisper.cpp/actions/runs/14195883603/job/39770792600?pr=2979, we have to build with GGML_OPENMP=OFF.

Here is llama.cpp config:

https://github.com/ggml-org/llama.cpp/blob/3fd072a54001a908c54e81fd2e82b682ecfdd475/.github/workflows/build.yml#L295-L306

The reason is that OpenMP threads are causing false positives when the thread sanitizer is enabled, so we simply disable them for sanitizer tests.
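A rough sketch of such a sanitizer configuration, modeled on the linked llama.cpp workflow. `GGML_OPENMP` is the ggml CMake option mentioned above; the sanitizer option name and build directory here are assumptions, not taken from the actual workflow:

```shell
# Configure a Debug build with OpenMP disabled and ThreadSanitizer enabled
cmake -B build \
      -DCMAKE_BUILD_TYPE=Debug \
      -DGGML_OPENMP=OFF \
      -DWHISPER_SANITIZE_THREAD=ON
cmake --build build
ctest --test-dir build -L gh --output-on-failure
```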

@danbev danbev merged commit 04b9508 into ggml-org:master Apr 1, 2025
50 checks passed