-
Notifications
You must be signed in to change notification settings - Fork 4.3k
ci : add coreml job that converts base.en to coreml [no ci] #2981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit adds a new job to the CI pipeline that downloads the base.en model and converts it to CoreML format. The CoreML model is then packed into a zip file and uploaded as an artifact. This will only be done for pushes to master, releases, or pre-releases. Refs: ggml-org#2783
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now one model is OK to test the conversion.
Not sure if there is need to create an artifact - it would result in a lot of data (80MB) that is almost always the same.
So I think it's better to remove the artifact from this CI. We can add an additional ggml-ci
that generates a CoreML model and even runs a transcription with it. It can run on the ggml-100-mac-m4
node.
Just wanted to make sure that I'm not misunderstanding anything. Should we not include this job at all or should we just skip the publishing of the artifact? |
Just skip the publishing of the artifact. The job is good to have. |
Running the CI for this as I've modified the bulid.yml. I think the build it taking a little longer than before because the test have been enabled. I did not consider this when I enabled them. I'll take a closer look though. Only the tests labeled with set(TEST_TARGET test-whisper-cli-tiny)
add_test(NAME ${TEST_TARGET}
COMMAND $<TARGET_FILE:whisper-cli>
-m ${PROJECT_SOURCE_DIR}/models/for-tests-ggml-tiny.bin -l fr
-f ${PROJECT_SOURCE_DIR}/samples/jfk.wav)
set_tests_properties(${TEST_TARGET} PROPERTIES LABELS "tiny;gh")
set(TEST_TARGET test-whisper-cli-tiny.en)
add_test(NAME ${TEST_TARGET}
COMMAND $<TARGET_FILE:whisper-cli>
-m ${PROJECT_SOURCE_DIR}/models/for-tests-ggml-tiny.en.bin
-f ${PROJECT_SOURCE_DIR}/samples/jfk.wav)
set_tests_properties(${TEST_TARGET} PROPERTIES LABELS "tiny;en;gh") But this can still take a long time, for example ubuntu-22-gcc (Debug, linux/ppc64le): Test project /workspace
Start 1: test-whisper-cli-tiny
1/2 Test #1: test-whisper-cli-tiny ............ Passed 784.02 sec
Start 2: test-whisper-cli-tiny.en
2/2 Test #2: test-whisper-cli-tiny.en ......... Passed 763.74 sec
100% tests passed, 0 tests failed out of 2
Label Time Summary:
en = 763.74 sec*proc (1 test)
gh = 1547.75 sec*proc (2 tests)
tiny = 1547.75 sec*proc (2 tests)
Total Test time (real) = 1547.80 sec |
Btw, to fix the thread sanitizer warnings here: https://github.com/ggerganov/whisper.cpp/actions/runs/14195883603/job/39770792600?pr=2979, we have to build with Here is The reason is that OpenMP threads are causing false positives when the thread sanitizer is enabled, so we simply disable them for sanitizer tests. |
This commit adds a new job to the CI pipeline that downloads the base.en model and converts it to CoreML format. The CoreML model is then packed into a zip file and uploaded as an artifact.
This will only be done for pushes to master, releases, or pre-releases.
Refs: #2783
I've run this on my fork and it produces the following release:
https://github.com/danbev/whisper.cpp/releases/tag/b2368
I've only included one model but this will exercise the model conversion scripts and should give early feedback if anything breaks. We could also add more models to be published too.