Description
Thanks very much for your fantastic work with this library @ggerganov!
Updated to latest llama.cpp (revision: e790eef) this morning.
StableLM-Zephyr-3b quants are triggering the `GGML_ASSERT(n_embd_head == hparams.n_rot);` assertion
introduced in #4889 (f445c0e#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5533).
The values printed for the quants I have (Q4_K_M) are as follows:
```
llm_load_print_meta: n_rot          = 20
llm_load_print_meta: n_embd_head_k  = 80
llm_load_print_meta: n_embd_head_v  = 80
```
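For what it's worth, 20 / 80 = 0.25, which looks like the partial rotary fraction (the `rope_pct` / partial rotary factor of roughly 0.25, if I'm reading the model's Hugging Face config correctly). So `n_rot != n_embd_head` seems intentional for this architecture rather than a conversion artifact. A minimal sketch of that arithmetic, with the config values being assumptions on my part:

```cpp
// Minimal sketch (not llama.cpp code) of how the printed numbers relate.
// hidden_size, num_attention_heads and the rotary fraction are assumptions
// based on what I believe the HF config for stablelm-zephyr-3b contains.
#include <cassert>
#include <cstdio>

int main() {
    const int   n_embd   = 2560;  // hidden_size (assumed)
    const int   n_head   = 32;    // num_attention_heads (assumed)
    const float rope_pct = 0.25f; // partial rotary fraction (assumed)

    const int n_embd_head = n_embd / n_head;                // 80 -> matches n_embd_head_k/v
    const int n_rot       = (int)(rope_pct * n_embd_head);  // 20 -> matches n_rot

    printf("n_embd_head = %d, n_rot = %d\n", n_embd_head, n_rot);

    // The new check effectively requires these two to be equal, which cannot
    // hold for a partial-rotary model, so the GGML_ASSERT above fires:
    assert(n_embd_head == n_rot);
    return 0;
}
```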
You mentioned the chance of this breaking Persimmon:
> All models now will use the `hparams.n_rot` value instead of relying on a custom parameter (like `n_embd_head`). Both for `ggml_rope_custom` and `llm_build_k_shift`. I suspect this might break Persimmon inference, because I'm not sure if `hparams.n_rot` is correctly populated in the meta data of the model, but if that is the case, then it should be fixed.
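If I understand the quoted change correctly, the difference boils down to which width is used for the rotation, roughly as follows (a sketch on my part, not the actual llama.cpp code):

```cpp
// Rough sketch of the quoted change as I understand it; hparams_t and the two
// helpers are my own illustration, not llama.cpp identifiers.
struct hparams_t {
    int n_rot;       // rope dimension count from the GGUF metadata
    int n_embd_head; // per-head embedding size (n_embd / n_head)
};

// Before #4889: the rotated width could come from a per-model parameter such
// as n_embd_head.
int rope_dims_before(const hparams_t & hp) { return hp.n_embd_head; } // 80 for this model

// After #4889: all models take the width from hparams.n_rot.
int rope_dims_after(const hparams_t & hp) { return hp.n_rot; }        // 20 for this model
```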
Is this a similar issue to Persimmon, where the model metadata is incorrect, or is it a distinct problem?