Project: ggml-org : tutorials
List:
- tutorial : compute embeddings using llama.cpp
- tutorial : parallel inference using Hugging Face dedicated endpoints
- tutorial : KV cache reuse with llama-server
TODO:
- Is there a way to cache multiple prompt prefixes? #13488
- How to use function calls? #13134
- How to measure time to first token (TTFT) and time between tokens (TBT)? #13251
- Apple A-series chipsets: how to estimate a suitable model size? #12742
- How to get started with webui development (ref: tutorials : list for llama.cpp #13523 (comment))
- etc.
For more ideas, simply search for "How to" in the Discussions: https://github.com/ggml-org/llama.cpp/discussions?discussions_q=is%3Aopen+How+to
Contributions of new tutorials are welcome!
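For the TTFT/TBT question above, a minimal sketch of the metric computation itself. This is a hypothetical helper, not part of llama.cpp; it assumes the caller records one monotonic timestamp when the request is sent and one per streamed token:

```python
def compute_latency_metrics(request_time: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, mean TBT) in seconds.

    request_time: timestamp (e.g. time.monotonic()) taken when the request was sent.
    token_times: one timestamp per streamed token, in arrival order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_time                          # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]  # inter-token intervals
    tbt = sum(gaps) / len(gaps) if gaps else 0.0                  # mean time between tokens
    return ttft, tbt
```

In practice the timestamps would come from a streaming client, e.g. calling `time.monotonic()` for each chunk read from llama-server's streamed response.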
Status: In Progress