Issues: ggml-org/llama.cpp
Issues list
kv-cache : improve defrag logic
enhancement
New feature or request
performance
Speed related topics
roadmap
Part of a roadmap project
#13497
opened May 13, 2025 by
ggerganov
convert : write tensors in parallel
performance
Speed related topics
python
python script changes
#12837
opened Apr 8, 2025 by
compilade
3 of 6 tasks
ggml-cuda : add TQ2_0 kernels, for ternary inference on GPU
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
enhancement
New feature or request
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
performance
Speed related topics
python
python script changes
Review Complexity : High
Generally requires in-depth knowledge of LLMs or GPUs
testing
Everything test related
#11183
opened Jan 10, 2025 by
compilade
Introduce ggml_syncthreads()
performance
Speed related topics
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#7455
opened May 22, 2024 by
jart
sched : support async weight copy
performance
Speed related topics
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
cuda : use amd wave sharing intrinsics for warp_reduce functions
performance
Speed related topics
Review Complexity : High
Generally requires in-depth knowledge of LLMs or GPUs
#6522
opened Apr 7, 2024 by
Engininja2
Smooth Sampling / Quadratic Sampling support
generation quality
Quality of model output
performance
Speed related topics
Review Complexity : High
Generally requires in-depth knowledge of LLMs or GPUs
#6445
opened Apr 2, 2024 by
kalomaze
Xeon Phi (Knights Corner) Support.
enhancement
New feature or request
ggml
changes relating to the ggml tensor library for machine learning
performance
Speed related topics
Review Complexity : High
Generally requires in-depth knowledge of LLMs or GPUs
#6440
opened Apr 2, 2024 by
julialongtin
Fuse matrix multiplication + SiLU
performance
Speed related topics
refactoring
Refactoring
Review Complexity : Medium
Generally require more time to grok but manageable by beginner to medium expertise level
#5413
opened Feb 8, 2024 by
JohannesGaessler
Draft
llama : speed-up grammar sampling
performance
Speed related topics
refactoring
Refactoring
roadmap
Part of a roadmap project
#4218
opened Nov 25, 2023 by
ggerganov
metal : compile-time kernel args and params
performance
Speed related topics
research 🔬
roadmap
Part of a roadmap project
#4085
opened Nov 15, 2023 by
ggerganov
metal: template for mat-vec multiplication kernels
performance
Speed related topics
#2891
opened Aug 30, 2023 by
lshzh-ww
cuda: 1.2x faster dequantization kernel
performance
Speed related topics
Review Complexity : High
Generally requires in-depth knowledge of LLMs or GPUs
#2809
opened Aug 26, 2023 by
li-plus
Support CoreML like whisper.cpp?
help wanted
Extra attention is needed
macos
Issues specific to macOS
performance
Speed related topics
#1714
opened Jun 6, 2023 by
realcarlos