Issues: vllm-project/vllm

[RFC]: Deprecating vLLM V0
#18571 opened May 22, 2025 by WoosukKwon
Open 27
[Roadmap] vLLM Roadmap Q2 2025
#15735 opened Mar 29, 2025 by simon-mo
Open 15

Issues list

[Performance]: The Unstable Performance Difference between CUDA and PyTorch performance Performance-related issues
#18884 opened May 29, 2025 by cxn-selfie
1 task done
[Performance]: why the batch-embeddings inputs are separated to small single one? performance Performance-related issues
#18867 opened May 29, 2025 by xsank
1 task done
[Perf] Tune scaled_fp8_quant by increasing vectorization performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed
#18844 opened May 28, 2025 by mgoin
[Performance]: The CPU overhead gradually increases with multiple batches. performance Performance-related issues
#18760 opened May 27, 2025 by wgzhong
1 task done
[Performance]: yarn degrades the performance of qwen3 performance Performance-related issues
#18728 opened May 26, 2025 by yanan1116
1 task done
Sm100 blockwise fp8 swap ab performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed
#18564 opened May 22, 2025 by IwakuraRein
[Performance]: Low GPU Utilization (70%) for ViT+Qwen2 VLM Model. performance Performance-related issues
#18392 opened May 20, 2025 by Oldpan
1 task done
[Performance]: vLLM v0.6.3 to v0.7.1, improved tps by around 1.6x performance Performance-related issues
#18339 opened May 19, 2025 by wonjerry
1 task done
[Performance]: SGLANG is 4 times faster than vLLM for Qwen/Qwen3-32B-AWQ performance Performance-related issues
#18136 opened May 14, 2025 by thesillystudent
1 task done
[Performance]: How long can the waiting queue be, and which startup parameters affect it? performance Performance-related issues
#17824 opened May 8, 2025 by nvliajia
1 task done
[Performance]: TPOT and ITL increase as max-num-seqs increases? performance Performance-related issues
#17598 opened May 2, 2025 by esp-vt
1 task done
[Performance]: Performance comparison for v1 engine and v0 engine performance Performance-related issues
#17540 opened May 1, 2025 by hustxiayang
1 task done
[Performance]: Quantized Model Inference performance Performance-related issues
#17487 opened Apr 30, 2025 by sneha5gsm
[Performance]: UVA vs UVM for CPU offloading on v0.8.4+ performance Performance-related issues
#17062 opened Apr 23, 2025 by rajesh-s
1 task done
[Performance]: Distributed Inference w/ & w/o RDMA over Infiniband (tp=8, pp=2) performance Performance-related issues
#17006 opened Apr 22, 2025 by surajssd
1 task done
[Performance]: Why/How vLLM uses CPU memory? performance Performance-related issues
#16947 opened Apr 21, 2025 by khayamgondal
1 task done
[Performance]: fp8 Online Quantization performance Performance-related issues
#16490 opened Apr 11, 2025 by diggle001