Issues: vllm-project/vllm
[Performance]: The Unstable Performance Difference between CUDA and PyTorch
performance (Performance-related issues) · #18884 · opened May 29, 2025 by cxn-selfie

[Performance]: Why are batch-embeddings inputs split into small single requests?
performance · #18867 · opened May 29, 2025 by xsank

[Perf] Tune scaled_fp8_quant by increasing vectorization
performance · ready (ONLY add when PR is ready to merge/full CI is needed) · #18844 · opened May 28, 2025 by mgoin

[Performance]: How can I further improve performance with vLLM + LMCache PD disaggregation?
performance · #18801 · opened May 28, 2025 by mugglew

[Performance]: Falcon H1 7B seems to be significantly slower than Qwen 7B
performance · #18785 · opened May 28, 2025 by justusmattern27

[Performance]: The CPU overhead gradually increases with multiple batches
performance · #18760 · opened May 27, 2025 by wgzhong

[Performance]: YaRN degrades the performance of Qwen3
performance · #18728 · opened May 26, 2025 by yanan1116

[Performance]: Unexpected: B200 GPU performance similar to H200 for Qwen/QwQ-32B; expected B200 to be significantly faster
performance · #18725 · opened May 26, 2025 by awterman

Sm100 blockwise fp8 swap ab
performance · ready · #18564 · opened May 22, 2025 by IwakuraRein

[Performance]: Low GPU Utilization (70%) for ViT+Qwen2 VLM Model
performance · #18392 · opened May 20, 2025 by Oldpan

[Performance]: vLLM v0.6.3 to v0.7.1 improved TPS by around 1.6x
performance · #18339 · opened May 19, 2025 by wonjerry

[Performance]: SGLang is 4 times faster than vLLM for Qwen/Qwen3-32B-AWQ
performance · #18136 · opened May 14, 2025 by thesillystudent

[Performance]: How long can the waiting queue be, and which startup parameters affect it?
performance · #17824 · opened May 8, 2025 by nvliajia

[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE
performance · #17812 · opened May 7, 2025 by ekagra-ranjan

[Performance]: benchmark_serving results for Qwen3-32B vs Qwen2-32B-FP8 are almost the same
performance · #17788 · opened May 7, 2025 by hit023

[Performance]: TPOT and ITL increase as max-num-seqs increases?
performance · #17598 · opened May 2, 2025 by esp-vt

[Performance]: Performance comparison for V1 engine and V0 engine
performance · #17540 · opened May 1, 2025 by hustxiayang

[Performance]: Quantized Model Inference
performance · #17487 · opened Apr 30, 2025 by sneha5gsm

[Performance]: First-token latency during inference is longer when the number of input tokens is small
performance · #17352 · opened Apr 29, 2025 by gaochenxi

[Performance]: UVA vs UVM for CPU offloading on v0.8.4+
performance · #17062 · opened Apr 23, 2025 by rajesh-s

[Performance]: Distributed Inference w/ & w/o RDMA over InfiniBand (tp=8, pp=2)
performance · #17006 · opened Apr 22, 2025 by surajssd

[Performance]: Why/How does vLLM use CPU memory?
performance · #16947 · opened Apr 21, 2025 by khayamgondal

[Performance]: FP8 Online Quantization
performance · #16490 · opened Apr 11, 2025 by diggle001

[Performance]: Stability concerns with Llama-4 models after extended uptime on H100 GPUs
performance · #16473 · opened Apr 11, 2025 by nskpro-cmd

[Performance]: MultiModalKwargs serialization has significant impact on E2E latency (w/ proof-of-concept patch)
performance · #16461 · opened Apr 11, 2025 by xtknight