Issues: vllm-project/vllm
[Performance]: The Unstable Performance Difference between CUDA and PyTorch
performance (Performance-related issues) · #18884 · opened May 29, 2025 by cxn-selfie

[Performance]: Why are batch-embeddings inputs split into small single requests?
performance · #18867 · opened May 29, 2025 by xsank

[Perf] Tune scaled_fp8_quant by increasing vectorization
performance · ready (ONLY add when PR is ready to merge/full CI is needed) · #18844 · opened May 28, 2025 by mgoin

[Performance]: How can I further improve performance with vLLM + LMCache PD disaggregation?
performance · #18801 · opened May 28, 2025 by mugglew

[Performance]: Falcon H1 7B seems to be significantly slower than Qwen 7B
performance · #18785 · opened May 28, 2025 by justusmattern27

[Performance]: The CPU overhead gradually increases with multiple batches
performance · #18760 · opened May 27, 2025 by wgzhong

[Performance]: YaRN degrades the performance of Qwen3
performance · #18728 · opened May 26, 2025 by yanan1116

[Performance]: Unexpected: B200 GPU performance similar to H200 for Qwen/QwQ-32B; expected B200 to be significantly faster
performance · #18725 · opened May 26, 2025 by awterman

Sm100 blockwise fp8 swap ab
performance · ready · #18564 · opened May 22, 2025 by IwakuraRein

[Performance]: Low GPU Utilization (70%) for ViT+Qwen2 VLM Model
performance · #18392 · opened May 20, 2025 by Oldpan

[Performance]: vLLM v0.6.3 to v0.7.1 improved TPS by around 1.6x
performance · #18339 · opened May 19, 2025 by wonjerry

[Performance]: SGLang is 4 times faster than vLLM for Qwen/Qwen3-32B-AWQ
performance · #18136 · opened May 14, 2025 by thesillystudent

[Performance]: How long can the waiting queue be, and which startup parameters affect it?
performance · #17824 · opened May 8, 2025 by nvliajia

[Benchmark][V1][Spec Decode][EAGLE] Tracking benchmark for V1 EAGLE
performance · #17812 · opened May 7, 2025 by ekagra-ranjan

[Performance]: benchmark_serving results for Qwen3-32B vs Qwen2-32B-FP8 are almost the same
performance · #17788 · opened May 7, 2025 by hit023

[Performance]: TPOT and ITL increase as max-num-seqs increases?
performance · #17598 · opened May 2, 2025 by esp-vt

[Performance]: Performance comparison for V1 engine and V0 engine
performance · #17540 · opened May 1, 2025 by hustxiayang

[Performance]: Quantized Model Inference
performance · #17487 · opened Apr 30, 2025 by sneha5gsm

[Performance]: First-token latency during inference is longer when the number of input tokens is small
performance · #17352 · opened Apr 29, 2025 by gaochenxi

[Performance]: UVA vs UVM for CPU offloading on v0.8.4+
performance · #17062 · opened Apr 23, 2025 by rajesh-s

[Performance]: Distributed Inference w/ & w/o RDMA over InfiniBand (tp=8, pp=2)
performance · #17006 · opened Apr 22, 2025 by surajssd

[Performance]: Why/How does vLLM use CPU memory?
performance · #16947 · opened Apr 21, 2025 by khayamgondal

[Performance]: FP8 Online Quantization
performance · #16490 · opened Apr 11, 2025 by diggle001

[Performance]: Stability concerns with Llama-4 models after extended uptime on H100 GPUs
performance · #16473 · opened Apr 11, 2025 by nskpro-cmd

[Performance]: MultiModalKwargs serialization has significant impact on E2E latency (w/ proof-of-concept patch)
performance · #16461 · opened Apr 11, 2025 by xtknight