We have been running performance benchmarks on MT-Bench so that end-to-end speedup and acceptance length (AL) are comparable with other setups and with academic papers. Thanks to @luyuzhe111 and others for the discussion and for helping measure the gaps!
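For context, a minimal sketch of how the two metrics above are typically computed when benchmarking speculative decoding. This is illustrative only, not vLLM code; all function and variable names are assumptions.

```python
# Hypothetical sketch (not vLLM internals): computing acceptance length (AL)
# and end-to-end speedup for a speculative-decoding benchmark run.

def acceptance_length(accepted_per_step):
    """Mean tokens emitted per verification step: each step yields the
    accepted draft tokens plus one bonus token from the target model."""
    total_tokens = sum(n + 1 for n in accepted_per_step)
    return total_tokens / len(accepted_per_step)

def e2e_speedup(baseline_latency_s, spec_latency_s):
    """End-to-end speedup relative to vanilla autoregressive decoding."""
    return baseline_latency_s / spec_latency_s

if __name__ == "__main__":
    # e.g. 4 verification steps accepting 2, 3, 1, 3 draft tokens
    print(acceptance_length([2, 3, 1, 3]))  # 3.25
    print(e2e_speedup(10.0, 4.0))           # 2.5
```

Comparing AL alone can be misleading across setups, since wall-clock speedup also depends on the draft model's overhead; that is why both numbers are reported.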
Llama 3 8B
During model weight loading
- [V1][Spec Decode] Eagle Model loading #16035 (comment)
- [V1][Spec Decode] Eagle Model loading #16035 (comment)
During KV cache slot allocation
Llama 3.1 8B
- [V1][Spec Decode] KV cache slots for eagle heads #16370 (comment)
- EAGLE-1/3
- offline serving: [V1][Spec Decode] EAGLE-3 Support #16937 (comment)
- online serving: [Benchmark] Add single turn MTBench to Serving Bench #17202 (comment)
torch.compile & CUDA graph: