Description
Proposal to improve performance
I launched the ds32B model on eight 4090 GPUs and then sent 1200 requests back-to-back, but the waiting reqs and running reqs counts only add up to about 1000. I'd like to understand how vLLM controls the queue length, and which launch parameters the queue length depends on.

My launch command is:

```shell
vllm serve llm_model/ds_32B/ --served-model-name deepseek --api-key 12345 --disable-log-requests --trust-remote-code --tensor-parallel-size 8 --max-model-len 25000 --gpu_memory_utilization 0.7 --max-num-seqs 96 --max-num-batched-tokens 18096
```
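For illustration only, here is a toy sketch (not vLLM's actual scheduler code) of the general idea behind a cap like `--max-num-seqs`: incoming requests land in a waiting queue, and each scheduling step promotes waiting requests into the running set only up to the cap. The numbers below reuse the values from the launch command; how the remaining requests are admitted or rejected server-side is a separate question.

```python
from collections import deque

# Toy model of a bounded scheduler (NOT vLLM's real implementation):
# running is capped at MAX_NUM_SEQS; everything else stays in waiting.
MAX_NUM_SEQS = 96            # from --max-num-seqs in the command above

waiting = deque(range(1200)) # 1200 incoming request ids
running = []

# One scheduling step: admit waiting requests until the cap is reached.
while waiting and len(running) < MAX_NUM_SEQS:
    running.append(waiting.popleft())

print(len(running), len(waiting))  # 96 running, 1104 still waiting
```

In this toy model, waiting + running always equals the number of requests the server has accepted, so if the client (or an HTTP-level limit) only keeps about 1000 requests in flight, the two queues would sum to about 1000 rather than 1200.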
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.