Description
Proposal to improve performance
vLLM version: 0.8.5.post1
Without YaRN:

```shell
vllm serve Qwen/Qwen3-32B \
    --trust-remote-code --gpu_memory_utilization 0.95 --tensor-parallel-size 2 \
    --quantization bitsandbytes --load_format bitsandbytes --enforce_eager \
    --max-model-len 32768
```
With YaRN:

```shell
vllm serve Qwen/Qwen3-32B \
    --trust-remote-code --gpu_memory_utilization 0.95 --tensor-parallel-size 2 \
    --quantization bitsandbytes --load_format bitsandbytes --enforce_eager \
    --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
    --max-model-len 131072
```
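For reference, the `--rope-scaling` config above extends the context window multiplicatively: 32768 × 4.0 = 131072, matching `--max-model-len`. Conceptually, YaRN leaves high-frequency RoPE dimensions untouched, divides low-frequency ones by the scaling factor, and linearly ramps between the two regimes. A minimal sketch of that frequency interpolation (the `dim`, `base`, and beta values here are illustrative defaults, not necessarily Qwen3-32B's actual configuration):

```python
import math

def yarn_inv_freq(dim=128, base=10000.0, factor=4.0,
                  orig_max_pos=32768, beta_fast=32, beta_slow=1):
    """Sketch of YaRN-style scaling of RoPE inverse frequencies."""
    # Base RoPE inverse frequency for each dimension pair.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]

    # Dimension index at which `num_rot` full rotations fit into the
    # original context window (YaRN's correction-range computation).
    def correction_dim(num_rot):
        return (dim * math.log(orig_max_pos / (num_rot * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)

    scaled = []
    for i, f in enumerate(inv_freq):
        # ramp goes 0 -> 1 from `low` to `high`; ext = 1 means pure
        # extrapolation (frequency untouched), ext = 0 means pure
        # interpolation (frequency divided by the scaling factor).
        ramp = min(max((i - low) / max(high - low, 1), 0.0), 1.0)
        ext = 1.0 - ramp
        scaled.append((f / factor) * (1.0 - ext) + f * ext)
    return scaled
```

With these defaults, the highest-frequency dimension keeps its original frequency while the lowest-frequency one is divided by 4.0, i.e. stretched to cover the longer window.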
I have run some tests of Qwen3's agentic capabilities on my end, and I have solid findings that enabling YaRN to extend the context window does degrade performance, with around a 15-20% drop.
Do you also see the same behavior? Any suggestions about this drop?
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.