Closed
Description
Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
INFO 04-27 14:39:59 [config.py:3574] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
INFO 04-27 14:39:59 [weight_utils.py:265] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.55s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.69s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.67s/it]
INFO 04-27 14:40:03 [loader.py:458] Loading weights took 3.43 seconds
INFO 04-27 14:40:03 [gpu_model_runner.py:1339] Model loading took 7.1557 GiB and 4.703990 seconds
INFO 04-27 14:43:48 [gpu_model_runner.py:1612] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.
ERROR 04-27 14:43:54 [core.py:396] EngineCore failed to start.
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1275, in IMPORT_NAME
ERROR 04-27 14:43:54 [core.py:396] value = __import__(
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.layers'
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] During handling of the above exception, another exception occurred:
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 387, in run_engine_core
ERROR 04-27 14:43:54 [core.py:396] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 329, in __init__
ERROR 04-27 14:43:54 [core.py:396] super().__init__(vllm_config, executor_class, log_stats,
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 71, in __init__
ERROR 04-27 14:43:54 [core.py:396] self._initialize_kv_caches(vllm_config)
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 129, in _initialize_kv_caches
ERROR 04-27 14:43:54 [core.py:396] available_gpu_memory = self.model_executor.determine_available_memory()
This only happens with Qwen/Qwen2.5-VL-7B-Instruct. I tried deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, and it works fine.
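As a diagnostic aid (not part of the original report), a quick way to check whether the module named in the traceback is importable in a given environment is to probe the import machinery without fully importing vLLM. The helper below is generic; only the dotted module name `vllm.vllm_flash_attn.layers` is taken from the traceback above:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if the dotted module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. vllm itself) also counts as unavailable.
        return False

# Prints False in a build where the flash-attn layers submodule is missing,
# which matches the ModuleNotFoundError in the traceback.
print(module_available("vllm.vllm_flash_attn.layers"))
```

If this prints `False`, the `vllm.vllm_flash_attn` package in that build was installed without its `layers` submodule, which would explain why the engine fails only for models whose attention path imports it.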
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.