[Bug]: nightly version: ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.layers' #17263

Closed
@Zhiyuan-Fan

Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

🐛 Describe the bug

INFO 04-27 14:39:59 [config.py:3574] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
INFO 04-27 14:39:59 [weight_utils.py:265] Using model weights format ['*.safetensors']                                                                                           
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]                                                                                                     
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:01<00:01,  1.55s/it]                                                                                             
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.69s/it]                                                                                             
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00,  1.67s/it]                                                                                             
                                                                                                                                                                                 
INFO 04-27 14:40:03 [loader.py:458] Loading weights took 3.43 seconds                                                                                                            
INFO 04-27 14:40:03 [gpu_model_runner.py:1339] Model loading took 7.1557 GiB and 4.703990 seconds                                                                                
INFO 04-27 14:43:48 [gpu_model_runner.py:1612] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.     
ERROR 04-27 14:43:54 [core.py:396] EngineCore failed to start.                                                                                                                   
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):                                                                                                            
ERROR 04-27 14:43:54 [core.py:396]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1275, in IMPORT_NAME          
ERROR 04-27 14:43:54 [core.py:396]     value = __import__(                                                                                                                       
ERROR 04-27 14:43:54 [core.py:396]             ^^^^^^^^^^^                                                                                                                       
ERROR 04-27 14:43:54 [core.py:396] ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.layers'                                                                            
ERROR 04-27 14:43:54 [core.py:396]                                                      
ERROR 04-27 14:43:54 [core.py:396] During handling of the above exception, another exception occurred:                                                                           
ERROR 04-27 14:43:54 [core.py:396]                                                      
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):                                                                                                            
ERROR 04-27 14:43:54 [core.py:396]   File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 387, in run_engine_core                                                              
ERROR 04-27 14:43:54 [core.py:396]     engine_core = EngineCoreProc(*args, **kwargs)                                                                                             
ERROR 04-27 14:43:54 [core.py:396]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                             
ERROR 04-27 14:43:54 [core.py:396]   File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 329, in __init__                                                                     
ERROR 04-27 14:43:54 [core.py:396]     super().__init__(vllm_config, executor_class, log_stats,                                                                                  
ERROR 04-27 14:43:54 [core.py:396]   File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 71, in __init__                                                                      
ERROR 04-27 14:43:54 [core.py:396]     self._initialize_kv_caches(vllm_config)                                                                                                   
ERROR 04-27 14:43:54 [core.py:396]   File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 129, in _initialize_kv_caches
ERROR 04-27 14:43:54 [core.py:396]     available_gpu_memory = self.model_executor.determine_available_memory()  

This only happens with Qwen/Qwen2.5-VL-7B-Instruct. I tried deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, and it works fine.
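
To narrow this down, it may help to check whether the submodule named in the traceback can be located at all in the installed package, independent of the `torch._dynamo` tracing step where the import failed. Below is a minimal sketch using only the standard library. `module_available` is a hypothetical helper name for illustration, not part of vLLM, and the check only reports whether the import system can find the module, not why it is missing.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the import system can locate `name`.

    find_spec imports parent packages to resolve a dotted name, so a
    missing parent raises ModuleNotFoundError; that is treated as
    "not available" here.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Module name copied from the traceback above.
    for name in ("vllm.vllm_flash_attn", "vllm.vllm_flash_attn.layers"):
        print(name, "->", module_available(name))
```

If the parent package is reported as available but `layers` is not, the installed build probably does not ship that submodule, which would point at a mismatch between the source checkout and the bundled flash-attn files rather than at the model itself.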

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
