Closed
Description
Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
INFO 04-27 14:39:59 [config.py:3574] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
INFO 04-27 14:39:59 [weight_utils.py:265] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.55s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.69s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.67s/it]
INFO 04-27 14:40:03 [loader.py:458] Loading weights took 3.43 seconds
INFO 04-27 14:40:03 [gpu_model_runner.py:1339] Model loading took 7.1557 GiB and 4.703990 seconds
INFO 04-27 14:43:48 [gpu_model_runner.py:1612] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.
ERROR 04-27 14:43:54 [core.py:396] EngineCore failed to start.
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1275, in IMPORT_NAME
ERROR 04-27 14:43:54 [core.py:396] value = __import__(
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.layers'
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] During handling of the above exception, another exception occurred:
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 387, in run_engine_core
ERROR 04-27 14:43:54 [core.py:396] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 329, in __init__
ERROR 04-27 14:43:54 [core.py:396] super().__init__(vllm_config, executor_class, log_stats,
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 71, in __init__
ERROR 04-27 14:43:54 [core.py:396] self._initialize_kv_caches(vllm_config)
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 129, in _initialize_kv_caches
ERROR 04-27 14:43:54 [core.py:396] available_gpu_memory = self.model_executor.determine_available_memory()
This only happens with Qwen/Qwen2.5-VL-7B-Instruct. I tried deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, and it works fine.
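As a diagnostic aid (not part of the original report), a quick way to check whether the module named in the traceback is importable in a given environment is to probe the import machinery without fully importing vLLM. The helper below is generic; only the dotted module name `vllm.vllm_flash_attn.layers` is taken from the traceback above:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if the dotted module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. vllm itself) also counts as unavailable.
        return False

# Prints False in a build where the flash-attn layers submodule is missing,
# which matches the ModuleNotFoundError in the traceback.
print(module_available("vllm.vllm_flash_attn.layers"))
```

If this prints `False`, the `vllm.vllm_flash_attn` package in that build was installed without its `layers` submodule, which would explain why the engine fails only for models whose attention path imports it.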
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.