
Commit f3404ec

require the user to input max_num_batched_tokens, max_num_seqs and max_model_len to reduce the chance of performance degradation
Signed-off-by: Chenyaaang <[email protected]>
1 parent 05e1fbf commit f3404ec

File tree

examples/online_serving/chart-helm/values.yaml
vllm/entrypoints/openai/cli_args.py

2 files changed, +14 −1 lines changed


examples/online_serving/chart-helm/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ image:
   # -- Image tag
   tag: "latest"
   # -- Container launch command
-  command: ["vllm", "serve", "/data/", "--served-model-name", "opt-125m", "--dtype", "bfloat16", "--host", "0.0.0.0", "--port", "8000"]
+  command: ["vllm", "serve", "/data/", "--served-model-name", "opt-125m", "--dtype", "bfloat16", "--host", "0.0.0.0", "--port", "8000", "--max-num-batched-tokens", "2048", "--max-num-seqs", "16", "--max-model-len", "2048"]
 
 # -- Container port
 containerPort: 8000

vllm/entrypoints/openai/cli_args.py

Lines changed: 13 additions & 0 deletions
@@ -289,6 +289,19 @@ def validate_parsed_serve_args(args: argparse.Namespace):
         raise TypeError("Error: --enable-reasoning requires "
                         "--reasoning-parser")
 
+    # Ensure that --max-num-batched-tokens, --max-num-seqs, --max-model-len
+    # are passed within command on TPU.
+    from vllm.platforms import current_platform
+    if current_platform.is_tpu():
+        if args.max_num_batched_tokens is None:
+            raise ValueError("Requires --max-num-batched-tokens")
+
+        if args.max_num_seqs is None:
+            raise ValueError("Requires --max-num-seqs")
+
+        if args.max_model_len is None:
+            raise ValueError("Requires --max-model-len")
+
 
 def create_parser_for_docs() -> FlexibleArgumentParser:
     parser_for_docs = FlexibleArgumentParser(
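
Because the new check lives in validate_parsed_serve_args, it can be exercised directly with a plain argparse.Namespace. Below is a minimal pytest-style sketch, not part of this commit: stubbing current_platform.is_tpu via monkeypatch and the exact set of attributes the validator reads before reaching the TPU branch are assumptions, so additional attributes may need to be added to the Namespace depending on the other checks in that function.

# Minimal sketch (not part of this commit) of exercising the new TPU check.
# Assumption: current_platform.is_tpu can be stubbed with monkeypatch, and the
# attributes below cover the checks that run before the TPU branch.
import argparse

import pytest

from vllm.entrypoints.openai.cli_args import validate_parsed_serve_args
from vllm.platforms import current_platform


def test_tpu_requires_explicit_limits(monkeypatch):
    # Pretend we are on TPU so the new validation branch is taken.
    monkeypatch.setattr(current_platform, "is_tpu", lambda: True)

    args = argparse.Namespace(
        enable_reasoning=False,        # keep the earlier reasoning check quiet
        reasoning_parser=None,
        max_num_batched_tokens=None,   # deliberately missing
        max_num_seqs=16,
        max_model_len=2048,
    )

    # The validator should reject the missing --max-num-batched-tokens.
    with pytest.raises(ValueError, match="--max-num-batched-tokens"):
        validate_parsed_serve_args(args)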
