Results from sending multiple LoRA weights with inflight_batcher_llm_client differ from running tensorrt_llm directly #413

Open
@stifles

Description

Case 1: using tensorrt_llm (examples/run.py)
python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \
--max_output_len 2048 \
--tokenizer_dir "/tensorrtllm_backend/tokenizer" \
--input_text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
--lora_dir "/tensorrtllm_backend/lora_intent" \
--lora_task_uids 0 \
--no_add_special_tokens \
--use_py_session \
--streaming

Output [Text 0 Beam 0]: "Writing"

Case 2: using inflight_batcher_llm_client
python3 /tensorrtllm_backend/inflight_batcher_llm/client/inflight_batcher_llm_client.py \
--request-output-len 2048 \
--text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
--tokenizer-dir /tensorrtllm_backend/tokenizer \
--lora-path "/tensorrtllm_backend/lora_intent" --streaming

Output [Text 0 Beam 0]: "Summary"

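The correct answer is "Writing", which is what case 1 returns. Case 2 returns "Summary" for the same prompt and the same LoRA directory.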

Labels: triaged (issue has been triaged by maintainers)
