The result of using inflight_batcher_llm_client to send multiple LoRA weights is not the same as using tensorrtllm directly #413

Case 1: use tensorrtllm
python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \
              --max_output_len 2048 \
              --tokenizer_dir "/tensorrtllm_backend/tokenizer" \
              --input_text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
              --lora_dir "/tensorrtllm_backend/lora_intent" \
              --lora_task_uids 0 \
              --no_add_special_tokens \
              --use_py_session \
              --streaming

Output [Text 0 Beam 0]: "Writing"

Case 2: use inflight_batcher_llm_client
python3 /tensorrtllm_backend/inflight_batcher_llm/client/inflight_batcher_llm_client.py \
        --request-output-len 2048 \
        --text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
        --tokenizer-dir /tensorrtllm_backend/tokenizer \
        --lora-path "/tensorrtllm_backend/lora_intent" --streaming
Output [Text 0 Beam 0]: "Summary"

The correct answer is "Writing", so the result returned through inflight_batcher_llm_client is wrong.
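One way to narrow this down might be to compare the input token IDs that each path actually sends to the engine. The run.py command above passes --no_add_special_tokens and --lora_task_uids 0, while the client command passes no equivalent options, so the two inputs may not be identical. Below is a minimal sketch of such a comparison. The helper `first_divergence` is hypothetical (it is not part of either tool), and the ID lists are placeholders, not real tokenizer output; in practice they would come from the tokenizer in /tensorrtllm_backend/tokenizer.

```python
from itertools import zip_longest


def first_divergence(ids_a, ids_b):
    """Return the first index where two token ID sequences differ.

    Returns None when the sequences are identical. A length mismatch
    counts as a difference at the index where the shorter one ends.
    """
    for i, (a, b) in enumerate(zip_longest(ids_a, ids_b)):
        if a != b:
            return i
    return None


# Placeholder IDs for illustration only (assumption: one path prepends
# a special token and the other does not).
with_special = [1, 101, 102, 103]
without_special = [101, 102, 103]

print(first_divergence(with_special, without_special))  # 0
```

If the IDs match, the difference more likely lies in how the LoRA weights or task ID reach the engine than in tokenization.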


