Description
Case 1: use TensorRT-LLM directly (run.py)
python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \
  --max_output_len 2048 \
  --tokenizer_dir "/tensorrtllm_backend/tokenizer" \
  --input_text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
  --lora_dir "/tensorrtllm_backend/lora_intent" \
  --lora_task_uids 0 \
  --no_add_special_tokens \
  --use_py_session \
  --streaming
Output [Text 0 Beam 0]: "Writing"
Case 2: use inflight_batcher_llm_client
python3 /tensorrtllm_backend/inflight_batcher_llm/client/inflight_batcher_llm_client.py \
  --request-output-len 2048 \
  --text "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWhat is the intention of the following user questions? \Can you help me write a summary<|im_end|>\n<|im_start|>assistant\n" \
  --tokenizer-dir /tensorrtllm_backend/tokenizer \
  --lora-path "/tensorrtllm_backend/lora_intent" --streaming
Output [Text 0 Beam 0]: "Summary"
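The correct answer is "Writing", which only the run.py invocation returns; the inflight_batcher_llm_client request returns "Summary" for the same prompt and LoRA weights.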
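
To rule out prompt differences between the two commands, it may help to build the prompt once and pass the same string to both. The sketch below is only an illustration: `build_chatml_prompt` and its `newline` parameter are hypothetical names, not part of TensorRT-LLM or the Triton backend. Inside a double-quoted shell argument, `\n` reaches the script as a literal backslash followed by `n`, so the helper can emit either that literal form or real newlines. How each script then treats those characters is not stated in this report.

```python
def build_chatml_prompt(system: str, user: str, newline: str = "\n") -> str:
    """Assemble a single-turn ChatML prompt that ends at the assistant turn.

    `newline` is the separator placed after each role tag and end marker.
    """
    return (
        f"<|im_start|>system{newline}{system}<|im_end|>{newline}"
        f"<|im_start|>user{newline}{user}<|im_end|>{newline}"
        f"<|im_start|>assistant{newline}"
    )


SYSTEM = "You are a helpful assistant."
# User text copied verbatim from the commands above, including the
# backslash before "Can" (raw string so Python keeps it unchanged).
USER = r"What is the intention of the following user questions? \Can you help me write a summary"

# Literal backslash-n, i.e. what the double-quoted shell argument carries.
prompt_literal = build_chatml_prompt(SYSTEM, USER, newline="\\n")

# Real newline characters, for comparison.
prompt_real = build_chatml_prompt(SYSTEM, USER)
```

Running both cases once with `prompt_literal` and once with `prompt_real` would show whether the "Writing" versus "Summary" difference depends on how the newlines are passed.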