Description
Environment
If applicable, please include the following:
- CPU architecture (e.g., x86_64, aarch64)
- CPU/Host memory size (if known)
- GPU properties
  - GPU name (e.g., NVIDIA H100, NVIDIA A100, NVIDIA L40S)
  - GPU memory size (if known)
  - Clock frequencies used (if applicable)
- Libraries
  - TensorRT-LLM backend branch or tag (e.g., main, v0.7.1)
  - TensorRT-LLM backend commit (if known)
  - Versions of TensorRT, AMMO, CUDA, cuBLAS, etc. used
  - Container used (if running TensorRT-LLM backend in a container)
- NVIDIA driver version
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10)
- Any other information that may be useful in reproducing the bug
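Most of the items above can be gathered with a few standard commands. The following is a minimal sketch, assuming a Linux host with `nvidia-smi` on the PATH and Python packages installed through pip. The function name `collect_env` is hypothetical and not part of TensorRT-LLM; every step falls back to a placeholder line when a tool is missing, so adjust it to your setup.

```shell
#!/bin/sh
# Sketch: print the environment details requested in the list above.
# collect_env is a hypothetical helper name, not part of TensorRT-LLM.
collect_env() {
    echo "CPU architecture: $(uname -m)"

    # Host memory (Linux only; /proc/meminfo may be absent elsewhere).
    if [ -r /proc/meminfo ]; then
        echo "Host memory: $(awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo)"
    else
        echo "Host memory: unknown"
    fi

    # GPU name, memory size, current clocks, and driver version.
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "GPU properties:"
        nvidia-smi --query-gpu=name,memory.total,clocks.sm,clocks.mem,driver_version \
            --format=csv || echo "  nvidia-smi query failed"
    else
        echo "GPU properties: nvidia-smi not found"
    fi

    # CUDA toolkit version, if nvcc is installed.
    if command -v nvcc >/dev/null 2>&1; then
        echo "CUDA: $(nvcc --version | grep release || echo unknown)"
    else
        echo "CUDA: nvcc not found"
    fi

    # Assumption: TensorRT-related packages were installed with pip.
    echo "Python packages matching tensorrt/ammo:"
    python3 -m pip list 2>/dev/null | grep -i -E 'tensorrt|ammo' \
        || echo "  none found via pip"

    # OS name and version.
    if [ -r /etc/os-release ]; then
        echo "OS: $(. /etc/os-release && echo "$PRETTY_NAME")"
    else
        echo "OS: $(uname -s) $(uname -r)"
    fi
}

collect_env
```

Redirect the output to a file (for example `sh collect_env.sh > env.txt`) and paste it under Environment. The backend branch, commit, and container image are not detected by this sketch and still need to be filled in by hand.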
Reproduction Steps
Provide detailed reproduction steps for the issue here, including any commands run on the command line.
Expected Behavior
Provide a brief summary of the expected behavior of the software. Provide output files or examples if possible.
Actual Behavior
Describe the actual behavior of the software and how it deviates from the expected behavior. Provide output files or examples if possible.
Additional Notes
Provide any additional context here you think might be useful for the TensorRT-LLM team to help debug this issue (such as experiments done, potential things to investigate).