[Issue Template] Short one-line summary of the issue

**Environment**

If applicable, please include the following:

- CPU architecture (e.g., x86_64, aarch64)
- CPU/Host memory size (if known)
- GPU properties
  - GPU name (e.g., NVIDIA H100, NVIDIA A100, NVIDIA L40S)
  - GPU memory size (if known)
  - Clock frequencies used (if applicable)
- Libraries
  - TensorRT-LLM backend branch or tag (e.g., main, v0.7.1)
  - TensorRT-LLM backend commit (if known)
  - Versions of TensorRT, AMMO, CUDA, cuBLAS, etc. used
  - Container used (if running TensorRT-LLM backend in a container)
- NVIDIA driver version
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10)
- Any other information that may be useful in reproducing the bug
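
Several of the items above can be gathered automatically. The following is a minimal sketch, not an official TensorRT-LLM tool: the function names `collect_environment` and `format_report` are hypothetical, and it assumes `nvidia-smi` is on the `PATH` when a GPU is present. It does not detect library versions (TensorRT, CUDA, cuBLAS) or the backend commit, which still need to be filled in by hand.

```python
import os
import platform
import shutil
import subprocess


def collect_environment():
    """Gather a subset of the environment details the template asks for."""
    info = {
        "cpu_architecture": platform.machine(),
        "os": platform.platform(),
        "host_memory_gib": None,
        "gpus": [],
    }

    # Host memory via POSIX sysconf; stays None where unsupported (e.g. Windows).
    try:
        total = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        info["host_memory_gib"] = round(total / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        pass

    # GPU name, memory size, and driver version, only if nvidia-smi is available.
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                [
                    "nvidia-smi",
                    "--query-gpu=name,memory.total,driver_version",
                    "--format=csv,noheader",
                ],
                capture_output=True,
                text=True,
                check=True,
            ).stdout
            info["gpus"] = [ln.strip() for ln in out.splitlines() if ln.strip()]
        except (subprocess.CalledProcessError, OSError):
            pass  # Leave the GPU list empty if the query fails.

    return info


def format_report(info):
    """Render the collected details as markdown bullets for pasting into an issue."""
    mem = info["host_memory_gib"]
    lines = [
        f"- CPU architecture: {info['cpu_architecture']}",
        f"- Host memory size: {mem} GiB" if mem is not None else "- Host memory size: unknown",
        f"- OS: {info['os']}",
    ]
    for gpu in info["gpus"]:
        # Each entry is "name, memory.total, driver_version" as printed by nvidia-smi.
        lines.append(f"- GPU (name, memory, driver): {gpu}")
    if not info["gpus"]:
        lines.append("- GPU: not detected")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_environment()))
```

Running the script prints a bullet list that can be pasted under **Environment** and then completed manually with the remaining items.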

**Reproduction Steps**

Provide detailed reproduction steps for the issue here, including any commands run on the command line.

**Expected Behavior**

Provide a brief summary of the expected behavior of the software. Provide output files or examples if possible.

**Actual Behavior**

Describe the actual behavior of the software and how it deviates from the expected behavior. Provide output files or examples if possible.

**Additional Notes**

Provide any additional context that might help the TensorRT-LLM team debug this issue, such as experiments already run or potential areas to investigate.

