Description
System Info
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Compare the Docker image sizes listed at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags for the trtllm tags:
- nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 is 18.46 GB

- nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3 is 8.38 GB

- The issue persists in nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3, which is 18.48 GB
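
To make the size jump explicit, here is a minimal Python sketch that computes the increase relative to 24.03. The sizes are the ones quoted above from the NGC catalog; the `growth` helper and the `SIZES_GB` table are illustrative names, not part of any NVIDIA tooling.

```python
# Image sizes in GB, copied from the list above (as shown in the NGC catalog).
SIZES_GB = {
    "24.03-trtllm-python-py3": 8.38,
    "24.04-trtllm-python-py3": 18.46,
    "24.05-trtllm-python-py3": 18.48,
}


def growth(baseline_gb, current_gb):
    """Return (absolute increase in GB, size as a multiple of the baseline)."""
    return current_gb - baseline_gb, current_gb / baseline_gb


baseline = SIZES_GB["24.03-trtllm-python-py3"]
for tag, size in SIZES_GB.items():
    delta, ratio = growth(baseline, size)
    print(f"{tag}: {size:.2f} GB ({delta:+.2f} GB, {ratio:.2f}x of 24.03)")
```

By these numbers, 24.04 is roughly 10 GB larger than 24.03, a little over twice the size.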
Expected behavior
The Docker image size remains around 8 GB, as in previous releases.
Actual behavior
The Docker image size increased to more than 18 GB in 24.04 and remains at that level in 24.05.
Additional notes
Docker image size matters when autoscaling is used, because pulling a larger image takes longer.
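
As a rough illustration of the autoscaling impact, the sketch below estimates a lower bound on transfer time for each image. The 1 Gbit/s link speed is an assumed value chosen only for illustration, and `pull_seconds` is a hypothetical helper. Real pulls also spend time on layer decompression and extraction and may be limited by the registry, so actual times will be higher.

```python
def pull_seconds(size_gb, bandwidth_gbit_per_s):
    """Lower bound on transfer time in seconds.

    size_gb is taken as 10^9 bytes per GB; bandwidth is in Gbit/s.
    Ignores decompression, extraction, and registry-side limits.
    """
    return size_gb * 8 / bandwidth_gbit_per_s


ASSUMED_BANDWIDTH_GBIT = 1.0  # assumption for illustration, not a measured value

for tag, size_gb in [("24.03", 8.38), ("24.04", 18.46), ("24.05", 18.48)]:
    seconds = pull_seconds(size_gb, ASSUMED_BANDWIDTH_GBIT)
    print(f"{tag}: at least {seconds:.0f} s to transfer {size_gb} GB")
```

Under that assumption the transfer time scales linearly with image size, so a node pulling 24.04 would wait more than twice as long as one pulling 24.03 before the container can start.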