This guide explains how to use the dynamo build command to containerize Dynamo inference graphs (pipelines) for deployment.
- What is dynamo build?
- Building a containerized inference graph
- Guided example: containerizing the Hello World pipeline
- Guided example: containerizing an LLM pipeline
dynamo build is a command-line tool that helps containerize inference graphs created with the Dynamo SDK. Running dynamo build --containerize builds a stand-alone Docker container that encapsulates your entire inference graph. The resulting image can then be shared and run on its own.
Note: This is currently an experimental feature and has only been tested on the examples available in the examples/ directory. You may have to make some modifications, particularly if your inference graph introduces custom dependencies.
The basic workflow for using dynamo build involves:
- Defining your inference graph and testing it locally with dynamo serve
- Specifying a base image for your inference graph (more on this below)
- Running dynamo build to build a containerized inference graph
dynamo build <graph_definition> --containerize
This section will walk through a complete example of building a containerized inference graph. Here we containerize the Hello World pipeline available at examples/hello_world. First, define the graph and test it locally:
cd examples/hello_world
dynamo serve hello_world:Frontend
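While dynamo serve is running, you can send a quick request to confirm the pipeline responds. This assumes the frontend listens on localhost:8000 and exposes /generate, as in the containerized test later in this guide:
curl -X POST 'http://localhost:8000/generate' -H 'Content-Type: application/json' -d '{"text": "test"}'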
We intend to support two base images, which will be available as buildable targets in the top-level Earthfile. You can then use one of these images as the base for building your inference graph:
- A leaner image without CUDA and vLLM, suitable for CPU-only deployments. This is what we will use for the Hello World example. It is available as dynamo-base-docker in the top-level Earthfile.
- A base image with CUDA and vLLM, suitable for GPU deployments. Note: while this is not yet available in the top-level Earthfile, you may use the dynamo:latest-vllm image created by running ./container/build.sh as a valid base image for this purpose.
For the Hello World example, build the leaner base image and reference it through the DYNAMO_IMAGE environment variable:
export CI_REGISTRY_IMAGE=my-registry
export CI_COMMIT_SHA=hello-world
earthly +dynamo-base-docker --CI_REGISTRY_IMAGE=$CI_REGISTRY_IMAGE --CI_COMMIT_SHA=$CI_COMMIT_SHA
# Image should successfully be built and tagged as my-registry/dynamo-base-docker:hello-world
export DYNAMO_IMAGE=my-registry/dynamo-base-docker:hello-world
dynamo build hello_world:Frontend --containerize
# Output will contain the tag for the newly created image
# e.g. frontend-hello-world:latest
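To confirm the image was produced, you can list it locally. The exact repository and tag come from the dynamo build output; frontend-hello-world below is illustrative:
docker images | grep frontend-hello-world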
As a prerequisite, ensure NATS and etcd are running by starting the Docker Compose setup in the deploy directory:
docker compose up -d
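Before starting your container, you can verify that both services came up:
docker compose ps
# The NATS and etcd services should both be listed as running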
Start your container with host networking so it can reach NATS and etcd:
# Host networking is required for NATS and etcd to be accessible from the container
docker run --network host \
--entrypoint bash \
--ipc host \
frontend:<generated_tag> \
-c "cd src && uv run dynamo serve hello_world:Frontend"
Test your containerized Dynamo services:
curl -X 'POST' \
'http://localhost:8000/generate' \
-H 'accept: text/event-stream' \
-H 'Content-Type: application/json' \
-d '{
"text": "test"
}'
This section will walk through an example of building a containerized LLM inference graph using the example available at examples/llm. As before, start by testing the pipeline locally:
cd examples/llm
dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml
For LLM inference, we'll use the GPU-enabled base image with CUDA and vLLM support. You can use the dynamo:latest-vllm image created by running ./container/build.sh as the base image:
# Build the base image with CUDA and vLLM support
./container/build.sh
# This will create dynamo:latest-vllm image
export DYNAMO_IMAGE=dynamo:latest-vllm
dynamo build graphs.agg:Frontend --containerize
# Output will contain the tag for the newly created image
# e.g. frontend-llm-agg:latest
As a prerequisite, ensure NATS and etcd are running by starting the Docker Compose setup in the deploy directory:
docker compose up -d
Start your container with host networking, GPU access, and an increased shared-memory size:
# Host networking is required for NATS and etcd to be accessible from the container
docker run --network host \
--entrypoint sh \
--gpus all \
--shm-size 10G \
--ipc host \
frontend:<generated_tag> \
-c "cd src && uv run dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml"
Once the container is running (the model may take several minutes to download and load on first start), you can test it by making a request to the service:
curl localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"stream": false,
"max_tokens": 30
}'
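The request above sets "stream": false. Since the frontend exposes an OpenAI-style chat completions endpoint (as the /v1/chat/completions path suggests), setting "stream": true should instead return tokens incrementally as server-sent events:
curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "stream": true,
    "max_tokens": 30
  }'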