
Dynamo Release v0.1.1

@nv-anants released this on 16 Apr 20:44 · commit 926370b on release/0.1.1

Dynamo is an open-source project under the Apache 2.0 license. The primary distribution is via pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo over the next several months. We will maintain ongoing support for Triton and ensure a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.1.1 features:

  • Benchmarking guides for single-node and multi-node disaggregation on H100 (vLLM)
  • TensorRT-LLM support for KV-aware routing (a conceptual sketch of the routing policy follows this list)
  • TensorRT-LLM support for disaggregation
  • manylinux and Ubuntu 22.04 support for wheels and crates
  • Unified logging for Python and Rust
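
KV-aware routing steers each request toward the worker whose KV cache already holds the longest matching prefix, so cached prefill work is reused instead of recomputed. Below is a minimal conceptual sketch of such a policy, assuming workers advertise the token prefixes they have cached; all names (`Worker`, `prefix_overlap`, `route`) are hypothetical, and real implementations track cached blocks via KV cache events rather than raw token lists.

```python
# Hypothetical sketch of KV-aware routing: pick the worker whose cached
# prefixes overlap the incoming request the most, breaking ties by load.
# None of these names come from Dynamo's actual router API.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    active_requests: int = 0
    # Token-ID prefixes this worker is believed to hold in its KV cache.
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)

def prefix_overlap(request_tokens: list[int], prefix: tuple[int, ...]) -> int:
    """Length of the shared leading run between the request and a cached prefix."""
    n = 0
    for a, b in zip(request_tokens, prefix):
        if a != b:
            break
        n += 1
    return n

def route(request_tokens: list[int], workers: list[Worker]) -> Worker:
    """Prefer the longest cache hit; fall back to the least-loaded worker."""
    def score(w: Worker) -> tuple[int, int]:
        best = max((prefix_overlap(request_tokens, p) for p in w.cached_prefixes),
                   default=0)
        return (best, -w.active_requests)  # more overlap first, then fewer requests
    return max(workers, key=score)

workers = [
    Worker("w0", active_requests=3, cached_prefixes=[(1, 2, 3, 4)]),
    Worker("w1", active_requests=1, cached_prefixes=[(9, 9)]),
]
print(route([1, 2, 3, 5], workers).name)  # w0: its 3-token prefix hit wins
```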

Future plans

  • Instructions for reproducing benchmark guides on GCP and AWS
  • KV Cache Manager as a standalone repository under the ai-dynamo organization. That release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU memory, system memory, local SSD, and object storage (a conceptual sketch of tiered lookup and eviction appears after this list).
  • Searchable user guides and documentation
  • Multi-node instances for large models
  • Initial Planner version supporting dynamic scaling of prefill/decode (P/D) workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns (a toy version of such a heuristic is sketched after this list). Our vision is to evolve the Planner into a reinforcement learning platform: users define objectives, and the Planner tunes and optimizes performance policies automatically based on system feedback.
  • vLLM 1.0 support with NIXL and KV Cache Events
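
To make the KV Cache Manager tiering concrete, here is a minimal sketch of multi-tier lookup and eviction, assuming a fixed fast-to-slow hierarchy (GPU → system memory → SSD → object storage) and LRU demotion between tiers. `TieredKVCache` and its capacities are invented for illustration, not the planned repository's API.

```python
# Hypothetical multi-tier KV cache: blocks live in the fastest tier with
# room; under pressure, the least-recently-used block is demoted one tier.
# Names and capacities are illustrative, not the KV Cache Manager's API.
from collections import OrderedDict
from typing import Optional

class TieredKVCache:
    def __init__(self, capacities: dict[str, int]):
        # Fastest tier first, e.g. {"gpu": 2, "cpu": 4, "ssd": 8, "object": 64}
        self.tiers = [(name, OrderedDict(), cap) for name, cap in capacities.items()]

    def put(self, key: str, block: bytes) -> None:
        self._insert(0, key, block)

    def get(self, key: str) -> Optional[bytes]:
        for _, store, _ in self.tiers:
            if key in store:
                block = store.pop(key)
                self._insert(0, key, block)  # promote a hit to the fastest tier
                return block
        return None

    def _insert(self, level: int, key: str, block: bytes) -> None:
        if level >= len(self.tiers):
            return  # demoted past the last tier: dropped entirely
        _, store, cap = self.tiers[level]
        store[key] = block
        store.move_to_end(key)  # mark as most recently used
        if len(store) > cap:    # over capacity: demote the LRU block
            victim, vblock = store.popitem(last=False)
            self._insert(level + 1, victim, vblock)

cache = TieredKVCache({"gpu": 2, "cpu": 4})
for i in range(4):
    cache.put(f"seq-{i}", b"kv")
print(cache.get("seq-0") is not None)  # True: demoted to "cpu", then promoted back
```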
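
Likewise, a toy version of the heuristic P/D allocation the Planner item describes: shift GPU workers toward whichever phase has the deeper backlog. Queue depths and thresholds here are invented stand-ins for real traffic signals, not the Dynamo Planner's actual policy.

```python
# Toy heuristic for splitting a fixed GPU pool between prefill (P) and
# decode (D) workers. Queue depths stand in for real traffic metrics;
# the 2x thresholds are invented, not the Dynamo Planner's policy.
def plan_split(total_gpus: int, prefill_queue: int, decode_queue: int,
               current_prefill: int) -> int:
    """Return the new number of prefill workers (the rest serve decode)."""
    prefill = current_prefill
    # Long prompt backlog: prefill is the bottleneck, grow it.
    if prefill_queue > 2 * decode_queue and prefill < total_gpus - 1:
        prefill += 1
    # Token-generation backlog: decode is the bottleneck, shrink prefill.
    elif decode_queue > 2 * prefill_queue and prefill > 1:
        prefill -= 1
    return prefill

# Traffic turns prompt-heavy, then decode-heavy: the split follows.
p = 2
for prefill_q, decode_q in [(40, 5), (38, 6), (10, 30)]:
    p = plan_split(total_gpus=8, prefill_queue=prefill_q, decode_queue=decode_q,
                   current_prefill=p)
    print(p)  # 3, then 4, then back to 3
```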

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)
  • For the multi-node 70B configuration, benchmarks on internal clusters show a 15% degradation relative to the results in the summary graphs; this is under investigation.

What's Changed

Full Changelog: v0.1.0...release/0.1.1