Dynamo is an open-source project released under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: dynamo and NIXL. Dynamo is designed to be the ideal next-generation inference server, building on the foundations of the Triton Inference Server. Triton focuses on single-node inference deployments, and we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will continue to support Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.1.1 features:
- Benchmarking guides for Single and Multi-Node Disaggregation on H100 (vLLM)
- TensorRT-LLM support for KV Aware Routing
- TensorRT-LLM support for Disaggregation
- ManyLinux and Ubuntu 22.04 Support for wheels and crates
- Unified logging for Python and Rust
Future plans
- Instructions for reproducing benchmark guides on GCP and AWS
- KV Cache Manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
- Searchable user guides and documentation
- Multi-node instances for large models
- Initial Planner version supporting dynamic scaling of prefill/decode (P/D) workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform that lets users define objectives and then tunes and optimizes performance policies automatically based on system feedback.
- vLLM 1.0 support with NIXL and KV Cache Events
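As an illustration of the multi-tier storage and eviction idea in the KV Cache Manager item above, here is a minimal sketch in Python. It is not the planned implementation: the class name `TieredKVCache`, its methods, the tier names, and the LRU-with-demotion policy are all assumptions made for this example. Blocks evicted from a faster tier are demoted to the next slower one, and a block that is read is promoted back to the fastest tier.

```python
from collections import OrderedDict


class TieredKVCache:
    """Hypothetical toy cache: LRU per tier, evicted blocks move one tier down."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), ordered fastest first.
        self.tiers = [(name, capacity, OrderedDict()) for name, capacity in tiers]

    def put(self, block_hash, block, level=0):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: the block is dropped
        _, capacity, store = self.tiers[level]
        store[block_hash] = block
        store.move_to_end(block_hash)  # mark as most recently used
        if len(store) > capacity:
            old_hash, old_block = store.popitem(last=False)  # evict the LRU entry
            self.put(old_hash, old_block, level + 1)  # demote it one tier down

    def get(self, block_hash):
        for _, _, store in self.tiers:
            if block_hash in store:
                block = store.pop(block_hash)
                self.put(block_hash, block, 0)  # promote to the fastest tier
                return block
        return None  # cache miss

    def location(self, block_hash):
        # Name of the tier currently holding the block, or None if absent.
        for name, _, store in self.tiers:
            if block_hash in store:
                return name
        return None


cache = TieredKVCache([("gpu", 2), ("system_memory", 2)])
for h in ("a", "b", "c"):
    cache.put(h, f"kv-{h}")
print(cache.location("a"))  # "a" was the least recently used block in the gpu tier
```

A real manager would also need asynchronous transfers, per-tier size accounting in bytes rather than block counts, and a policy for object storage; none of that is modeled here.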
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
- Benchmarks on internal clusters show a 15% degradation relative to the results displayed in the summary graphs for multi-node 70B. This is being investigated.
What's Changed
- docs: Benchmarking guide updates (#678) by @kthui in #699
- docs: Update support matrix by @pvijayakrish in #691
- fix: change trtllm kv_router default block_size to 32 (#642) by @tanmayv25 in #694
- fix: set correct parent_hash for each kv block when publish kv events by @tanmayv25 in #693
- fix: Remove kv connector from agg config by @ptarasiewiczNV in #655
- fix: Account for Metrics.decode() changes (#619) by @rmccorm4 in #619
- fix: update to match latest nixl notifications as bytes by @nnshah1 in #645
- docs: Update support matrix by @pvijayakrish in #633
- docs: Add instructions to install git lfs (#627) by @tanmayv25 in #627
- fix: add DYNAMO_HOME env var to vLLM docker image (#629) by @nv-anants in #629
- feat: TRT-LLM disaggregated serving using UCX (#562) by @tanmayv25 in #562
- docs: Update support matrix by @pvijayakrish in #604
- docs: Guide for multi-node benchmarking (#561) by @kthui in #561
- fix: remove api-store from container by @mohammedabdulwahhab in #617
- docs: Guides for single node benchmarking (#509) by @kthui in #509
- fix: set worker env before worker process spawn by @ishandhanani in #614
- docs: Move trtllm dynamo run doc from example to dynamo run guide (#578) by @tanmayv25 in #578
- chore: update ai-dynamo-vllm wheel version (#598) by @nv-anants in #598
- fix: bump bento to 1.4.8 (#579) by @mohammedabdulwahhab in #579
- fix: update yum install in wheel-builder image (#605) by @nv-anants in #605
- docs: update dynamo serve trtllm agg example yaml files (#600) by @ziqif-nv in #600
- chore: use latest nixl for docker builds by @nv-anants in #596
- chore: update versions to 0.1.1 by @nv-anants in #552
- docs: Updated dynamo run instructions by @cdgamarose-nv in #555
- feat: Add manylinux support for Dynamo by @pvijayakrish in #536
- docs: Clarify the --max-local-prefill-length help description by @kthui in #554
- feat: Add dynamo env CLI option to provide information about user environment by @nv-tusharma in #533
- docs: add disagg tuning guide by @tedzhouhk in #413
- fix: let dynamo run pass --help to dynamo-run by @ziqif-nv in #547
- chore: Update TRTLLM version. Fix router. by @tanmayv25 in #527
- fix: unify and enable dynamo logging by @ishandhanani in #520
- feat(dynamo-run): Basic routing choice by @grahamking in #524
- fix: clean unused bento pieces from serve.py and serving.py by @ishandhanani in #532
- docs: update close-deployment in dynamo_serve.md by @tlipoca9 in #535
- feat: update operator README by @julienmancuso in #544
- fix: mypy error by @ishandhanani in #543
- feat: cleanup operator code by @julienmancuso in #529
- chore: Fixed file headers. Added attributions. by @dmitry-tokarev-nv in #530
- fix: Remove api-server code by @mohammedabdulwahhab in #526
- docs: hello world and vllm process docs by @ishandhanani in #525
- feat: KV recorder for dumping router events into a jsonl by @PeaBrane in #505
- chore: cleaner required workers check (don't spam print) by @PeaBrane in #521
- docs: dynamo-run clarify engine list by @grahamking in #522
- chore: Upgrade Rust to 1.86 by @grahamking in #518
- chore: Add devops in more CODEOWNERS by @grahamking in #512
- feat: Python decorator dynamo_worker takes optional static parameter without etcd by @grahamking in #494
- fix: broken link to dynamo run by @lkm2835 in #517
- docs: add 405b disaggregated serving documentation by @ishandhanani in #496
- refactor: migrate engines to standalone crates by @ryanolson in #453
- feat: Add TensorRT-LLM example for dynamo serve/run by @tanmayv25 in #456
- docs: Remove invalid link by @grahamking in #506
- docs: add instruction to copy dynamo-run in container setup by @hanweisen in #508
- chore: Add libclang-dev to CI for llamacpp by @grahamking in #507
- chore: rename duration to timeout by @tlipoca9 in #503
- fix: adding missing file by @ryanolson in #501
- feat: allow replicas to be set in DynamoDeployment CR by @julienmancuso in #486
- chore: Disable blank issue creation for default issues template by @nv-tusharma in #492
- chore: Remove <> from title + add labels for default issues template. by @nv-tusharma in #491
- feat: Sets the code of conduct for the repository by @saturley-hall in #454
- fix: Consolidate dynamo start and dynamo serve commands by @mohammedabdulwahhab in #405
- feat: improve serve commands and expose DYNAMO_HOME env var by @jon-chuang in #436
- feat: kv aware router executable by @ryanolson in #399
- feat: deploy and use buildkit to build dynamo images by @julienmancuso in #450
- feat(serve): Enhance multi-node deployment and worker configuration by @ishandhanani in #457
- chore: Add default issue template for bug & feature requests by @nv-tusharma in #471
- feat: unified logging by @ryanolson in #472
- feat: add devcontainer to dynamo for Ubuntu 24.04 use by @hhzhang16 in #466
- docs(README): add local development instructions by @ishandhanani in #463
- fix: sglang worker log extraction error by @KivenChen in #447
- test: Fix trailing white space in test_report.yaml by @pvijayakrish in #455
- test: Display the test report summary under the workflow run for a PR. by @pvijayakrish in #451
- refactor: prometheus upgrade by @ryanolson in #452
- chore: Upgrade llamacpp dependency by @grahamking in #449
- fix: potential out-of-bound by @ezhoureal in #420
- fix: prefill queue handler async task should not silently error by @jon-chuang in #442
- feat: dynamo deploy hello world example to k8s by @biswapanda in #205
- chore: Add user to dynamo deploy codeowners by @mohammedabdulwahhab in #411
- fix: disabling sse keep-alive by @ryanolson in #408
- fix: Revert "chore: Bump bentoml version to 1.4.6" by @mohammedabdulwahhab in #409
- feat: Decode -> Prefill cached kv transfer by @ptarasiewiczNV in #340
- fix: limit rust build parallel jobs by @tedzhouhk in #366
- chore: Bump bentoml version to 1.4.6 by @dmitry-tokarev-nv in #404
- chore: more Pythonic kv router cleanups in examples by @PeaBrane in #396
- fix: add codeowners for examples by @sshchoi in #325
- feat: Allow passing any arguments to vllm and sglang engines by @grahamking in #368
- docs: Fix capitalization by @EaminC in #367
- feat: Build pre-processor from GGUF by @grahamking in #344
- feat: conditional disagg based on prefill queue size by @tedzhouhk in #303
- fix: Attach lease to etcd key by @grahamking in #364
- docs: fix grammar in architecture support note by @EaminC in #346
- chore: Clarified docs, added more informative error prints by @oandreeva-nv in #342
- docs: Update support_matrix.md - list glibc min version by @dmitry-tokarev-nv in #341
- chore: add warn log when fix_venv failed by @zhaohaidao in #338
- docs: fix typo in dynamo_serve.md by @eltociear in #314
- docs: Update main and guide readmes by @harryskim in #332
- fix: use rustup-init for rust install by @nv-anants in #319
- chore: KV router Pythonic cleanups by @PeaBrane in #324
- ci: Add External Contribution label by @nvda-mesharma in #322
- chore: Make debug profile use all optimizations by @grahamking in #317
- feat: add more useful APIs for tokens by @nora-coder-dot in #313
- fix: helm tmpl by @gujingit in #307
- feat: Frontend component uses served_model_name instead of model by @ishandhanani in #302
- chore: remove older unused components by @ishandhanani in #300
- chore: Update dynamo.code-workspace by @1ntEgr8 in #282
- fix: update crates metadata by @nv-anants in #264
- fix: Add init.py for compoments folder in llm example by @piotrm-nvidia in #299
- chore: Don't depend on openssl by @grahamking in #292
- feat: enable LTO and codegen-units = 1 optimizations by @zamazan4ik in #279
- fix(mistralrs): Disable paged attention by @grahamking in #234
- docs: Move back dynamo deploy file to the guides subfolder in docs by @mohammedabdulwahhab in #295
- fix(dynamo-run): Fix build if llamacpp and mistralrs are disabled by @grahamking in #262
- docs: proper installation steps + Ubuntu 24.04 support by @dmitry-tokarev-nv in #275
- docs: Update README.md - add missing python3-pip package by @dmitry-tokarev-nv in #263
- fix: update readme discord link by @ishandhanani in #271
- docs: dynamo serve guide by @ishandhanani in #270
- docs: Clean up of readme for deploying to K8s using helm by @mohammedabdulwahhab in #266
- docs(dynamo-run): Move README into docs/guides/ , add Quickstart by @grahamking in #265
- feat: add local gpu allocation by @biswapanda in #232
- docs: fix links in docs by @dmitry-tokarev-nv in #256
- chore: remove dynamo from vllm whl version by @nv-anants in #257
- fix: temporary documentation for crates.io by @saturley-hall in #255
New Contributors
- @mohammedabdulwahhab made their first contribution in #617
- @kthui made their first contribution in #509
- @cdgamarose-nv made their first contribution in #555
- @nv-tusharma made their first contribution in #533
- @tlipoca9 made their first contribution in #535
- @PeaBrane made their first contribution in #505
- @lkm2835 made their first contribution in #517
- @hanweisen made their first contribution in #508
- @jon-chuang made their first contribution in #436
- @KivenChen made their first contribution in #447
- @ezhoureal made their first contribution in #420
- @sshchoi made their first contribution in #325
- @EaminC made their first contribution in #367
- @oandreeva-nv made their first contribution in #342
- @zhaohaidao made their first contribution in #338
- @eltociear made their first contribution in #314
- @harryskim made their first contribution in #332
- @nora-coder-dot made their first contribution in #313
- @gujingit made their first contribution in #307
- @1ntEgr8 made their first contribution in #282
- @zamazan4ik made their first contribution in #279
Full Changelog: v0.1.0...release/0.1.1