
Dynamo Release v0.1.1

@nv-anants released this on 16 Apr 20:44 · commit 926370b on release/0.1.1

Dynamo is an open-source project under the Apache 2.0 license. The primary distribution is via pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo over the next several months. We will maintain ongoing support for Triton and ensure a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.1.1 features:

  • Benchmarking guides for single-node and multi-node disaggregation on H100 (vLLM)
  • TensorRT-LLM support for KV-aware routing (a conceptual sketch of the routing policy follows this list)
  • TensorRT-LLM support for disaggregation
  • manylinux and Ubuntu 22.04 support for wheels and crates
  • Unified logging for Python and Rust
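
KV-aware routing steers each request toward the worker whose KV cache already holds the longest matching prefix, so cached prefill work is reused instead of recomputed. Below is a minimal conceptual sketch of such a policy, assuming workers advertise the token prefixes they have cached; all names (`Worker`, `prefix_overlap`, `route`) are hypothetical, and real implementations track cached blocks via KV cache events rather than raw token lists.

```python
# Hypothetical sketch of KV-aware routing: pick the worker whose cached
# prefixes overlap the incoming request the most, breaking ties by load.
# None of these names come from Dynamo's actual router API.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    active_requests: int = 0
    # Token-ID prefixes this worker is believed to hold in its KV cache.
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)

def prefix_overlap(request_tokens: list[int], prefix: tuple[int, ...]) -> int:
    """Length of the shared leading run between the request and a cached prefix."""
    n = 0
    for a, b in zip(request_tokens, prefix):
        if a != b:
            break
        n += 1
    return n

def route(request_tokens: list[int], workers: list[Worker]) -> Worker:
    """Prefer the longest cache hit; fall back to the least-loaded worker."""
    def score(w: Worker) -> tuple[int, int]:
        best = max((prefix_overlap(request_tokens, p) for p in w.cached_prefixes),
                   default=0)
        return (best, -w.active_requests)  # more overlap first, then fewer requests
    return max(workers, key=score)

workers = [
    Worker("w0", active_requests=3, cached_prefixes=[(1, 2, 3, 4)]),
    Worker("w1", active_requests=1, cached_prefixes=[(9, 9)]),
]
print(route([1, 2, 3, 5], workers).name)  # w0: its 3-token prefix hit wins
```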

Future plans

  • Instructions for reproducing benchmark guides on GCP and AWS
  • KV Cache Manager as a standalone repository under the ai-dynamo organization. That release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU memory, system memory, local SSD, and object storage (a conceptual sketch of tiered lookup and eviction appears after this list).
  • Searchable user guides and documentation
  • Multi-node instances for large models
  • Initial Planner version supporting dynamic scaling of prefill/decode (P/D) workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns (a toy version of such a heuristic is sketched after this list). Our vision is to evolve the Planner into a reinforcement learning platform: users define objectives, and the Planner tunes and optimizes performance policies automatically based on system feedback.
  • vLLM 1.0 support with NIXL and KV Cache Events
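
To make the KV Cache Manager tiering concrete, here is a minimal sketch of multi-tier lookup and eviction, assuming a fixed fast-to-slow hierarchy (GPU → system memory → SSD → object storage) and LRU demotion between tiers. `TieredKVCache` and its capacities are invented for illustration, not the planned repository's API.

```python
# Hypothetical multi-tier KV cache: blocks live in the fastest tier with
# room; under pressure, the least-recently-used block is demoted one tier.
# Names and capacities are illustrative, not the KV Cache Manager's API.
from collections import OrderedDict
from typing import Optional

class TieredKVCache:
    def __init__(self, capacities: dict[str, int]):
        # Fastest tier first, e.g. {"gpu": 2, "cpu": 4, "ssd": 8, "object": 64}
        self.tiers = [(name, OrderedDict(), cap) for name, cap in capacities.items()]

    def put(self, key: str, block: bytes) -> None:
        self._insert(0, key, block)

    def get(self, key: str) -> Optional[bytes]:
        for _, store, _ in self.tiers:
            if key in store:
                block = store.pop(key)
                self._insert(0, key, block)  # promote a hit to the fastest tier
                return block
        return None

    def _insert(self, level: int, key: str, block: bytes) -> None:
        if level >= len(self.tiers):
            return  # demoted past the last tier: dropped entirely
        _, store, cap = self.tiers[level]
        store[key] = block
        store.move_to_end(key)  # mark as most recently used
        if len(store) > cap:    # over capacity: demote the LRU block
            victim, vblock = store.popitem(last=False)
            self._insert(level + 1, victim, vblock)

cache = TieredKVCache({"gpu": 2, "cpu": 4})
for i in range(4):
    cache.put(f"seq-{i}", b"kv")
print(cache.get("seq-0") is not None)  # True: demoted to "cpu", then promoted back
```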
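
Likewise, a toy version of the heuristic P/D allocation the Planner item describes: shift GPU workers toward whichever phase has the deeper backlog. Queue depths and thresholds here are invented stand-ins for real traffic signals, not the Dynamo Planner's actual policy.

```python
# Toy heuristic for splitting a fixed GPU pool between prefill (P) and
# decode (D) workers. Queue depths stand in for real traffic metrics;
# the 2x thresholds are invented, not the Dynamo Planner's policy.
def plan_split(total_gpus: int, prefill_queue: int, decode_queue: int,
               current_prefill: int) -> int:
    """Return the new number of prefill workers (the rest serve decode)."""
    prefill = current_prefill
    # Long prompt backlog: prefill is the bottleneck, grow it.
    if prefill_queue > 2 * decode_queue and prefill < total_gpus - 1:
        prefill += 1
    # Token-generation backlog: decode is the bottleneck, shrink prefill.
    elif decode_queue > 2 * prefill_queue and prefill > 1:
        prefill -= 1
    return prefill

# Traffic turns prompt-heavy, then decode-heavy: the split follows.
p = 2
for prefill_q, decode_q in [(40, 5), (38, 6), (10, 30)]:
    p = plan_split(total_gpus=8, prefill_queue=prefill_q, decode_queue=decode_q,
                   current_prefill=p)
    print(p)  # 3, then 4, then back to 3
```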

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)
  • For the multi-node 70B configuration, benchmarks on internal clusters show a 15% degradation relative to the results in the summary graphs; this is under investigation.

What's Changed

Full Changelog: v0.1.0...release/0.1.1