Llama Stack SDK 0.2.2 Update
Updated the SDK to support Llama Stack v0.2.2, which includes multi-image inference.
Local RAG Support
The major update enables local RAG. The local RAG implementation runs entirely on-device and is fully offline.
The local module of the SDK supports the end-to-end flow of:
- creating a vector DB instance
- creating text chunks
- receiving embeddings from the Android app
- storing embeddings in a vector DB
- managing the agent turn with a RAG tool call to receive a relevant response from the LLM
On-device Vector DB solution: ObjectBox
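Below is a minimal sketch of what this end-to-end flow can look like on-device. The ObjectBox entity and nearest-neighbor query assume ObjectBox 4.x's HNSW vector index, and the `embed(...)` and `runAgentTurn(...)` helpers are hypothetical stand-ins for however the app produces embeddings and drives the agent turn; they are not the SDK's actual method names.

```kotlin
import io.objectbox.Box
import io.objectbox.annotation.Entity
import io.objectbox.annotation.HnswIndex
import io.objectbox.annotation.Id

// One text chunk plus its embedding, stored in the on-device vector DB.
// @HnswIndex enables approximate nearest-neighbor search (ObjectBox 4.x).
@Entity
data class RagChunk(
    @Id var id: Long = 0,
    var text: String = "",
    @HnswIndex(dimensions = 384)
    var embedding: FloatArray = FloatArray(0)
)

// Hypothetical helpers: embedding comes from the Android app / a local model,
// and the agent turn is driven through the local Llama Stack module.
fun embed(text: String): FloatArray = TODO("obtain an embedding on-device")
fun runAgentTurn(question: String, context: List<String>): String =
    TODO("run the agent turn with the retrieved chunks as RAG context")

// 1) Chunk the document, 2) embed each chunk, 3) store chunks + embeddings.
fun indexDocument(box: Box<RagChunk>, document: String, chunkSize: Int = 512) {
    document.chunked(chunkSize).forEach { chunk ->
        box.put(RagChunk(text = chunk, embedding = embed(chunk)))
    }
}

// 4) Embed the question, retrieve the nearest chunks, 5) answer with the LLM.
// RagChunk_ is generated by ObjectBox's annotation processor.
fun answer(box: Box<RagChunk>, question: String): String {
    val neighbors = box
        .query(RagChunk_.embedding.nearestNeighbors(embed(question), 5))
        .build()
        .findWithScores()
    return runAgentTurn(question, neighbors.map { it.get().text })
}
```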
Android Demo App
RAG
We've added a RAG feature to the demo app to showcase how to use the remote and local RAG SDKs. The remote RAG feature contains all RAG-specific logic: creating a document object, registering a vector DB, and using the RagTool from Llama Stack.
- Improved user experience: users can ask questions about a document and quickly receive accurate answers.
- Increased efficiency: by processing large documents and retrieving only the relevant passages, the remote RAG feature saves time and increases productivity.
In this example, a PDF or text file (e.g., a car manual) can easily be processed for question-answer inference with the user.
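As a rough illustration of that flow (not the SDK's actual API), here is a minimal sketch: the `RagPipeline` interface, `RagDocument` class, helper names, and embedding model name are hypothetical placeholders used only to outline registering a vector DB, inserting a document, and querying through the RAG tool.

```kotlin
// Hypothetical types outlining the remote RAG flow in the demo app;
// these are placeholders, not the SDK's real document/vector-DB/RAG-tool classes.
data class RagDocument(val documentId: String, val content: String, val mimeType: String)

interface RagPipeline {
    fun registerVectorDb(vectorDbId: String, embeddingModel: String)
    fun insertDocument(vectorDbId: String, document: RagDocument)
    fun askWithRag(vectorDbId: String, question: String): String
}

fun carManualDemo(rag: RagPipeline, carManualText: String) {
    val vectorDbId = "car-manual-db"
    // 1) Register a vector DB on the Llama Stack distribution (model name is an assumption).
    rag.registerVectorDb(vectorDbId, embeddingModel = "all-MiniLM-L6-v2")
    // 2) Create a document object from the parsed PDF/text file and insert it.
    rag.insertDocument(
        vectorDbId,
        RagDocument(documentId = "car-manual", content = carManualText, mimeType = "text/plain")
    )
    // 3) Ask a question; the RAG tool retrieves relevant chunks before the LLM answers.
    println(rag.askWithRag(vectorDbId, "How do I reset the tire pressure warning light?"))
}
```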
Also, with just a few lines of code changed, you can switch to local RAG. That's the advantage of the Llama Stack mobile SDKs: you can move between remote and local without major code changes!
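One way to picture that interoperability, reusing the hypothetical `RagPipeline` interface from the sketch above: the app code stays the same and only the backend binding changes. The two backend classes below are stubs, not SDK classes.

```kotlin
// Stub backends standing in for the SDK's remote and local RAG paths (hypothetical).
class RemoteRagBackend : RagPipeline { // would delegate to the remote Llama Stack client
    override fun registerVectorDb(vectorDbId: String, embeddingModel: String) = TODO()
    override fun insertDocument(vectorDbId: String, document: RagDocument) = TODO()
    override fun askWithRag(vectorDbId: String, question: String): String = TODO()
}

class LocalRagBackend : RagPipeline { // would delegate to ObjectBox + on-device inference
    override fun registerVectorDb(vectorDbId: String, embeddingModel: String) = TODO()
    override fun insertDocument(vectorDbId: String, document: RagDocument) = TODO()
    override fun askWithRag(vectorDbId: String, question: String): String = TODO()
}

// The rest of the app (e.g., carManualDemo above) is unchanged; only this binding differs.
fun buildRag(useLocal: Boolean): RagPipeline =
    if (useLocal) LocalRagBackend() else RemoteRagBackend()
```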
Multi-image Inference
We've added sample support for selecting multiple images and running inference on them with Llama 4.
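A minimal sketch of assembling such a request is below. The message and content classes are illustrative placeholders (the SDK exposes its own typed message builders); the idea is simply that several base64-encoded images are sent alongside one text prompt in a single user turn.

```kotlin
import android.util.Base64

// Illustrative message types; placeholders for the SDK's own typed builders.
sealed interface ContentItem
data class TextItem(val text: String) : ContentItem
data class ImageItem(val base64Data: String) : ContentItem
data class UserMessage(val content: List<ContentItem>)

// Build a single user message carrying several images plus one question,
// then hand it to the inference call (Llama 4 accepts multiple images per turn).
fun buildMultiImageMessage(imageBytes: List<ByteArray>, question: String): UserMessage {
    val items = imageBytes.map { bytes ->
        ImageItem(Base64.encodeToString(bytes, Base64.NO_WRAP))
    } + TextItem(question)
    return UserMessage(items)
}
```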
Contributors
@ashwinb, @cmodi-meta, @dltn, @Riandy, @seyeong-han, @WuhanMonkey, @yanxi0830