Deploying VITA-1.5 Multimodal Model with ExecuTorch #10757

Open
@jordanqi

Description

🚀 The feature, motivation and pitch

I’m trying to deploy the VITA-1.5 multimodal model (which supports audio, vision, and text) with ExecuTorch.

The tokenizer is in Hugging Face tokenizer.json format, and I’d like to ask:

  1. Is there a recommended way to convert the model into the `.pte` format for ExecuTorch?
  2. Since this is a new architecture, is there any guidance or an example for adding custom models?
  3. Can I still use the LlamaDemo Android app with this multimodal model?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

Metadata


    Labels

    `module: llm` (Issues related to LLM examples and apps, and to the extensions/llm/ code)
    `triaged` (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
