Deploying VITA-1.5 Multimodal Model with ExecuTorch #10757

Open
@jordanqi

Description

🚀 The feature, motivation and pitch

I’m trying to deploy the VITA-1.5 multimodal model (which supports audio, vision, and text) with ExecuTorch.

The tokenizer is in Hugging Face tokenizer.json format, and I’d like to ask:

  1. Is there a recommended way to convert the model into the `.pte` format for ExecuTorch?
  2. Since this is a new architecture, is there any guidance or an example for adding custom models?
  3. Can I still use the LlamaDemo Android app with this multimodal model?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

Metadata


    Labels

    `module: llm` (Issues related to LLM examples and apps, and to the extensions/llm/ code)
    `triaged` (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
