📚 The doc issue
While we hope to provide a standardized and streamlined flow for running LLMs from HF, as well as for individually enabled models (Llama), there will be use cases where someone wants to enable a model that doesn't fit cleanly into one of these flows. Maybe it has a slightly different architecture and can't drop in our transformer definition. I ran into this recently when working with a Fairseq encoder/decoder language translation model.
I'd like to create documentation that helps a power user understand the following:
- Why do the optimized ET transformer implementations work well? Which parts are critical for performance, export compliance, etc.?
- If I have a custom transformer implementation that doesn't map exactly to the ET preferred versions, what do I need to do to make it usable with ET?
a) How do I handle attention and KV cache mutability?
b) Can I leverage the ET SDPA ops?
c) How can I use the building blocks / composable components from the extension/llm directory? (Maybe we point to torchtune, as well).
d) What do I need to do to optimize for specific backends, such as XNNPACK or CoreML?
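As one concrete piece of what (a) could cover: the usual export-compliance concern with KV caches is that the cache must be pre-allocated to a fixed maximum length and mutated in place at a position index, rather than grown with `torch.cat` each step (which changes shapes every call). Below is a minimal plain-Python sketch of that pattern; the names `StaticKVCache` and `update` are illustrative only, not ExecuTorch APIs.

```python
# Sketch of an export-friendly static KV cache, using plain Python lists
# standing in for tensors. The key idea for export compliance: buffers are
# allocated once at max_seq_len and written in place at input_pos, so every
# decode step sees the same static shapes.

class StaticKVCache:
    def __init__(self, max_seq_len: int, head_dim: int):
        # Fixed-size buffers, allocated once. In a real nn.Module these
        # would be registered buffers so the exporter can treat them as
        # mutable state.
        self.k = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.v = [[0.0] * head_dim for _ in range(max_seq_len)]

    def update(self, input_pos: int, k_row, v_row):
        # In-place write at the current position. The tensor equivalent is
        # an index_put_/copy_, not torch.cat, which would grow the cache
        # and break static-shape export.
        self.k[input_pos] = list(k_row)
        self.v[input_pos] = list(v_row)
        # Return the full static-shape buffers; attention is expected to
        # mask out positions beyond input_pos.
        return self.k, self.v


cache = StaticKVCache(max_seq_len=4, head_dim=2)
k, v = cache.update(0, [1.0, 2.0], [3.0, 4.0])
# The buffers keep length 4 regardless of how many tokens were written.
```

The documentation could then show the real tensor version of this and explain how buffer mutation interacts with `torch.export`.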
CC @larryliu0820 @byjlw @mergennachin
Suggest a potential alternative/fix
No response