Add documentation for LLM enablement process on ET #8768

Open
@GregoryComer

Description

📚 The doc issue

We hope to provide a standardized and streamlined flow for running LLMs from HF, as well as for individually enabled models (Llama). However, there are going to be use cases where someone wants to enable a model that doesn't fit cleanly into one of these flows. Maybe it has a slightly different architecture and can't drop into our transformer definition. I ran into this recently when working with a Fairseq encoder/decoder language translation model.

I'd like to create documentation that allows for a power user to understand the following:

  1. Why do the optimized ET transformer implementations work? What bits are critical for performance, export compliance, etc.?
  2. If I have a custom transformer implementation that doesn't map exactly to the ET preferred versions, what do I need to do to make it usable with ET?
    a) How do I handle attention and KV cache mutability?
    b) Can I leverage the ET SDPA ops?
    c) How can I use the building blocks / composable components from the extension/llm directory? (Maybe we point to torchtune, as well).
    d) What do I need to do to optimize for specific backends, such as XNNPACK or CoreML?
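
To make question 2a concrete, here is a minimal sketch of the kind of pattern such documentation could walk through. It uses plain PyTorch only. `KVCache` and `CachedAttention` are hypothetical names chosen for illustration; they are not ExecuTorch APIs and are not taken from the extension/llm directory. The sketch assumes batch size 1 and a fixed maximum sequence length, and only shows the cache living in registered buffers that are mutated in place on each call.

```python
import torch
import torch.nn.functional as F
from torch import nn


class KVCache(nn.Module):
    """Hypothetical fixed-size KV cache (batch size 1 assumed)."""

    def __init__(self, max_seq_len, n_heads, head_dim):
        super().__init__()
        shape = (1, n_heads, max_seq_len, head_dim)
        # Buffers, not parameters: mutable state updated in place per step.
        self.register_buffer("k_cache", torch.zeros(shape), persistent=False)
        self.register_buffer("v_cache", torch.zeros(shape), persistent=False)

    def update(self, input_pos, k, v):
        # input_pos: 1-D int64 tensor of sequence positions being written.
        # k, v: (1, n_heads, len(input_pos), head_dim)
        self.k_cache.index_copy_(2, input_pos, k)
        self.v_cache.index_copy_(2, input_pos, v)
        return self.k_cache, self.v_cache


class CachedAttention(nn.Module):
    """Hypothetical self-attention layer that reads and writes a KVCache."""

    def __init__(self, dim, n_heads, max_seq_len):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.max_seq_len = max_seq_len
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.cache = KVCache(max_seq_len, n_heads, self.head_dim)

    def forward(self, x, input_pos):
        bsz, seq_len, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split_heads(t):
            # (bsz, seq_len, dim) -> (bsz, n_heads, seq_len, head_dim)
            return t.view(bsz, seq_len, self.n_heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        k_all, v_all = self.cache.update(input_pos, k, v)
        # A query at position p may attend to cache slots 0..p only.
        slots = torch.arange(self.max_seq_len)
        mask = slots[None, :] <= input_pos[:, None]  # (seq_len, max_seq_len)
        y = F.scaled_dot_product_attention(q, k_all, v_all, attn_mask=mask)
        return self.out(y.transpose(1, 2).reshape(bsz, seq_len, -1))


torch.manual_seed(0)
attn = CachedAttention(dim=16, n_heads=2, max_seq_len=8).eval()
x = torch.randn(1, 5, 16)
with torch.no_grad():
    # One pass over all five tokens.
    y_full = attn(x, torch.arange(5))
    # Reset the cache, then prefill four tokens and decode the fifth.
    attn.cache.k_cache.zero_()
    attn.cache.v_cache.zero_()
    attn(x[:, :4], torch.arange(4))
    y_step = attn(x[:, 4:], torch.tensor([4]))
```

In this sketch the step-by-step decode of the last token should match the last position of the full pass, which is the behavior a custom implementation would need to preserve. Whether this exact update style is the one ET prefers for export and for specific backends is the open question the requested documentation should answer.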

CC @larryliu0820 @byjlw @mergennachin

Suggest a potential alternative/fix

No response

cc @mergennachin @byjlw @cccclai @helunwencser @jackzhxng

Metadata

Assignees

No one assigned

    Labels

    module: doc (Issues related to documentation, both in docs/ and inlined in code)
    module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code)
    module: user experience (Issues related to reducing friction for users)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

    Type

    No type

    Projects

    Status

    To triage

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests