
Quantized model repository for different backends #11034

Open
@shreshth-tru

Description


🚀 The feature, motivation and pitch

Hi,
I've been wondering whether there is a public repository of .pte files for specific backends. For example, while I can generate .pte files for the Llama 3.2 1B model targeting the QNN backend, that model doesn't work at all: it consistently produces gibberish output, as others have also reported.
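
For context, what I mean by "generating a .pte file" is essentially the standard ExecuTorch export path. The snippet below is only a minimal sketch: it uses a toy module in place of Llama, omits quantization, and omits the QNN lowering step (which the Llama export script in the ExecuTorch examples handles via the Qualcomm partitioner), so TinyModel and tiny_model.pte are placeholders rather than my actual setup.

```python
# Minimal sketch (not the actual Llama/QNN flow): export a toy module to a .pte
# file using the generic ExecuTorch path. TinyModel and tiny_model.pte are
# placeholders; backend lowering (e.g. QNN) would add a to_backend(...) step.
import torch
from torch.export import export
from executorch.exir import to_edge


class TinyModel(torch.nn.Module):
    """Stand-in for the real model (e.g. Llama 3.2 1B)."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))


example_inputs = (torch.randn(1, 16),)

# 1) Capture the graph with torch.export.
exported = export(TinyModel(), example_inputs)

# 2) Convert to the Edge dialect. Lowering to a specific backend such as QNN
#    would normally happen at this stage via that backend's partitioner.
edge = to_edge(exported)

# 3) Serialize to the ExecuTorch program format and write the .pte file.
et_program = edge.to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```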

Unfortunately, I don't have the compute resources to generate .pte files for larger models such as the 8B variant, which might actually work, so the whole exercise is stuck for me.

Is there a community repository or source where precompiled .pte files for larger models and backends are shared?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng

Metadata

Assignees: No one assigned

Labels:
module: llm (Issues related to LLM examples and apps, and to the extensions/llm/ code)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects: To triage
