
Enable torchao.experimental EmbeddingQuantization #1520

Open
@Jack-Khuu

Description

🚀 The feature, motivation and pitch

Quantization is a technique used to reduce the size, memory footprint, and inference latency of a model; torchao is PyTorch's native quantization library for inference and training.

There are new experimental quantizers in torchao that we would like to enable in torchchat. Specifically, this task covers enabling `EmbeddingQuantizer` and `SharedEmbeddingQuantizer`.
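
For orientation, here is a minimal sketch of applying the new quantizers to an eager-mode model. The import path, the constructor keywords (`weight_dtype`, `granularity`), and the `quantize(model)` method are assumptions based on the torchao.experimental API exercised in the ExecuTorch reference PR, not a confirmed torchchat design; verify them against the torchao version pinned by torchchat.

```python
# Minimal sketch only: import paths and constructor arguments are assumptions
# based on torchao.experimental as used in pytorch/executorch#9548.
import torch
import torch.nn as nn

from torchao.experimental.quant_api import (
    EmbeddingQuantizer,
    SharedEmbeddingQuantizer,  # additionally ties embedding and unembedding weights
)
from torchao.quantization.granularity import PerGroup

# Toy model with an embedding table to quantize.
model = nn.Sequential(nn.Embedding(4096, 256), nn.Linear(256, 256))

# Quantize nn.Embedding weights in place (keyword arguments are assumed).
EmbeddingQuantizer(
    weight_dtype=torch.int4,
    granularity=PerGroup(32),
).quantize(model)
```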

Entrypoint: `def quantize_model(...)`

Task: Using ExecuTorch as a reference (pytorch/executorch#9548), add support for `EmbeddingQuantizer` and `SharedEmbeddingQuantizer` in torchchat.
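
Sketched below is one possible shape of the wiring: `quantize_model` dispatching on new option names to the torchao quantizers. The option names, the config format, and the simplified function body are all hypothetical; only `EmbeddingQuantizer` and `SharedEmbeddingQuantizer` come from torchao.experimental, and the actual integration should follow torchchat's existing quantizer handling and the ExecuTorch reference above.

```python
# Hypothetical integration sketch: "experimental:embedding",
# "experimental:shared-embedding", the q_kwargs config shape, and this
# simplified quantize_model body are assumptions, not the final design.
import torch
from torchao.experimental.quant_api import (
    EmbeddingQuantizer,
    SharedEmbeddingQuantizer,
)


def quantize_model(model, device, quantize_options, tokenizer=None):
    for name, q_kwargs in quantize_options.items():
        if name == "experimental:embedding":
            # weight_dtype (and any granularity kwarg) would come from q_kwargs.
            EmbeddingQuantizer(weight_dtype=torch.int4).quantize(model)
        elif name == "experimental:shared-embedding":
            SharedEmbeddingQuantizer(weight_dtype=torch.int4).quantize(model)
        # ... fall through to torchchat's existing quantizer handling ...
    return model
```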

cc: @metascroy, @manuelcandales

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

Metadata

Labels: Quantization (Issues related to Quantization or torchao), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
