Enable torchao.experimental EmbeddingQuantization

### 🚀 The feature, motivation and pitch

Quantization is a technique used to improve the inference speed and reduce the size and memory requirements of a model, and [torchao](https://github.com/pytorch/ao) is PyTorch's native quantization library for inference and training.
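For readers new to the idea, here is a minimal sketch of what quantizing an embedding row means. It is written in plain Python under my own assumptions (symmetric, group-wise, 4-bit); the helper names `quantize_row` and `dequantize_row` are hypothetical and are not torchao APIs, and torchao's actual kernels and packing formats may differ.

```python
from typing import List, Tuple


def quantize_row(
    row: List[float], group_size: int, bits: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric group-wise quantization: one float scale per group of values."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    qmin = -(2 ** (bits - 1))   # -8 for 4 bits
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Avoid dividing by zero when a whole group is zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the signed range.
        codes.extend(max(qmin, min(qmax, round(v / scale))) for v in group)
    return codes, scales


def dequantize_row(
    codes: List[int], scales: List[float], group_size: int
) -> List[float]:
    """Recover approximate floats by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Hypothetical 8-element embedding row, split into two groups of four.
row = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
codes, scales = quantize_row(row, group_size=4)
restored = dequantize_row(codes, scales, group_size=4)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each value is stored as a small integer code plus a shared per-group scale, which is where the size reduction comes from; the rounding error per value is bounded by half of its group's scale.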

There are new experimental quantizations in torchao that we would like to enable in torchchat. Specifically, this task is for enabling [EmbeddingQuantizer](https://github.com/pytorch/ao/blob/42e1345f0bc451383bcd27e39e93d4ae673eabe0/torchao/experimental/quant_api.py#L585) and [SharedEmbeddingQuantizer](https://github.com/pytorch/ao/blob/42e1345f0bc451383bcd27e39e93d4ae673eabe0/torchao/experimental/quant_api.py#L882).

**Entrypoint**: https://github.com/pytorch/torchchat/blob/1384f7d3d7af0847d8364fe7b300a8b49f2213c2/torchchat/utils/quantize.py#L101

**Task**: Using ExecuTorch as a reference (https://github.com/pytorch/executorch/pull/9548), add support for EmbeddingQuantizer and SharedEmbeddingQuantizer in torchchat.

cc: @metascroy, @manuelcandales 

### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_
