Open
📚 The doc issue
The current quantization overview page is a bit sparse: https://pytorch.org/executorch/main/quantization-overview.html. I'd like to update it as follows:
- Move under Usage/ since it's the only page under Quantization/ currently.
- Split out information intended for backend authors (info about writing a quantizer, for example). Focus on user-facing APIs.
- Document backend-invariant quantization flows (embeddings, torchao kernels, etc.). Include info on, and an example of, the composable quantizer.
- Document PT2E and quantize_ flows.
- Cover the general, high-level approach to quantizing different types of models:
  - CV models
  - Transformers / language models
- Talk briefly about options for evaluating quantized model accuracy (running in eager mode vs pybindings vs on-device, for example)
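
For the accuracy-evaluation item, one backend-independent metric the page could mention is the signal-to-quantization-noise ratio (SQNR) between reference outputs and quantized outputs. The snippet below is only a rough sketch in plain Python: `fake_quantize_int8` and `sqnr_db` are hypothetical helper names, not ExecuTorch or torchao APIs, and the "quantized" outputs are simulated with a symmetric int8 round trip instead of coming from a real eager, pybindings, or on-device run.

```python
import math


def fake_quantize_int8(values):
    """Symmetric per-tensor int8 quantize-then-dequantize (illustration only)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest int8 level, clamp, then map back to float.
    return [max(-128, min(127, round(v / scale))) * scale for v in values]


def sqnr_db(reference, candidate):
    """Signal-to-quantization-noise ratio in dB; higher means closer outputs."""
    signal = sum(r * r for r in reference)
    noise = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


# Stand-in for float model outputs; in practice these would come from the
# unquantized model, and the candidate from whichever runtime is under test.
reference_outputs = [0.5, -1.25, 3.0, 0.01, -2.75]
quantized_outputs = fake_quantize_int8(reference_outputs)

score = sqnr_db(reference_outputs, quantized_outputs)
print(f"SQNR: {score:.1f} dB")
```

The same comparison would presumably apply whichever way the quantized outputs are produced, so the docs could describe the metric once and then explain how to collect outputs in each environment.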
Suggest a potential alternative/fix
No response
Status: To triage