Description
This PR mentioned a while back that, since Llama 70B uses GQA, there is a specific k-quantization trick that allows it to be quantized with only a marginal increase in model size:

Mistral 7B, a very popular model released after this PR was made, also uses Grouped Query Attention.
Checking whether the 7B model is a Mistral model and applying the same treatment should, in theory, provide similar gains, unless I am mistaken.
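As a rough illustration of the idea (not the actual llama.cpp code), one could detect GQA generically from the head counts in the hyperparameters rather than matching on the model name, and bump the quant type used for the attention V tensors when it is present. The struct, enum, and function names below are placeholders, not the project's real API:

```cpp
#include <cstdint>

// Illustrative hyperparameter struct; field names mirror the usual
// n_head / n_head_kv convention but are assumptions here.
struct hparams_t {
    int32_t n_head;     // number of query heads
    int32_t n_head_kv;  // number of key/value heads (fewer than n_head under GQA)
};

enum class quant_type { Q4_K, Q5_K, Q6_K };

// A model uses GQA whenever it has fewer KV heads than query heads,
// which covers both Llama 70B and Mistral 7B without a name check.
static bool uses_gqa(const hparams_t & hp) {
    return hp.n_head_kv > 0 && hp.n_head_kv < hp.n_head;
}

// Pick a higher-precision type for the attention V tensors when GQA is
// detected: those tensors are small relative to the whole model, so the
// size increase stays marginal while quality should improve.
static quant_type pick_attn_v_type(const hparams_t & hp, quant_type base) {
    if (uses_gqa(hp) && base == quant_type::Q4_K) {
        return quant_type::Q5_K;
    }
    return base;
}
```

Keying the check on `n_head_kv < n_head` rather than on the "Mistral" name would also catch any future GQA models for free.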

I think quantization optimization is sorely overlooked in general; there is a lot of low-hanging fruit there for sure.