Description
(#4872) - This change is a net negative.
I was previously using a Q3KL quant I made of Mixtral Instruct, which had a file size of 19GB. The first 10 steps of perplexity are:
[1]3.3321,[2]3.9425,[3]4.5814,[4]4.8466,[5]4.9012,[6]4.9089,[7]5.0452,[8]5.0564,[9]5.2014,[10]5.4589
The new Q3KM is significantly larger at 20.93GB (which means it no longer fits in 24GB with more than 2048 context), but has only marginally better PPL:
[1]3.3211,[2]3.8576,[3]4.5000,[4]4.8174,[5]4.8792,[6]4.8788,[7]5.0093,[8]5.0285,[9]5.1876,[10]5.4449
And for comparison, I made a new Q3KS. Its file size is 18.8GB, and it has significantly worse PPL despite being only about 200MB smaller than my old Q3KL:
[1]3.3781,[2]3.9713,[3]4.5966,[4]4.8711,[5]4.9429,[6]4.9316,[7]5.0802,[8]5.1067,[9]5.2583,[10]5.5175
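To make the three runs easier to compare at a glance, here is a small sketch that averages the first 10 PPL steps reported above (values copied verbatim from the runs; note this is just the mean of the reported steps, not a full-run perplexity):

```python
# Mean of the first 10 perplexity steps for each quant, as reported above.
# Names and file sizes are taken from the post itself.
ppl = {
    "Q3KL (19GB, old)":    [3.3321, 3.9425, 4.5814, 4.8466, 4.9012,
                            4.9089, 5.0452, 5.0564, 5.2014, 5.4589],
    "Q3KM (20.93GB, new)": [3.3211, 3.8576, 4.5000, 4.8174, 4.8792,
                            4.8788, 5.0093, 5.0285, 5.1876, 5.4449],
    "Q3KS (18.8GB, new)":  [3.3781, 3.9713, 4.5966, 4.8711, 4.9429,
                            4.9316, 5.0802, 5.1067, 5.2583, 5.5175],
}

for name, vals in ppl.items():
    print(f"{name}: mean PPL over first 10 steps = {sum(vals) / len(vals):.4f}")
```

This gives roughly 4.727 for the old Q3KL, 4.692 for the new Q3KM, and 4.765 for the new Q3KS, which is the tradeoff described above: ~0.035 mean PPL gained for ~2GB more than Q3KL, and ~0.038 lost for saving ~200MB.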
Overall, I'm finding the updated K quants for Mixtral to be worse.