Description
(#4872) - This change is a net negative.
I was previously using a Q3KL quant I made of Mixtral Instruct, which had a file size of 19GB. The first 10 steps of perplexity are:
[1]3.3321,[2]3.9425,[3]4.5814,[4]4.8466,[5]4.9012,[6]4.9089,[7]5.0452,[8]5.0564,[9]5.2014,[10]5.4589
The new Q3KM is significantly larger at 20.93GB (which means it no longer fits in 24GB with more than 2048 context), but has only marginally better PPL:
[1]3.3211,[2]3.8576,[3]4.5000,[4]4.8174,[5]4.8792,[6]4.8788,[7]5.0093,[8]5.0285,[9]5.1876,[10]5.4449
And for comparison, I made a new Q3KS. Its file size is 18.8GB, and it has significantly worse PPL despite being only about 200MB smaller than my old Q3KL:
[1]3.3781,[2]3.9713,[3]4.5966,[4]4.8711,[5]4.9429,[6]4.9316,[7]5.0802,[8]5.1067,[9]5.2583,[10]5.5175
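To make the three runs easier to compare at a glance, here is a small sketch that averages the first 10 PPL steps reported above (values copied verbatim from the runs; note this is just the mean of the reported steps, not a full-run perplexity):

```python
# Mean of the first 10 perplexity steps for each quant, as reported above.
# Names and file sizes are taken from the post itself.
ppl = {
    "Q3KL (19GB, old)":    [3.3321, 3.9425, 4.5814, 4.8466, 4.9012,
                            4.9089, 5.0452, 5.0564, 5.2014, 5.4589],
    "Q3KM (20.93GB, new)": [3.3211, 3.8576, 4.5000, 4.8174, 4.8792,
                            4.8788, 5.0093, 5.0285, 5.1876, 5.4449],
    "Q3KS (18.8GB, new)":  [3.3781, 3.9713, 4.5966, 4.8711, 4.9429,
                            4.9316, 5.0802, 5.1067, 5.2583, 5.5175],
}

for name, vals in ppl.items():
    print(f"{name}: mean PPL over first 10 steps = {sum(vals) / len(vals):.4f}")
```

This gives roughly 4.727 for the old Q3KL, 4.692 for the new Q3KM, and 4.765 for the new Q3KS, which is the tradeoff described above: ~0.035 mean PPL gained for ~2GB more than Q3KL, and ~0.038 lost for saving ~200MB.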
Overall, I'm finding the updated K quants for Mixtral to be worse.