
QX_4 quantization #1240

Closed
@ikawrakow

Description


Summary

Use 16 × 8 "super-blocks" for quantization: one fp16 scale for the "super-block" and 16 quantized scales, one per block of 8 model weights. This is particularly useful for 2- and 3-bit quantization, but it also outperforms the existing 4-bit quantization schemes Q4_0 and Q4_2.

Details

The naming of existing llama.cpp quantizations follows the scheme QX_Y, where X is the number of bits used for the quants, and Y is 0, 1, 2, or 3. When Y is even (0 or 2), model weights x are computed from the quants q as x = d * q. When Y is odd, x = m + d * q is used. If we look at the integer part of Y/2 ([Y/2]), then the number of weights in a quantization block is 32 (Q4_0, Q4_1, Q5_0) when [Y/2] = 0, and 16 (Q4_2, Q4_3) when [Y/2] = 1.

From the latest perplexity results one can see that quantization using blocks of 16 weights performs better than quantization that uses blocks of 32. The logical conclusion from this would be to look into using blocks of 8 weights. Following the existing naming convention, quantization of type x = d * q for blocks of 8 weights would be QX_4, and quantization of type x = m + d * q would be QX_5. The problem with going to blocks of 8 weights using the same strategy as in Q4_2 and Q4_3 is that the bits needed to store the scale d (or scale d and offset m) start becoming comparable to the number of bits used for the quants q. For instance, using fp16 for the scale in a block of 8 weights requires 16 bits, while the quants need 32 bits for 4-bit quantization, so effectively 6 bits per weight (bpw).
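To make the overhead concrete, here is the arithmetic as a tiny helper (plain C, nothing from llama.cpp assumed; the function name is just for illustration):

```c
// Bits per weight for a block of `block_size` weights that stores
// one fp16 scale (16 bits) plus 4-bit quants.
static double bpw_fp16_scale(int block_size) {
    return (16.0 + 4.0 * block_size) / block_size;
}
// blocks of 32 (Q4_0): 4.5 bpw; blocks of 16 (Q4_2): 5.0 bpw;
// blocks of 8 with an fp16 scale each: 6.0 bpw
```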

So, after this long introduction, here is an idea for how one can use quantization blocks of 8 weights while keeping bpw reasonable: one can use "super-blocks" that combine N quantization blocks. The scale in each block of 8 weights is stored as int8_t, and there is a single fp16 scale per super-block that converts the quantized scales to their final value. E.g., for 4-bit quantization:

#define QK4_4 128
typedef struct {
    int8_t  scales[QK4_4/8];   // quantized scales per 8 weights
    uint8_t qs[QK4_4/2];       // nibbles / quants of the "super-block"
    ggml_fp16_t d;             // "super-block" scale
} block_q4_4;

In the above, N = 16, i.e., there are 16 blocks of 8 weights, each having its own 8-bit quantized scale. This ends up using 5.125 bpw (4 + 1.125).
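A quick way to sanity-check the 5.125 bpw figure is sizeof on the struct (restated here so the snippet stands alone; this assumes the compiler packs it into 82 bytes, which holds with the 2-byte alignment of ggml_fp16_t on common targets):

```c
#include <stdint.h>

#define QK4_4 128
typedef uint16_t ggml_fp16_t;   // fp16 stored as raw 16 bits

typedef struct {
    int8_t  scales[QK4_4/8];    // 16 x int8 block scales -> 128 bits
    uint8_t qs[QK4_4/2];        // 64 bytes of nibbles    -> 512 bits
    ggml_fp16_t d;              // super-block scale      ->  16 bits
} block_q4_4;

// (128 + 512 + 16) bits / 128 weights = 5.125 bits per weight
```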

To further clarify the idea, here is a simple scalar implementation of the de-quantization for Q4_4:

static void dequantize_row_q4_4(const void * restrict vx, float * restrict y, int k) {
    assert(k % QK4_4 == 0);
    const int nb = k / QK4_4;

    const block_q4_4 * restrict x = vx;

    uint32_t u;
    for (int i = 0; i < nb; i++) {
        const float d_all = GGML_FP16_TO_FP32(x[i].d);

        const uint8_t * q = x[i].qs;

        for (int n = 0; n < QK4_4/8; ++n) {
            memcpy(&u, q, 4);
            const uint32_t u1 = (u >> 0) & 0x0f0f0f0f;
            const uint32_t u2 = (u >> 4) & 0x0f0f0f0f;
            const int8_t * v1 = (const int8_t*)&u1;
            const int8_t * v2 = (const int8_t*)&u2;
            float d = d_all * x[i].scales[n];
            y[0] = d * (v1[0] - 8);
            y[1] = d * (v2[0] - 8);
            y[2] = d * (v1[1] - 8);
            y[3] = d * (v2[1] - 8);
            y[4] = d * (v1[2] - 8);
            y[5] = d * (v2[2] - 8);
            y[6] = d * (v1[3] - 8);
            y[7] = d * (v2[3] - 8);
            q += 4;
            y += 8;
        }
    }
}

Perplexity results

I have done some experiments with this idea for 2-, 3- and 4-bit quantization, and the following table summarizes the perplexity results. All calculations are with the output tensor kept as fp16, which adds about 200 MB to the size of the quantized model (compared to the output.weight tensor also being quantized):

| Model | Measure    | Q2_4   | Q3_4   | Q4_4   |
|-------|------------|--------|--------|--------|
| 7B    | perplexity | 8.3618 | 6.3559 | 6.1378 |
| 7B    | file size  | 2.65G  | 3.45G  | 4.2G   |
| 13B   | perplexity | 6.7409 | 5.5110 | 5.2981 |
| 13B   | file size  | 4.95G  | 6.45G  | 8.0G   |

A few observations from the experiments and existing 4- and 5-bit results

  • At 4 and 5 bits, quantization of type x = m + d * q (QX_1, QX_3) performs better than x = d * q (QX_0, QX_2, and the QX_4 proposed here). This trend is reversed for 2- and 3-bit quantization. Especially for 2-bit quantization, Q2_1 and Q2_3 give basically useless results.
  • There has been some work done for 2- and 3-bit quantization on this branch. The Q2_4 quantization proposed here gives much lower perplexity than what is reported there for Q2_2 (my own experiment with Q2_2 gives a 7B perplexity of 10.6271 and a 13B perplexity of 8.3552; the 30B Q2_2 perplexity of 6.9507 reported there is higher than the 13B Q2_4 perplexity found here).
  • At 2-bit quantization, the difference between a quantized and a non-quantized output tensor is significant (e.g., a quantized output tensor results in a 7B perplexity of 9.0087 vs 8.3618 from the above table). At 3-bit quantization the difference is much smaller (e.g., 6.4433 vs 6.3559 for 7B).
  • Q4_4 is better than Q4_0 and Q4_2, but the difference is much smaller than for the 2- and 3-bit quantizations.
  • I have tried N = 8, 16, 32 (i.e., "super-blocks" of 64, 128, 256 weights). Perplexity results remain effectively the same, while the extra bits per weight (extra as in addition to the X quantization bits) change from 1.25 to 1.125 to 1.0625. Tensor sizes are divisible by 256 for all layers in the 7B and 13B models, so one could use super-blocks of 256 instead of the 128 used here (this saves ~0.1G for the 13B model).
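The extra-bpw numbers in the last bullet follow directly from the layout: one int8 scale per block of 8 weights plus one fp16 per super-block of 8*N weights, i.e. 1 + 2/N extra bits per weight (helper name is illustrative only):

```c
// Extra bits per weight beyond the X quant bits, for a super-block of
// n_blocks blocks of 8 weights: n_blocks int8 scales + one fp16 scale.
static double extra_bpw(int n_blocks) {
    return (8.0 * n_blocks + 16.0) / (8.0 * n_blocks);
}
// extra_bpw(8) = 1.25, extra_bpw(16) = 1.125, extra_bpw(32) = 1.0625
```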

Here are the perplexity runs reported above:

Q2_4, 7B

main: seed = 1682671488
llama.cpp: loading model from ../models/7B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 4504.40 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.49 seconds per pass - ETA 16 minutes
[1]6.2670,[2]7.2397,[3]7.8484,[4]8.7113,[5]8.5541,[6]8.5304,[7]8.6772,[8]8.7266,[9]9.2108,[10]9.5711,[11]9.9331,[12]10.0286,[13]10.0075,[14]10.2435,[15]10.5844,[16]10.0359,[17]9.8055,[18]9.8147,[19]9.2709,[20]9.2379,[21]9.0826,[22]8.9060,[23]8.8751,[24]8.7826,[25]8.7974,[26]8.5861,[27]8.3354,[28]8.2662,[29]8.1377,[30]7.9415,[31]7.9138,[32]7.9309,[33]7.8544,[34]7.9014,[35]7.9400,[36]8.0232,[37]8.0318,[38]8.0524,[39]8.1139,[40]8.1847,[41]8.2215,[42]8.2709,[43]8.1977,[44]8.2695,[45]8.2579,[46]8.2205,[47]8.2469,[48]8.1920,[49]8.1907,[50]8.1222,[51]8.1247,[52]8.1002,[53]8.1535,[54]8.1312,[55]8.0808,[56]8.1320,[57]8.1640,[58]8.1951,[59]8.2061,[60]8.2674,[61]8.2525,[62]8.3396,[63]8.3781,[64]8.3872,[65]8.4551,[66]8.4626,[67]8.4924,[68]8.5125,[69]8.5512,[70]8.5979,[71]8.6292,[72]8.6747,[73]8.7532,[74]8.7463,[75]8.7566,[76]8.7672,[77]8.7906,[78]8.7662,[79]8.7960,[80]8.7820,[81]8.8039,[82]8.8159,[83]8.7322,[84]8.7181,[85]8.7079,[86]8.6686,[87]8.6050,[88]8.5635,[89]8.5359,[90]8.5171,[91]8.5563,[92]8.5568,[93]8.5615,[94]8.5647,[95]8.6043,[96]8.6056,[97]8.6059,[98]8.5980,[99]8.5716,[100]8.5662,[101]8.5971,[102]8.5883,[103]8.6169,[104]8.6284,[105]8.6296,[106]8.6542,[107]8.6555,[108]8.6706,[109]8.6619,[110]8.6569,[111]8.6789,[112]8.7064,[113]8.7181,[114]8.7197,[115]8.7335,[116]8.7271,[117]8.7349,[118]8.7694,[119]8.7968,[120]8.8443,[121]8.8727,[122]8.9000,[123]8.9478,[124]8.9703,[125]8.9524,[126]9.0055,[127]9.0459,[128]9.0799,[129]9.0528,[130]9.0613,[131]9.0543,[132]9.0445,[133]9.0327,[134]9.0539,[135]9.0482,[136]9.0383,[137]9.0256,[138]9.0123,[139]9.0012,[140]9.0007,[141]8.9808,[142]8.9746,[143]8.9590,[144]8.9421,[145]8.9386,[146]8.9224,[147]8.9316,[148]8.9294,[149]8.9273,[150]8.9260,[151]8.9272,[152]8.9072,[153]8.8795,[154]8.8652,[155]8.8713,[156]8.8626,[157]8.8816,[158]8.8812,[159]8.8950,[160]8.8969,[161]8.9131,[162]8.8704,[163]8.8516,[164]8.8103,[165]8.7617,[166]8.7196,[167]8.6616,[168]8.6151,[169]8.5920,[170]8.5716,[171]8.5289,[172]8.4991,[173]8.4753,[174]8.4339,[175]8.4022,[17
6]8.3800,[177]8.3518,[178]8.3213,[179]8.2960,[180]8.2789,[181]8.2471,[182]8.2158,[183]8.1923,[184]8.1901,[185]8.1759,[186]8.1755,[187]8.1800,[188]8.1783,[189]8.2022,[190]8.2040,[191]8.2313,[192]8.2490,[193]8.2748,[194]8.2906,[195]8.3192,[196]8.3382,[197]8.3638,[198]8.3830,[199]8.3825,[200]8.3868,[201]8.3827,[202]8.4192,[203]8.4279,[204]8.4391,[205]8.4521,[206]8.4600,[207]8.4536,[208]8.4630,[209]8.4689,[210]8.4726,[211]8.4873,[212]8.4963,[213]8.5092,[214]8.5160,[215]8.5206,[216]8.5372,[217]8.5580,[218]8.5740,[219]8.5719,[220]8.5640,[221]8.5553,[222]8.5496,[223]8.5326,[224]8.5205,[225]8.5155,[226]8.5383,[227]8.5536,[228]8.5615,[229]8.5653,[230]8.5606,[231]8.5816,[232]8.5697,[233]8.5431,[234]8.5208,[235]8.5105,[236]8.4995,[237]8.4836,[238]8.4880,[239]8.4663,[240]8.4513,[241]8.4579,[242]8.4639,[243]8.4602,[244]8.4461,[245]8.4448,[246]8.4290,[247]8.4139,[248]8.4028,[249]8.3996,[250]8.4048,[251]8.3959,[252]8.3912,[253]8.3790,[254]8.3743,[255]8.3575,[256]8.3323,[257]8.3143,[258]8.3039,[259]8.3025,[260]8.2925,[261]8.2885,[262]8.2801,[263]8.2747,[264]8.2566,[265]8.2547,[266]8.2506,[267]8.2400,[268]8.2504,[269]8.2489,[270]8.2463,[271]8.2539,[272]8.2614,[273]8.2570,[274]8.2603,[275]8.2742,[276]8.2809,[277]8.3035,[278]8.3175,[279]8.3288,[280]8.3324,[281]8.3439,[282]8.3494,[283]8.3672,[284]8.3763,[285]8.3869,[286]8.4031,[287]8.4049,[288]8.4160,[289]8.4034,[290]8.3824,[291]8.3640,[292]8.3433,[293]8.3263,[294]8.3285,[295]8.3263,[296]8.3322,[297]8.3320,[298]8.3405,[299]8.3354,[300]8.3234,[301]8.3182,[302]8.3099,[303]8.2992,[304]8.2859,[305]8.2847,[306]8.2684,[307]8.2696,[308]8.2736,[309]8.2506,[310]8.2432,[311]8.2368,[312]8.2386,[313]8.2294,[314]8.2276,[315]8.2045,[316]8.2053,[317]8.1839,[318]8.1579,[319]8.1772,[320]8.1937,[321]8.2004,[322]8.1927,[323]8.1903,[324]8.1923,[325]8.2089,[326]8.2077,[327]8.2130,[328]8.2180,[329]8.2283,[330]8.2368,[331]8.2554,[332]8.2503,[333]8.2620,[334]8.2542,[335]8.2433,[336]8.2464,[337]8.2399,[338]8.2423,[339]8.2345,[340]8.2289,[341]8.2386,[342]8.2404
,[343]8.2479,[344]8.2476,[345]8.2446,[346]8.2377,[347]8.2412,[348]8.2458,[349]8.2465,[350]8.2403,[351]8.2399,[352]8.2413,[353]8.2316,[354]8.2341,[355]8.2429,[356]8.2474,[357]8.2407,[358]8.2533,[359]8.2570,[360]8.2481,[361]8.2455,[362]8.2539,[363]8.2650,[364]8.2735,[365]8.2813,[366]8.2828,[367]8.2931,[368]8.2887,[369]8.2886,[370]8.2892,[371]8.2801,[372]8.2848,[373]8.2920,[374]8.2879,[375]8.2865,[376]8.2957,[377]8.2873,[378]8.2901,[379]8.2977,[380]8.2848,[381]8.2801,[382]8.2740,[383]8.2709,[384]8.2678,[385]8.2664,[386]8.2672,[387]8.2645,[388]8.2579,[389]8.2485,[390]8.2391,[391]8.2277,[392]8.2256,[393]8.2288,[394]8.2330,[395]8.2318,[396]8.2207,[397]8.2295,[398]8.2333,[399]8.2437,[400]8.2453,[401]8.2475,[402]8.2493,[403]8.2497,[404]8.2573,[405]8.2502,[406]8.2458,[407]8.2465,[408]8.2466,[409]8.2624,[410]8.2776,[411]8.2933,[412]8.3162,[413]8.3303,[414]8.3414,[415]8.3478,[416]8.3594,[417]8.3760,[418]8.3834,[419]8.3934,[420]8.4054,[421]8.4215,[422]8.4264,[423]8.4389,[424]8.4543,[425]8.4667,[426]8.4751,[427]8.4789,[428]8.4899,[429]8.4956,[430]8.5069,[431]8.5263,[432]8.5289,[433]8.5255,[434]8.5161,[435]8.5146,[436]8.5163,[437]8.5286,[438]8.5396,[439]8.5341,[440]8.5309,[441]8.5231,[442]8.5200,[443]8.5216,[444]8.5225,[445]8.5190,[446]8.5207,[447]8.5240,[448]8.5283,[449]8.5239,[450]8.5232,[451]8.5163,[452]8.5112,[453]8.5021,[454]8.4972,[455]8.4964,[456]8.5014,[457]8.5044,[458]8.5017,[459]8.5020,[460]8.5125,[461]8.5089,[462]8.5070,[463]8.5138,[464]8.5136,[465]8.5099,[466]8.5021,[467]8.5045,[468]8.5073,[469]8.5106,[470]8.5116,[471]8.5045,[472]8.5100,[473]8.5005,[474]8.5033,[475]8.5011,[476]8.5043,[477]8.4954,[478]8.4967,[479]8.5104,[480]8.5171,[481]8.5200,[482]8.5139,[483]8.5089,[484]8.5131,[485]8.5129,[486]8.5049,[487]8.5066,[488]8.5061,[489]8.4980,[490]8.4952,[491]8.4915,[492]8.4829,[493]8.4796,[494]8.4758,[495]8.4773,[496]8.4722,[497]8.4668,[498]8.4663,[499]8.4568,[500]8.4459,[501]8.4388,[502]8.4402,[503]8.4385,[504]8.4277,[505]8.4303,[506]8.4317,[507]8.4305,[508]8.4251,[509]8.
4240,[510]8.4299,[511]8.4356,[512]8.4377,[513]8.4391,[514]8.4473,[515]8.4395,[516]8.4386,[517]8.4401,[518]8.4385,[519]8.4423,[520]8.4454,[521]8.4476,[522]8.4521,[523]8.4524,[524]8.4590,[525]8.4639,[526]8.4657,[527]8.4686,[528]8.4647,[529]8.4674,[530]8.4580,[531]8.4541,[532]8.4608,[533]8.4632,[534]8.4588,[535]8.4638,[536]8.4558,[537]8.4510,[538]8.4575,[539]8.4578,[540]8.4657,[541]8.4691,[542]8.4701,[543]8.4711,[544]8.4726,[545]8.4708,[546]8.4720,[547]8.4648,[548]8.4550,[549]8.4551,[550]8.4508,[551]8.4454,[552]8.4420,[553]8.4362,[554]8.4316,[555]8.4256,[556]8.4259,[557]8.4311,[558]8.4275,[559]8.4277,[560]8.4264,[561]8.4258,[562]8.4236,[563]8.4254,[564]8.4323,[565]8.4359,[566]8.4354,[567]8.4333,[568]8.4318,[569]8.4280,[570]8.4301,[571]8.4303,[572]8.4308,[573]8.4289,[574]8.4256,[575]8.4269,[576]8.4264,[577]8.4243,[578]8.4222,[579]8.4236,[580]8.4137,[581]8.4081,[582]8.4046,[583]8.4039,[584]8.4026,[585]8.3949,[586]8.3877,[587]8.3879,[588]8.3944,[589]8.4027,[590]8.4066,[591]8.4067,[592]8.4038,[593]8.3969,[594]8.3972,[595]8.3932,[596]8.3984,[597]8.3938,[598]8.3913,[599]8.3928,[600]8.3922,[601]8.3893,[602]8.3954,[603]8.3990,[604]8.4015,[605]8.4037,[606]8.4054,[607]8.4049,[608]8.3980,[609]8.3974,[610]8.4014,[611]8.3989,[612]8.4027,[613]8.3976,[614]8.3919,[615]8.3800,[616]8.3859,[617]8.3765,[618]8.3683,[619]8.3586,[620]8.3366,[621]8.3254,[622]8.3233,[623]8.3250,[624]8.3239,[625]8.3229,[626]8.3212,[627]8.3259,[628]8.3252,[629]8.3237,[630]8.3274,[631]8.3340,[632]8.3404,[633]8.3379,[634]8.3423,[635]8.3431,[636]8.3403,[637]8.3381,[638]8.3427,[639]8.3394,[640]8.3394,[641]8.3390,[642]8.3474,[643]8.3494,[644]8.3498,[645]8.3465,[646]8.3537,[647]8.3518,[648]8.3533,[649]8.3524,[650]8.3579,[651]8.3656,[652]8.3674,[653]8.3719,[654]8.3636,[655]8.3618,

llama_print_timings: load time = 2570.97 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 906622.46 ms / 335360 tokens ( 2.70 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 933921.98 ms

Q3_4, 7B

main: seed = 1682612164
llama.cpp: loading model from junk.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5390.48 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
13.33 seconds per pass - ETA 2 hours 25 minutes
[1]4.5663,[2]4.9408,[3]5.8361,[4]6.5915,[5]6.6755,[6]6.6236,[7]6.8095,[8]6.9148,[9]7.2707,[10]7.5192,[11]7.7399,[12]7.7851,[13]7.7344,[14]7.8086,[15]8.0569,[16]7.6602,[17]7.5283,[18]7.4705,[19]7.0917,[20]7.0792,[21]6.9849,[22]6.8041,[23]6.7713,[24]6.6866,[25]6.6918,[26]6.5324,[27]6.3450,[28]6.2465,[29]6.1521,[30]5.9987,[31]5.9758,[32]5.9998,[33]5.9411,[34]5.9724,[35]5.9995,[36]6.0390,[37]6.0437,[38]6.0577,[39]6.0941,[40]6.1552,[41]6.1714,[42]6.2174,[43]6.1721,[44]6.2245,[45]6.2294,[46]6.2025,[47]6.2265,[48]6.1973,[49]6.2013,[50]6.1552,[51]6.1511,[52]6.1385,[53]6.1806,[54]6.1639,[55]6.1354,[56]6.1712,[57]6.1937,[58]6.2181,[59]6.2339,[60]6.2785,[61]6.2698,[62]6.3279,[63]6.3618,[64]6.3756,[65]6.4226,[66]6.4309,[67]6.4517,[68]6.4697,[69]6.4930,[70]6.5255,[71]6.5479,[72]6.5798,[73]6.6429,[74]6.6461,[75]6.6590,[76]6.6751,[77]6.6892,[78]6.6736,[79]6.7003,[80]6.6948,[81]6.7049,[82]6.7107,[83]6.6550,[84]6.6395,[85]6.6251,[86]6.6019,[87]6.5435,[88]6.5150,[89]6.4959,[90]6.4806,[91]6.5057,[92]6.4994,[93]6.4989,[94]6.4935,[95]6.5215,[96]6.5199,[97]6.5145,[98]6.5099,[99]6.4929,[100]6.4931,[101]6.5206,[102]6.5163,[103]6.5351,[104]6.5432,[105]6.5422,[106]6.5577,[107]6.5536,[108]6.5665,[109]6.5615,[110]6.5573,[111]6.5786,[112]6.6020,[113]6.6021,[114]6.5969,[115]6.6038,[116]6.5944,[117]6.6001,[118]6.6283,[119]6.6498,[120]6.6840,[121]6.6992,[122]6.7240,[123]6.7622,[124]6.7804,[125]6.7694,[126]6.8078,[127]6.8453,[128]6.8757,[129]6.8576,[130]6.8657,[131]6.8604,[132]6.8510,[133]6.8362,[134]6.8459,[135]6.8407,[136]6.8283,[137]6.8197,[138]6.8047,[139]6.7936,[140]6.7900,[141]6.7636,[142]6.7598,[143]6.7322,[144]6.7116,[145]6.7051,[146]6.6920,[147]6.6975,[148]6.6982,[149]6.6923,[150]6.6885,[151]6.6909,[152]6.6807,[153]6.6650,[154]6.6550,[155]6.6620,[156]6.6565,[157]6.6743,[158]6.6784,[159]6.6818,[160]6.6830,[161]6.6951,[162]6.6634,[163]6.6521,[164]6.6259,[165]6.5926,[166]6.5632,[167]6.5231,[168]6.4907,[169]6.4767,[170]6.4657,[171]6.4369,[172]6.4188,[173]6.4011,[174]6.3693,[175]6.3458,[176]6.3
342,[177]6.3129,[178]6.2889,[179]6.2715,[180]6.2615,[181]6.2389,[182]6.2196,[183]6.2037,[184]6.2019,[185]6.1936,[186]6.1946,[187]6.2012,[188]6.1977,[189]6.2168,[190]6.2182,[191]6.2409,[192]6.2573,[193]6.2741,[194]6.2860,[195]6.3075,[196]6.3233,[197]6.3455,[198]6.3612,[199]6.3641,[200]6.3691,[201]6.3647,[202]6.3864,[203]6.3947,[204]6.3980,[205]6.4091,[206]6.4162,[207]6.4121,[208]6.4209,[209]6.4261,[210]6.4310,[211]6.4415,[212]6.4490,[213]6.4591,[214]6.4633,[215]6.4680,[216]6.4832,[217]6.5014,[218]6.5150,[219]6.5162,[220]6.5118,[221]6.5051,[222]6.5023,[223]6.4909,[224]6.4835,[225]6.4790,[226]6.5006,[227]6.5076,[228]6.5134,[229]6.5182,[230]6.5144,[231]6.5315,[232]6.5184,[233]6.5008,[234]6.4849,[235]6.4693,[236]6.4610,[237]6.4500,[238]6.4531,[239]6.4369,[240]6.4258,[241]6.4290,[242]6.4326,[243]6.4305,[244]6.4181,[245]6.4153,[246]6.4041,[247]6.3923,[248]6.3846,[249]6.3822,[250]6.3861,[251]6.3801,[252]6.3766,[253]6.3666,[254]6.3631,[255]6.3511,[256]6.3323,[257]6.3202,[258]6.3114,[259]6.3098,[260]6.3011,[261]6.2972,[262]6.2914,[263]6.2865,[264]6.2683,[265]6.2681,[266]6.2659,[267]6.2593,[268]6.2693,[269]6.2675,[270]6.2675,[271]6.2760,[272]6.2799,[273]6.2786,[274]6.2805,[275]6.2891,[276]6.2950,[277]6.3109,[278]6.3215,[279]6.3307,[280]6.3332,[281]6.3427,[282]6.3475,[283]6.3624,[284]6.3708,[285]6.3798,[286]6.3937,[287]6.3928,[288]6.3992,[289]6.3897,[290]6.3734,[291]6.3572,[292]6.3419,[293]6.3274,[294]6.3303,[295]6.3298,[296]6.3349,[297]6.3338,[298]6.3372,[299]6.3346,[300]6.3234,[301]6.3228,[302]6.3155,[303]6.3066,[304]6.2975,[305]6.2948,[306]6.2815,[307]6.2830,[308]6.2856,[309]6.2690,[310]6.2629,[311]6.2567,[312]6.2591,[313]6.2540,[314]6.2524,[315]6.2360,[316]6.2325,[317]6.2155,[318]6.1942,[319]6.2070,[320]6.2197,[321]6.2236,[322]6.2183,[323]6.2123,[324]6.2093,[325]6.2197,[326]6.2195,[327]6.2218,[328]6.2254,[329]6.2315,[330]6.2353,[331]6.2479,[332]6.2454,[333]6.2528,[334]6.2471,[335]6.2406,[336]6.2445,[337]6.2410,[338]6.2406,[339]6.2349,[340]6.2300,[341]6.2388,[342]6.2411,[343
]6.2469,[344]6.2470,[345]6.2466,[346]6.2438,[347]6.2479,[348]6.2522,[349]6.2543,[350]6.2511,[351]6.2515,[352]6.2519,[353]6.2461,[354]6.2467,[355]6.2520,[356]6.2548,[357]6.2507,[358]6.2604,[359]6.2628,[360]6.2577,[361]6.2571,[362]6.2634,[363]6.2747,[364]6.2805,[365]6.2861,[366]6.2869,[367]6.2959,[368]6.2936,[369]6.2947,[370]6.2959,[371]6.2899,[372]6.2945,[373]6.3000,[374]6.2986,[375]6.2982,[376]6.3058,[377]6.3004,[378]6.3028,[379]6.3093,[380]6.3012,[381]6.2973,[382]6.2922,[383]6.2908,[384]6.2898,[385]6.2888,[386]6.2885,[387]6.2885,[388]6.2839,[389]6.2783,[390]6.2714,[391]6.2634,[392]6.2598,[393]6.2582,[394]6.2612,[395]6.2601,[396]6.2523,[397]6.2594,[398]6.2624,[399]6.2696,[400]6.2688,[401]6.2714,[402]6.2727,[403]6.2746,[404]6.2816,[405]6.2732,[406]6.2692,[407]6.2692,[408]6.2711,[409]6.2829,[410]6.2948,[411]6.3068,[412]6.3233,[413]6.3350,[414]6.3431,[415]6.3488,[416]6.3577,[417]6.3705,[418]6.3745,[419]6.3817,[420]6.3908,[421]6.4034,[422]6.4078,[423]6.4150,[424]6.4265,[425]6.4360,[426]6.4423,[427]6.4469,[428]6.4554,[429]6.4599,[430]6.4688,[431]6.4829,[432]6.4860,[433]6.4848,[434]6.4798,[435]6.4800,[436]6.4816,[437]6.4912,[438]6.4991,[439]6.4958,[440]6.4950,[441]6.4895,[442]6.4880,[443]6.4888,[444]6.4892,[445]6.4875,[446]6.4892,[447]6.4916,[448]6.4960,[449]6.4937,[450]6.4942,[451]6.4897,[452]6.4792,[453]6.4706,[454]6.4647,[455]6.4659,[456]6.4703,[457]6.4721,[458]6.4701,[459]6.4703,[460]6.4789,[461]6.4761,[462]6.4740,[463]6.4784,[464]6.4771,[465]6.4748,[466]6.4669,[467]6.4671,[468]6.4666,[469]6.4686,[470]6.4690,[471]6.4642,[472]6.4688,[473]6.4631,[474]6.4643,[475]6.4586,[476]6.4605,[477]6.4535,[478]6.4529,[479]6.4602,[480]6.4651,[481]6.4669,[482]6.4626,[483]6.4583,[484]6.4605,[485]6.4590,[486]6.4533,[487]6.4536,[488]6.4514,[489]6.4462,[490]6.4439,[491]6.4405,[492]6.4345,[493]6.4318,[494]6.4300,[495]6.4294,[496]6.4262,[497]6.4207,[498]6.4187,[499]6.4142,[500]6.4045,[501]6.3977,[502]6.3982,[503]6.3975,[504]6.3887,[505]6.3914,[506]6.3922,[507]6.3870,[508]6.3831,[509]6.3822,
[510]6.3861,[511]6.3908,[512]6.3944,[513]6.3960,[514]6.4024,[515]6.3970,[516]6.3961,[517]6.3972,[518]6.3967,[519]6.3997,[520]6.4024,[521]6.4039,[522]6.4071,[523]6.4077,[524]6.4132,[525]6.4167,[526]6.4175,[527]6.4192,[528]6.4137,[529]6.4147,[530]6.4096,[531]6.4080,[532]6.4132,[533]6.4158,[534]6.4138,[535]6.4163,[536]6.4104,[537]6.4078,[538]6.4132,[539]6.4142,[540]6.4182,[541]6.4187,[542]6.4199,[543]6.4211,[544]6.4221,[545]6.4199,[546]6.4205,[547]6.4159,[548]6.4104,[549]6.4102,[550]6.4072,[551]6.4035,[552]6.4016,[553]6.3975,[554]6.3949,[555]6.3915,[556]6.3912,[557]6.3940,[558]6.3902,[559]6.3897,[560]6.3893,[561]6.3894,[562]6.3868,[563]6.3866,[564]6.3912,[565]6.3934,[566]6.3934,[567]6.3908,[568]6.3910,[569]6.3895,[570]6.3927,[571]6.3928,[572]6.3939,[573]6.3937,[574]6.3899,[575]6.3895,[576]6.3898,[577]6.3881,[578]6.3859,[579]6.3864,[580]6.3795,[581]6.3757,[582]6.3744,[583]6.3751,[584]6.3752,[585]6.3680,[586]6.3610,[587]6.3616,[588]6.3663,[589]6.3722,[590]6.3754,[591]6.3775,[592]6.3755,[593]6.3719,[594]6.3725,[595]6.3698,[596]6.3737,[597]6.3711,[598]6.3683,[599]6.3702,[600]6.3694,[601]6.3680,[602]6.3704,[603]6.3732,[604]6.3745,[605]6.3776,[606]6.3799,[607]6.3788,[608]6.3751,[609]6.3755,[610]6.3790,[611]6.3770,[612]6.3797,[613]6.3758,[614]6.3703,[615]6.3627,[616]6.3654,[617]6.3589,[618]6.3537,[619]6.3478,[620]6.3331,[621]6.3260,[622]6.3239,[623]6.3254,[624]6.3260,[625]6.3260,[626]6.3248,[627]6.3273,[628]6.3271,[629]6.3268,[630]6.3300,[631]6.3358,[632]6.3411,[633]6.3395,[634]6.3428,[635]6.3431,[636]6.3407,[637]6.3375,[638]6.3405,[639]6.3373,[640]6.3381,[641]6.3380,[642]6.3446,[643]6.3467,[644]6.3480,[645]6.3463,[646]6.3508,[647]6.3474,[648]6.3484,[649]6.3487,[650]6.3527,[651]6.3582,[652]6.3594,[653]6.3633,[654]6.3566,[655]6.3559,

llama_print_timings: load time = 13794.33 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 4708961.29 ms / 335360 tokens ( 14.04 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 4740083.96 ms

Q4_4, 7B

main: seed = 1682662628
llama.cpp: loading model from ../models/7B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6079.65 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.66 seconds per pass - ETA 18 minutes
[1]4.4671,[2]4.9332,[3]5.7966,[4]6.3903,[5]6.4889,[6]6.4445,[7]6.6354,[8]6.7448,[9]7.1008,[10]7.3327,[11]7.5469,[12]7.5702,[13]7.4939,[14]7.5431,[15]7.8020,[16]7.4115,[17]7.2997,[18]7.2633,[19]6.8989,[20]6.8895,[21]6.7942,[22]6.6200,[23]6.5946,[24]6.4943,[25]6.4910,[26]6.3306,[27]6.1518,[28]6.0550,[29]5.9628,[30]5.7997,[31]5.7669,[32]5.7880,[33]5.7247,[34]5.7606,[35]5.7797,[36]5.8213,[37]5.8266,[38]5.8396,[39]5.8747,[40]5.9274,[41]5.9369,[42]5.9771,[43]5.9365,[44]5.9939,[45]5.9977,[46]5.9743,[47]5.9951,[48]5.9683,[49]5.9721,[50]5.9325,[51]5.9288,[52]5.9188,[53]5.9631,[54]5.9476,[55]5.9211,[56]5.9503,[57]5.9739,[58]5.9943,[59]6.0098,[60]6.0524,[61]6.0460,[62]6.1021,[63]6.1367,[64]6.1507,[65]6.1955,[66]6.2021,[67]6.2187,[68]6.2364,[69]6.2616,[70]6.2919,[71]6.3106,[72]6.3415,[73]6.4011,[74]6.4065,[75]6.4199,[76]6.4314,[77]6.4421,[78]6.4259,[79]6.4530,[80]6.4448,[81]6.4560,[82]6.4602,[83]6.4082,[84]6.3919,[85]6.3798,[86]6.3574,[87]6.2929,[88]6.2639,[89]6.2441,[90]6.2288,[91]6.2508,[92]6.2444,[93]6.2448,[94]6.2434,[95]6.2717,[96]6.2711,[97]6.2662,[98]6.2604,[99]6.2463,[100]6.2476,[101]6.2717,[102]6.2658,[103]6.2852,[104]6.2931,[105]6.2930,[106]6.3089,[107]6.3065,[108]6.3193,[109]6.3124,[110]6.3074,[111]6.3307,[112]6.3508,[113]6.3523,[114]6.3484,[115]6.3549,[116]6.3466,[117]6.3512,[118]6.3800,[119]6.3996,[120]6.4352,[121]6.4512,[122]6.4768,[123]6.5140,[124]6.5320,[125]6.5220,[126]6.5619,[127]6.5989,[128]6.6290,[129]6.6127,[130]6.6231,[131]6.6199,[132]6.6103,[133]6.5962,[134]6.6053,[135]6.6013,[136]6.5899,[137]6.5826,[138]6.5660,[139]6.5555,[140]6.5498,[141]6.5198,[142]6.5166,[143]6.4876,[144]6.4677,[145]6.4586,[146]6.4460,[147]6.4515,[148]6.4512,[149]6.4450,[150]6.4413,[151]6.4425,[152]6.4321,[153]6.4147,[154]6.4056,[155]6.4131,[156]6.4087,[157]6.4255,[158]6.4298,[159]6.4351,[160]6.4369,[161]6.4483,[162]6.4189,[163]6.4069,[164]6.3823,[165]6.3515,[166]6.3244,[167]6.2874,[168]6.2551,[169]6.2412,[170]6.2300,[171]6.2027,[172]6.1858,[173]6.1688,[174]6.1387,[175]6.1174,[176]6.1
069,[177]6.0861,[178]6.0630,[179]6.0461,[180]6.0368,[181]6.0151,[182]5.9970,[183]5.9832,[184]5.9832,[185]5.9755,[186]5.9762,[187]5.9828,[188]5.9785,[189]5.9951,[190]5.9961,[191]6.0184,[192]6.0350,[193]6.0522,[194]6.0635,[195]6.0853,[196]6.1015,[197]6.1229,[198]6.1381,[199]6.1415,[200]6.1468,[201]6.1411,[202]6.1608,[203]6.1688,[204]6.1675,[205]6.1781,[206]6.1847,[207]6.1806,[208]6.1893,[209]6.1939,[210]6.1997,[211]6.2106,[212]6.2183,[213]6.2291,[214]6.2316,[215]6.2350,[216]6.2496,[217]6.2681,[218]6.2813,[219]6.2813,[220]6.2777,[221]6.2732,[222]6.2704,[223]6.2607,[224]6.2532,[225]6.2496,[226]6.2706,[227]6.2796,[228]6.2846,[229]6.2911,[230]6.2883,[231]6.3053,[232]6.2927,[233]6.2763,[234]6.2615,[235]6.2437,[236]6.2364,[237]6.2263,[238]6.2291,[239]6.2136,[240]6.2034,[241]6.2061,[242]6.2098,[243]6.2076,[244]6.1963,[245]6.1933,[246]6.1819,[247]6.1698,[248]6.1622,[249]6.1601,[250]6.1644,[251]6.1573,[252]6.1532,[253]6.1435,[254]6.1388,[255]6.1269,[256]6.1089,[257]6.0971,[258]6.0891,[259]6.0872,[260]6.0795,[261]6.0754,[262]6.0699,[263]6.0643,[264]6.0427,[265]6.0420,[266]6.0407,[267]6.0339,[268]6.0434,[269]6.0410,[270]6.0421,[271]6.0497,[272]6.0532,[273]6.0534,[274]6.0557,[275]6.0640,[276]6.0700,[277]6.0856,[278]6.0958,[279]6.1049,[280]6.1076,[281]6.1168,[282]6.1228,[283]6.1381,[284]6.1461,[285]6.1548,[286]6.1684,[287]6.1687,[288]6.1747,[289]6.1662,[290]6.1505,[291]6.1353,[292]6.1199,[293]6.1070,[294]6.1090,[295]6.1082,[296]6.1124,[297]6.1110,[298]6.1144,[299]6.1116,[300]6.1005,[301]6.1005,[302]6.0925,[303]6.0832,[304]6.0749,[305]6.0724,[306]6.0599,[307]6.0621,[308]6.0658,[309]6.0495,[310]6.0435,[311]6.0375,[312]6.0401,[313]6.0346,[314]6.0328,[315]6.0165,[316]6.0113,[317]5.9953,[318]5.9745,[319]5.9864,[320]5.9991,[321]6.0034,[322]5.9992,[323]5.9924,[324]5.9893,[325]5.9993,[326]5.9989,[327]6.0011,[328]6.0052,[329]6.0116,[330]6.0142,[331]6.0267,[332]6.0234,[333]6.0306,[334]6.0251,[335]6.0182,[336]6.0215,[337]6.0188,[338]6.0183,[339]6.0132,[340]6.0088,[341]6.0169,[342]6.0194,[343
]6.0240,[344]6.0238,[345]6.0243,[346]6.0216,[347]6.0255,[348]6.0282,[349]6.0300,[350]6.0266,[351]6.0272,[352]6.0275,[353]6.0218,[354]6.0218,[355]6.0268,[356]6.0295,[357]6.0262,[358]6.0353,[359]6.0384,[360]6.0350,[361]6.0349,[362]6.0416,[363]6.0531,[364]6.0599,[365]6.0656,[366]6.0665,[367]6.0749,[368]6.0722,[369]6.0729,[370]6.0744,[371]6.0687,[372]6.0733,[373]6.0784,[374]6.0768,[375]6.0770,[376]6.0837,[377]6.0790,[378]6.0816,[379]6.0876,[380]6.0798,[381]6.0762,[382]6.0711,[383]6.0704,[384]6.0698,[385]6.0690,[386]6.0684,[387]6.0682,[388]6.0644,[389]6.0591,[390]6.0524,[391]6.0446,[392]6.0405,[393]6.0391,[394]6.0420,[395]6.0406,[396]6.0330,[397]6.0401,[398]6.0440,[399]6.0523,[400]6.0522,[401]6.0539,[402]6.0546,[403]6.0566,[404]6.0632,[405]6.0534,[406]6.0498,[407]6.0490,[408]6.0505,[409]6.0626,[410]6.0736,[411]6.0848,[412]6.1006,[413]6.1118,[414]6.1195,[415]6.1248,[416]6.1322,[417]6.1444,[418]6.1483,[419]6.1556,[420]6.1642,[421]6.1757,[422]6.1804,[423]6.1873,[424]6.1986,[425]6.2072,[426]6.2136,[427]6.2181,[428]6.2262,[429]6.2315,[430]6.2398,[431]6.2542,[432]6.2585,[433]6.2580,[434]6.2538,[435]6.2546,[436]6.2568,[437]6.2665,[438]6.2738,[439]6.2713,[440]6.2704,[441]6.2652,[442]6.2637,[443]6.2651,[444]6.2653,[445]6.2633,[446]6.2660,[447]6.2689,[448]6.2734,[449]6.2704,[450]6.2717,[451]6.2676,[452]6.2548,[453]6.2464,[454]6.2409,[455]6.2419,[456]6.2463,[457]6.2483,[458]6.2461,[459]6.2467,[460]6.2551,[461]6.2525,[462]6.2511,[463]6.2558,[464]6.2547,[465]6.2519,[466]6.2441,[467]6.2441,[468]6.2439,[469]6.2458,[470]6.2462,[471]6.2412,[472]6.2457,[473]6.2402,[474]6.2412,[475]6.2353,[476]6.2371,[477]6.2300,[478]6.2289,[479]6.2348,[480]6.2394,[481]6.2412,[482]6.2367,[483]6.2323,[484]6.2343,[485]6.2332,[486]6.2278,[487]6.2278,[488]6.2258,[489]6.2209,[490]6.2185,[491]6.2154,[492]6.2094,[493]6.2065,[494]6.2051,[495]6.2051,[496]6.2017,[497]6.1960,[498]6.1943,[499]6.1898,[500]6.1803,[501]6.1738,[502]6.1741,[503]6.1733,[504]6.1644,[505]6.1673,[506]6.1681,[507]6.1625,[508]6.1586,[509]6.1579,
[510]6.1617,[511]6.1661,[512]6.1694,[513]6.1715,[514]6.1779,[515]6.1723,[516]6.1713,[517]6.1722,[518]6.1721,[519]6.1750,[520]6.1777,[521]6.1792,[522]6.1821,[523]6.1827,[524]6.1884,[525]6.1919,[526]6.1931,[527]6.1949,[528]6.1897,[529]6.1904,[530]6.1854,[531]6.1840,[532]6.1885,[533]6.1907,[534]6.1894,[535]6.1918,[536]6.1864,[537]6.1840,[538]6.1889,[539]6.1900,[540]6.1936,[541]6.1938,[542]6.1950,[543]6.1967,[544]6.1976,[545]6.1955,[546]6.1963,[547]6.1920,[548]6.1871,[549]6.1872,[550]6.1841,[551]6.1805,[552]6.1783,[553]6.1745,[554]6.1723,[555]6.1694,[556]6.1691,[557]6.1713,[558]6.1675,[559]6.1669,[560]6.1667,[561]6.1668,[562]6.1641,[563]6.1641,[564]6.1682,[565]6.1701,[566]6.1698,[567]6.1678,[568]6.1683,[569]6.1668,[570]6.1696,[571]6.1702,[572]6.1711,[573]6.1712,[574]6.1677,[575]6.1675,[576]6.1675,[577]6.1662,[578]6.1642,[579]6.1649,[580]6.1582,[581]6.1544,[582]6.1534,[583]6.1543,[584]6.1544,[585]6.1467,[586]6.1399,[587]6.1404,[588]6.1449,[589]6.1505,[590]6.1536,[591]6.1558,[592]6.1545,[593]6.1514,[594]6.1523,[595]6.1500,[596]6.1535,[597]6.1513,[598]6.1484,[599]6.1506,[600]6.1502,[601]6.1486,[602]6.1500,[603]6.1533,[604]6.1542,[605]6.1574,[606]6.1593,[607]6.1577,[608]6.1546,[609]6.1551,[610]6.1587,[611]6.1569,[612]6.1595,[613]6.1557,[614]6.1506,[615]6.1432,[616]6.1462,[617]6.1402,[618]6.1353,[619]6.1297,[620]6.1158,[621]6.1088,[622]6.1073,[623]6.1087,[624]6.1091,[625]6.1091,[626]6.1078,[627]6.1098,[628]6.1099,[629]6.1096,[630]6.1128,[631]6.1183,[632]6.1237,[633]6.1221,[634]6.1256,[635]6.1265,[636]6.1232,[637]6.1200,[638]6.1227,[639]6.1197,[640]6.1206,[641]6.1210,[642]6.1278,[643]6.1300,[644]6.1312,[645]6.1292,[646]6.1331,[647]6.1294,[648]6.1302,[649]6.1303,[650]6.1343,[651]6.1398,[652]6.1408,[653]6.1448,[654]6.1384,[655]6.1378,

llama_print_timings: load time = 2868.41 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 986629.03 ms / 335360 tokens ( 2.94 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1016443.58 ms

Q2_4, 13B

main: seed = 1682672513
llama.cpp: loading model from ../models/13B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 7149.75 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.46 seconds per pass - ETA 26 minutes
[1]4.7585,[2]5.3449,[3]6.1921,[4]6.9859,[5]7.0915,[6]6.9582,[7]7.1905,[8]7.3194,[9]7.6847,[10]7.9856,[11]8.2037,[12]8.2346,[13]8.2531,[14]8.4211,[15]8.6654,[16]8.1885,[17]8.0491,[18]8.0571,[19]7.6317,[20]7.5630,[21]7.4568,[22]7.2688,[23]7.2103,[24]7.1009,[25]7.0970,[26]6.8966,[27]6.6696,[28]6.5677,[29]6.4672,[30]6.2927,[31]6.2522,[32]6.2650,[33]6.2214,[34]6.2911,[35]6.3188,[36]6.3626,[37]6.3676,[38]6.3648,[39]6.4049,[40]6.4696,[41]6.5031,[42]6.5451,[43]6.4929,[44]6.5349,[45]6.5355,[46]6.4898,[47]6.5203,[48]6.4929,[49]6.5029,[50]6.4658,[51]6.4713,[52]6.4578,[53]6.5058,[54]6.4917,[55]6.4656,[56]6.4997,[57]6.5208,[58]6.5559,[59]6.5796,[60]6.6247,[61]6.6097,[62]6.6783,[63]6.7118,[64]6.7189,[65]6.7631,[66]6.7637,[67]6.7817,[68]6.7975,[69]6.8369,[70]6.8757,[71]6.9031,[72]6.9442,[73]7.0056,[74]7.0067,[75]7.0174,[76]7.0390,[77]7.0566,[78]7.0475,[79]7.0756,[80]7.0690,[81]7.0858,[82]7.0833,[83]7.0255,[84]7.0156,[85]7.0123,[86]6.9888,[87]6.9278,[88]6.8928,[89]6.8682,[90]6.8598,[91]6.8908,[92]6.8824,[93]6.8846,[94]6.8808,[95]6.9110,[96]6.9089,[97]6.9063,[98]6.8987,[99]6.8890,[100]6.8806,[101]6.9075,[102]6.8944,[103]6.9125,[104]6.9135,[105]6.9150,[106]6.9329,[107]6.9309,[108]6.9469,[109]6.9416,[110]6.9362,[111]6.9575,[112]6.9752,[113]6.9789,[114]6.9764,[115]6.9810,[116]6.9712,[117]6.9763,[118]7.0062,[119]7.0290,[120]7.0601,[121]7.0781,[122]7.1003,[123]7.1417,[124]7.1628,[125]7.1536,[126]7.1928,[127]7.2300,[128]7.2598,[129]7.2410,[130]7.2510,[131]7.2454,[132]7.2380,[133]7.2288,[134]7.2421,[135]7.2391,[136]7.2287,[137]7.2241,[138]7.2107,[139]7.2015,[140]7.2012,[141]7.1776,[142]7.1732,[143]7.1527,[144]7.1378,[145]7.1344,[146]7.1172,[147]7.1279,[148]7.1340,[149]7.1299,[150]7.1284,[151]7.1312,[152]7.1194,[153]7.1053,[154]7.0951,[155]7.1006,[156]7.0991,[157]7.1170,[158]7.1222,[159]7.1258,[160]7.1297,[161]7.1436,[162]7.1072,[163]7.0966,[164]7.0683,[165]7.0330,[166]6.9992,[167]6.9570,[168]6.9234,[169]6.9088,[170]6.8939,[171]6.8657,[172]6.8461,[173]6.8297,[174]6.7949,[175]6.7696,[176]6.7
531,[177]6.7300,[178]6.7037,[179]6.6849,[180]6.6748,[181]6.6528,[182]6.6307,[183]6.6164,[184]6.6141,[185]6.6069,[186]6.6110,[187]6.6160,[188]6.6144,[189]6.6356,[190]6.6369,[191]6.6560,[192]6.6695,[193]6.6896,[194]6.7040,[195]6.7267,[196]6.7426,[197]6.7660,[198]6.7803,[199]6.7816,[200]6.7834,[201]6.7782,[202]6.8005,[203]6.8093,[204]6.8139,[205]6.8266,[206]6.8314,[207]6.8280,[208]6.8349,[209]6.8376,[210]6.8433,[211]6.8518,[212]6.8573,[213]6.8660,[214]6.8694,[215]6.8724,[216]6.8841,[217]6.9016,[218]6.9167,[219]6.9157,[220]6.9105,[221]6.9024,[222]6.9009,[223]6.8909,[224]6.8810,[225]6.8771,[226]6.8996,[227]6.9145,[228]6.9243,[229]6.9329,[230]6.9299,[231]6.9460,[232]6.9357,[233]6.9161,[234]6.8990,[235]6.8808,[236]6.8716,[237]6.8604,[238]6.8644,[239]6.8478,[240]6.8351,[241]6.8392,[242]6.8429,[243]6.8411,[244]6.8283,[245]6.8257,[246]6.8136,[247]6.8008,[248]6.7913,[249]6.7872,[250]6.7898,[251]6.7811,[252]6.7757,[253]6.7633,[254]6.7598,[255]6.7463,[256]6.7255,[257]6.7133,[258]6.7034,[259]6.7016,[260]6.6927,[261]6.6882,[262]6.6811,[263]6.6731,[264]6.6577,[265]6.6581,[266]6.6546,[267]6.6455,[268]6.6556,[269]6.6559,[270]6.6559,[271]6.6630,[272]6.6677,[273]6.6665,[274]6.6678,[275]6.6755,[276]6.6830,[277]6.7003,[278]6.7100,[279]6.7188,[280]6.7223,[281]6.7342,[282]6.7384,[283]6.7524,[284]6.7615,[285]6.7707,[286]6.7866,[287]6.7841,[288]6.7917,[289]6.7842,[290]6.7667,[291]6.7510,[292]6.7324,[293]6.7158,[294]6.7170,[295]6.7174,[296]6.7228,[297]6.7217,[298]6.7237,[299]6.7202,[300]6.7097,[301]6.7079,[302]6.6990,[303]6.6904,[304]6.6805,[305]6.6760,[306]6.6626,[307]6.6652,[308]6.6659,[309]6.6506,[310]6.6450,[311]6.6401,[312]6.6418,[313]6.6342,[314]6.6339,[315]6.6170,[316]6.6160,[317]6.6004,[318]6.5804,[319]6.5945,[320]6.6077,[321]6.6118,[322]6.6055,[323]6.5989,[324]6.5969,[325]6.6085,[326]6.6091,[327]6.6111,[328]6.6139,[329]6.6186,[330]6.6225,[331]6.6357,[332]6.6312,[333]6.6398,[334]6.6321,[335]6.6254,[336]6.6294,[337]6.6266,[338]6.6272,[339]6.6223,[340]6.6194,[341]6.6274,[342]6.6307,[343
]6.6368,[344]6.6358,[345]6.6355,[346]6.6318,[347]6.6356,[348]6.6404,[349]6.6428,[350]6.6401,[351]6.6418,[352]6.6438,[353]6.6379,[354]6.6392,[355]6.6448,[356]6.6477,[357]6.6436,[358]6.6529,[359]6.6557,[360]6.6506,[361]6.6486,[362]6.6565,[363]6.6678,[364]6.6738,[365]6.6800,[366]6.6819,[367]6.6929,[368]6.6892,[369]6.6907,[370]6.6922,[371]6.6857,[372]6.6918,[373]6.6976,[374]6.6955,[375]6.6940,[376]6.7022,[377]6.6962,[378]6.6974,[379]6.7035,[380]6.6939,[381]6.6905,[382]6.6856,[383]6.6836,[384]6.6840,[385]6.6822,[386]6.6813,[387]6.6816,[388]6.6758,[389]6.6701,[390]6.6638,[391]6.6557,[392]6.6529,[393]6.6534,[394]6.6558,[395]6.6539,[396]6.6467,[397]6.6557,[398]6.6598,[399]6.6702,[400]6.6694,[401]6.6701,[402]6.6706,[403]6.6732,[404]6.6793,[405]6.6683,[406]6.6646,[407]6.6647,[408]6.6652,[409]6.6781,[410]6.6897,[411]6.7017,[412]6.7187,[413]6.7319,[414]6.7406,[415]6.7476,[416]6.7561,[417]6.7672,[418]6.7697,[419]6.7761,[420]6.7854,[421]6.7970,[422]6.8007,[423]6.8079,[424]6.8207,[425]6.8308,[426]6.8387,[427]6.8423,[428]6.8514,[429]6.8557,[430]6.8635,[431]6.8784,[432]6.8802,[433]6.8788,[434]6.8730,[435]6.8730,[436]6.8755,[437]6.8857,[438]6.8954,[439]6.8912,[440]6.8895,[441]6.8831,[442]6.8802,[443]6.8809,[444]6.8829,[445]6.8802,[446]6.8811,[447]6.8832,[448]6.8871,[449]6.8847,[450]6.8841,[451]6.8796,[452]6.8733,[453]6.8643,[454]6.8579,[455]6.8578,[456]6.8623,[457]6.8644,[458]6.8620,[459]6.8618,[460]6.8698,[461]6.8655,[462]6.8631,[463]6.8662,[464]6.8657,[465]6.8642,[466]6.8564,[467]6.8586,[468]6.8591,[469]6.8619,[470]6.8626,[471]6.8578,[472]6.8629,[473]6.8565,[474]6.8588,[475]6.8554,[476]6.8561,[477]6.8481,[478]6.8469,[479]6.8548,[480]6.8607,[481]6.8621,[482]6.8569,[483]6.8536,[484]6.8568,[485]6.8558,[486]6.8488,[487]6.8492,[488]6.8468,[489]6.8411,[490]6.8390,[491]6.8360,[492]6.8294,[493]6.8257,[494]6.8236,[495]6.8228,[496]6.8191,[497]6.8134,[498]6.8116,[499]6.8061,[500]6.7967,[501]6.7880,[502]6.7892,[503]6.7873,[504]6.7777,[505]6.7790,[506]6.7806,[507]6.7771,[508]6.7734,[509]6.7717,
[510]6.7750,[511]6.7811,[512]6.7847,[513]6.7874,[514]6.7947,[515]6.7885,[516]6.7874,[517]6.7884,[518]6.7876,[519]6.7900,[520]6.7920,[521]6.7937,[522]6.7953,[523]6.7954,[524]6.8017,[525]6.8047,[526]6.8057,[527]6.8079,[528]6.8026,[529]6.8051,[530]6.7994,[531]6.7978,[532]6.8046,[533]6.8084,[534]6.8061,[535]6.8101,[536]6.8043,[537]6.8016,[538]6.8073,[539]6.8080,[540]6.8123,[541]6.8142,[542]6.8147,[543]6.8169,[544]6.8180,[545]6.8166,[546]6.8171,[547]6.8123,[548]6.8059,[549]6.8058,[550]6.8032,[551]6.7988,[552]6.7972,[553]6.7924,[554]6.7894,[555]6.7866,[556]6.7858,[557]6.7887,[558]6.7852,[559]6.7856,[560]6.7835,[561]6.7842,[562]6.7818,[563]6.7808,[564]6.7860,[565]6.7882,[566]6.7879,[567]6.7853,[568]6.7853,[569]6.7822,[570]6.7854,[571]6.7860,[572]6.7865,[573]6.7865,[574]6.7829,[575]6.7816,[576]6.7811,[577]6.7778,[578]6.7755,[579]6.7751,[580]6.7677,[581]6.7637,[582]6.7634,[583]6.7635,[584]6.7630,[585]6.7563,[586]6.7496,[587]6.7503,[588]6.7555,[589]6.7622,[590]6.7651,[591]6.7657,[592]6.7647,[593]6.7611,[594]6.7620,[595]6.7595,[596]6.7636,[597]6.7606,[598]6.7572,[599]6.7593,[600]6.7579,[601]6.7563,[602]6.7597,[603]6.7628,[604]6.7645,[605]6.7670,[606]6.7681,[607]6.7671,[608]6.7631,[609]6.7631,[610]6.7687,[611]6.7669,[612]6.7689,[613]6.7652,[614]6.7594,[615]6.7506,[616]6.7538,[617]6.7459,[618]6.7394,[619]6.7332,[620]6.7181,[621]6.7112,[622]6.7089,[623]6.7105,[624]6.7107,[625]6.7115,[626]6.7104,[627]6.7136,[628]6.7136,[629]6.7137,[630]6.7170,[631]6.7235,[632]6.7290,[633]6.7270,[634]6.7302,[635]6.7294,[636]6.7265,[637]6.7236,[638]6.7265,[639]6.7229,[640]6.7237,[641]6.7239,[642]6.7309,[643]6.7328,[644]6.7346,[645]6.7328,[646]6.7373,[647]6.7336,[648]6.7350,[649]6.7353,[650]6.7393,[651]6.7443,[652]6.7446,[653]6.7485,[654]6.7419,[655]6.7409,

llama_print_timings: load time = 4488.53 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1516595.65 ms / 335360 tokens ( 4.52 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1547168.54 ms

Q3_4, 13B

main: seed = 1682656187
llama.cpp: loading model from ../models/13B/q34.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 8681.78 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.75 seconds per pass - ETA 30 minutes
[1]3.9691,[2]4.3609,[3]5.1508,[4]5.6357,[5]5.8245,[6]5.7682,[7]5.8930,[8]6.0160,[9]6.2782,[10]6.4992,[11]6.7043,[12]6.7592,[13]6.7085,[14]6.8204,[15]7.0162,[16]6.6686,[17]6.5717,[18]6.5474,[19]6.2354,[20]6.1972,[21]6.1250,[22]5.9506,[23]5.9279,[24]5.8369,[25]5.8490,[26]5.6949,[27]5.5136,[28]5.4184,[29]5.3375,[30]5.1977,[31]5.1640,[32]5.1797,[33]5.1311,[34]5.1711,[35]5.1947,[36]5.2175,[37]5.2144,[38]5.2114,[39]5.2423,[40]5.2872,[41]5.3119,[42]5.3487,[43]5.3108,[44]5.3543,[45]5.3571,[46]5.3300,[47]5.3585,[48]5.3381,[49]5.3477,[50]5.3142,[51]5.3187,[52]5.3117,[53]5.3565,[54]5.3451,[55]5.3237,[56]5.3481,[57]5.3650,[58]5.3886,[59]5.4044,[60]5.4383,[61]5.4305,[62]5.4858,[63]5.5112,[64]5.5220,[65]5.5596,[66]5.5598,[67]5.5773,[68]5.5892,[69]5.6185,[70]5.6480,[71]5.6698,[72]5.7054,[73]5.7564,[74]5.7630,[75]5.7734,[76]5.7888,[77]5.8019,[78]5.7873,[79]5.8141,[80]5.8085,[81]5.8180,[82]5.8151,[83]5.7677,[84]5.7567,[85]5.7513,[86]5.7341,[87]5.6712,[88]5.6305,[89]5.6089,[90]5.5978,[91]5.6195,[92]5.6135,[93]5.6146,[94]5.6130,[95]5.6419,[96]5.6394,[97]5.6359,[98]5.6316,[99]5.6235,[100]5.6211,[101]5.6452,[102]5.6405,[103]5.6557,[104]5.6603,[105]5.6634,[106]5.6775,[107]5.6761,[108]5.6907,[109]5.6889,[110]5.6830,[111]5.7020,[112]5.7191,[113]5.7178,[114]5.7158,[115]5.7198,[116]5.7080,[117]5.7090,[118]5.7330,[119]5.7506,[120]5.7808,[121]5.7966,[122]5.8185,[123]5.8563,[124]5.8753,[125]5.8695,[126]5.9059,[127]5.9397,[128]5.9679,[129]5.9551,[130]5.9628,[131]5.9577,[132]5.9536,[133]5.9413,[134]5.9490,[135]5.9496,[136]5.9397,[137]5.9349,[138]5.9208,[139]5.9126,[140]5.9112,[141]5.8840,[142]5.8817,[143]5.8558,[144]5.8403,[145]5.8328,[146]5.8204,[147]5.8257,[148]5.8279,[149]5.8249,[150]5.8232,[151]5.8275,[152]5.8208,[153]5.8108,[154]5.8054,[155]5.8122,[156]5.8106,[157]5.8269,[158]5.8289,[159]5.8305,[160]5.8342,[161]5.8453,[162]5.8188,[163]5.8082,[164]5.7861,[165]5.7595,[166]5.7354,[167]5.7028,[168]5.6739,[169]5.6606,[170]5.6517,[171]5.6296,[172]5.6167,[173]5.6035,[174]5.5753,[175]5.5553,[176]5.5
420,[177]5.5254,[178]5.5038,[179]5.4904,[180]5.4827,[181]5.4657,[182]5.4485,[183]5.4360,[184]5.4348,[185]5.4271,[186]5.4282,[187]5.4332,[188]5.4297,[189]5.4473,[190]5.4471,[191]5.4648,[192]5.4785,[193]5.4948,[194]5.5065,[195]5.5265,[196]5.5385,[197]5.5577,[198]5.5713,[199]5.5728,[200]5.5747,[201]5.5688,[202]5.5830,[203]5.5901,[204]5.5849,[205]5.5946,[206]5.5999,[207]5.5954,[208]5.6013,[209]5.6056,[210]5.6114,[211]5.6216,[212]5.6280,[213]5.6369,[214]5.6402,[215]5.6434,[216]5.6548,[217]5.6719,[218]5.6855,[219]5.6862,[220]5.6827,[221]5.6775,[222]5.6773,[223]5.6701,[224]5.6627,[225]5.6593,[226]5.6796,[227]5.6871,[228]5.6945,[229]5.7014,[230]5.6982,[231]5.7139,[232]5.7031,[233]5.6877,[234]5.6733,[235]5.6521,[236]5.6465,[237]5.6370,[238]5.6397,[239]5.6280,[240]5.6186,[241]5.6219,[242]5.6239,[243]5.6226,[244]5.6122,[245]5.6090,[246]5.5987,[247]5.5889,[248]5.5823,[249]5.5786,[250]5.5821,[251]5.5736,[252]5.5693,[253]5.5598,[254]5.5562,[255]5.5464,[256]5.5294,[257]5.5194,[258]5.5122,[259]5.5121,[260]5.5040,[261]5.4993,[262]5.4950,[263]5.4898,[264]5.4684,[265]5.4679,[266]5.4647,[267]5.4583,[268]5.4651,[269]5.4649,[270]5.4662,[271]5.4723,[272]5.4756,[273]5.4771,[274]5.4788,[275]5.4854,[276]5.4918,[277]5.5050,[278]5.5138,[279]5.5224,[280]5.5257,[281]5.5354,[282]5.5409,[283]5.5541,[284]5.5634,[285]5.5703,[286]5.5834,[287]5.5802,[288]5.5859,[289]5.5799,[290]5.5657,[291]5.5528,[292]5.5389,[293]5.5267,[294]5.5278,[295]5.5279,[296]5.5330,[297]5.5319,[298]5.5333,[299]5.5313,[300]5.5220,[301]5.5223,[302]5.5154,[303]5.5067,[304]5.4995,[305]5.4974,[306]5.4861,[307]5.4896,[308]5.4904,[309]5.4765,[310]5.4726,[311]5.4684,[312]5.4703,[313]5.4646,[314]5.4629,[315]5.4493,[316]5.4465,[317]5.4335,[318]5.4164,[319]5.4272,[320]5.4385,[321]5.4432,[322]5.4399,[323]5.4338,[324]5.4321,[325]5.4421,[326]5.4438,[327]5.4446,[328]5.4478,[329]5.4528,[330]5.4549,[331]5.4651,[332]5.4612,[333]5.4687,[334]5.4637,[335]5.4582,[336]5.4602,[337]5.4589,[338]5.4584,[339]5.4544,[340]5.4515,[341]5.4582,[342]5.4613,[343
]5.4661,[344]5.4667,[345]5.4682,[346]5.4668,[347]5.4704,[348]5.4741,[349]5.4761,[350]5.4743,[351]5.4755,[352]5.4756,[353]5.4707,[354]5.4717,[355]5.4764,[356]5.4792,[357]5.4759,[358]5.4840,[359]5.4864,[360]5.4827,[361]5.4826,[362]5.4892,[363]5.5000,[364]5.5057,[365]5.5100,[366]5.5115,[367]5.5204,[368]5.5175,[369]5.5187,[370]5.5204,[371]5.5161,[372]5.5205,[373]5.5250,[374]5.5229,[375]5.5222,[376]5.5284,[377]5.5247,[378]5.5269,[379]5.5311,[380]5.5240,[381]5.5207,[382]5.5167,[383]5.5148,[384]5.5146,[385]5.5136,[386]5.5127,[387]5.5122,[388]5.5088,[389]5.5049,[390]5.4993,[391]5.4931,[392]5.4895,[393]5.4891,[394]5.4920,[395]5.4913,[396]5.4857,[397]5.4919,[398]5.4961,[399]5.5033,[400]5.5021,[401]5.5026,[402]5.5038,[403]5.5060,[404]5.5116,[405]5.4965,[406]5.4924,[407]5.4922,[408]5.4934,[409]5.5046,[410]5.5136,[411]5.5238,[412]5.5381,[413]5.5484,[414]5.5549,[415]5.5611,[416]5.5685,[417]5.5786,[418]5.5810,[419]5.5860,[420]5.5939,[421]5.6041,[422]5.6074,[423]5.6130,[424]5.6227,[425]5.6304,[426]5.6365,[427]5.6407,[428]5.6480,[429]5.6515,[430]5.6580,[431]5.6710,[432]5.6739,[433]5.6728,[434]5.6689,[435]5.6702,[436]5.6729,[437]5.6814,[438]5.6891,[439]5.6861,[440]5.6852,[441]5.6803,[442]5.6787,[443]5.6799,[444]5.6813,[445]5.6802,[446]5.6823,[447]5.6846,[448]5.6880,[449]5.6865,[450]5.6875,[451]5.6846,[452]5.6697,[453]5.6600,[454]5.6546,[455]5.6552,[456]5.6593,[457]5.6607,[458]5.6587,[459]5.6584,[460]5.6658,[461]5.6619,[462]5.6586,[463]5.6573,[464]5.6571,[465]5.6549,[466]5.6477,[467]5.6467,[468]5.6446,[469]5.6459,[470]5.6450,[471]5.6401,[472]5.6416,[473]5.6368,[474]5.6356,[475]5.6290,[476]5.6277,[477]5.6199,[478]5.6175,[479]5.6193,[480]5.6221,[481]5.6226,[482]5.6179,[483]5.6137,[484]5.6148,[485]5.6092,[486]5.6025,[487]5.6015,[488]5.5988,[489]5.5934,[490]5.5903,[491]5.5869,[492]5.5803,[493]5.5774,[494]5.5757,[495]5.5735,[496]5.5693,[497]5.5634,[498]5.5607,[499]5.5572,[500]5.5488,[501]5.5419,[502]5.5411,[503]5.5401,[504]5.5326,[505]5.5329,[506]5.5339,[507]5.5286,[508]5.5248,[509]5.5250,
[510]5.5271,[511]5.5315,[512]5.5355,[513]5.5382,[514]5.5438,[515]5.5399,[516]5.5387,[517]5.5387,[518]5.5384,[519]5.5406,[520]5.5419,[521]5.5431,[522]5.5447,[523]5.5454,[524]5.5508,[525]5.5537,[526]5.5542,[527]5.5559,[528]5.5503,[529]5.5514,[530]5.5475,[531]5.5468,[532]5.5516,[533]5.5544,[534]5.5528,[535]5.5551,[536]5.5504,[537]5.5484,[538]5.5534,[539]5.5542,[540]5.5561,[541]5.5560,[542]5.5574,[543]5.5592,[544]5.5606,[545]5.5593,[546]5.5596,[547]5.5561,[548]5.5520,[549]5.5521,[550]5.5498,[551]5.5469,[552]5.5450,[553]5.5416,[554]5.5393,[555]5.5371,[556]5.5361,[557]5.5376,[558]5.5341,[559]5.5346,[560]5.5337,[561]5.5339,[562]5.5311,[563]5.5311,[564]5.5352,[565]5.5365,[566]5.5368,[567]5.5350,[568]5.5358,[569]5.5341,[570]5.5368,[571]5.5380,[572]5.5386,[573]5.5391,[574]5.5360,[575]5.5347,[576]5.5345,[577]5.5326,[578]5.5307,[579]5.5309,[580]5.5253,[581]5.5223,[582]5.5225,[583]5.5233,[584]5.5234,[585]5.5177,[586]5.5119,[587]5.5122,[588]5.5165,[589]5.5217,[590]5.5246,[591]5.5262,[592]5.5250,[593]5.5212,[594]5.5226,[595]5.5208,[596]5.5247,[597]5.5228,[598]5.5199,[599]5.5227,[600]5.5216,[601]5.5204,[602]5.5213,[603]5.5241,[604]5.5249,[605]5.5277,[606]5.5292,[607]5.5277,[608]5.5247,[609]5.5255,[610]5.5296,[611]5.5285,[612]5.5303,[613]5.5274,[614]5.5234,[615]5.5171,[616]5.5197,[617]5.5144,[618]5.5095,[619]5.5049,[620]5.4935,[621]5.4880,[622]5.4859,[623]5.4872,[624]5.4875,[625]5.4881,[626]5.4876,[627]5.4904,[628]5.4910,[629]5.4916,[630]5.4944,[631]5.4990,[632]5.5038,[633]5.5026,[634]5.5056,[635]5.5053,[636]5.5018,[637]5.4981,[638]5.5003,[639]5.4969,[640]5.4974,[641]5.4977,[642]5.5030,[643]5.5049,[644]5.5073,[645]5.5057,[646]5.5094,[647]5.5046,[648]5.5059,[649]5.5062,[650]5.5095,[651]5.5135,[652]5.5138,[653]5.5176,[654]5.5119,[655]5.5110,

llama_print_timings: load time = 5957.47 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1701223.56 ms / 335360 tokens ( 5.07 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1733067.74 ms

Q4_4, 13B

main: seed = 1682790225
llama.cpp: loading model from ../models/13B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 10213.81 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.72 seconds per pass - ETA 29 minutes
[1]3.7371,[2]4.2097,[3]5.0064,[4]5.3764,[5]5.5586,[6]5.4949,[7]5.6305,[8]5.7337,[9]5.9985,[10]6.2164,[11]6.4018,[12]6.4542,[13]6.4223,[14]6.5122,[15]6.7109,[16]6.3997,[17]6.3208,[18]6.2951,[19]6.0050,[20]5.9874,[21]5.9108,[22]5.7379,[23]5.7112,[24]5.6188,[25]5.6312,[26]5.4855,[27]5.3108,[28]5.2153,[29]5.1408,[30]5.0041,[31]4.9644,[32]4.9804,[33]4.9376,[34]4.9779,[35]4.9952,[36]5.0181,[37]5.0117,[38]5.0095,[39]5.0362,[40]5.0770,[41]5.1000,[42]5.1361,[43]5.0994,[44]5.1415,[45]5.1435,[46]5.1173,[47]5.1457,[48]5.1302,[49]5.1325,[50]5.1018,[51]5.1096,[52]5.1021,[53]5.1478,[54]5.1379,[55]5.1189,[56]5.1389,[57]5.1573,[58]5.1792,[59]5.1969,[60]5.2336,[61]5.2273,[62]5.2826,[63]5.3079,[64]5.3183,[65]5.3554,[66]5.3534,[67]5.3714,[68]5.3824,[69]5.4101,[70]5.4406,[71]5.4614,[72]5.4958,[73]5.5438,[74]5.5508,[75]5.5605,[76]5.5748,[77]5.5862,[78]5.5737,[79]5.5997,[80]5.5943,[81]5.6018,[82]5.5978,[83]5.5517,[84]5.5406,[85]5.5338,[86]5.5187,[87]5.4533,[88]5.4109,[89]5.3896,[90]5.3801,[91]5.4009,[92]5.3968,[93]5.3979,[94]5.3971,[95]5.4232,[96]5.4202,[97]5.4171,[98]5.4136,[99]5.4063,[100]5.4035,[101]5.4261,[102]5.4222,[103]5.4380,[104]5.4423,[105]5.4440,[106]5.4578,[107]5.4561,[108]5.4715,[109]5.4708,[110]5.4650,[111]5.4836,[112]5.4995,[113]5.4999,[114]5.4986,[115]5.5033,[116]5.4916,[117]5.4909,[118]5.5141,[119]5.5324,[120]5.5620,[121]5.5776,[122]5.5990,[123]5.6352,[124]5.6525,[125]5.6471,[126]5.6825,[127]5.7151,[128]5.7425,[129]5.7312,[130]5.7397,[131]5.7352,[132]5.7318,[133]5.7202,[134]5.7284,[135]5.7279,[136]5.7190,[137]5.7155,[138]5.7017,[139]5.6938,[140]5.6922,[141]5.6652,[142]5.6613,[143]5.6361,[144]5.6211,[145]5.6124,[146]5.6018,[147]5.6069,[148]5.6100,[149]5.6069,[150]5.6060,[151]5.6105,[152]5.6049,[153]5.5951,[154]5.5893,[155]5.5957,[156]5.5933,[157]5.6083,[158]5.6106,[159]5.6112,[160]5.6147,[161]5.6256,[162]5.6003,[163]5.5907,[164]5.5704,[165]5.5452,[166]5.5225,[167]5.4905,[168]5.4638,[169]5.4505,[170]5.4419,[171]5.4213,[172]5.4093,[173]5.3965,[174]5.3701,[175]5.3500,[176]5.3
366,[177]5.3201,[178]5.3003,[179]5.2871,[180]5.2796,[181]5.2638,[182]5.2477,[183]5.2357,[184]5.2347,[185]5.2275,[186]5.2280,[187]5.2337,[188]5.2312,[189]5.2476,[190]5.2479,[191]5.2649,[192]5.2784,[193]5.2932,[194]5.3039,[195]5.3229,[196]5.3343,[197]5.3533,[198]5.3668,[199]5.3688,[200]5.3693,[201]5.3628,[202]5.3753,[203]5.3811,[204]5.3766,[205]5.3852,[206]5.3904,[207]5.3867,[208]5.3925,[209]5.3957,[210]5.4014,[211]5.4117,[212]5.4178,[213]5.4265,[214]5.4290,[215]5.4325,[216]5.4442,[217]5.4606,[218]5.4739,[219]5.4737,[220]5.4708,[221]5.4662,[222]5.4661,[223]5.4595,[224]5.4525,[225]5.4492,[226]5.4689,[227]5.4743,[228]5.4817,[229]5.4885,[230]5.4849,[231]5.5003,[232]5.4900,[233]5.4753,[234]5.4608,[235]5.4390,[236]5.4339,[237]5.4251,[238]5.4284,[239]5.4171,[240]5.4082,[241]5.4115,[242]5.4126,[243]5.4118,[244]5.4020,[245]5.3983,[246]5.3883,[247]5.3786,[248]5.3726,[249]5.3694,[250]5.3727,[251]5.3645,[252]5.3597,[253]5.3509,[254]5.3468,[255]5.3376,[256]5.3213,[257]5.3114,[258]5.3047,[259]5.3035,[260]5.2952,[261]5.2901,[262]5.2863,[263]5.2813,[264]5.2584,[265]5.2582,[266]5.2553,[267]5.2490,[268]5.2555,[269]5.2550,[270]5.2558,[271]5.2620,[272]5.2649,[273]5.2665,[274]5.2676,[275]5.2736,[276]5.2795,[277]5.2917,[278]5.3000,[279]5.3081,[280]5.3118,[281]5.3217,[282]5.3270,[283]5.3396,[284]5.3483,[285]5.3564,[286]5.3687,[287]5.3654,[288]5.3710,[289]5.3649,[290]5.3511,[291]5.3385,[292]5.3253,[293]5.3133,[294]5.3137,[295]5.3138,[296]5.3186,[297]5.3176,[298]5.3198,[299]5.3175,[300]5.3089,[301]5.3093,[302]5.3030,[303]5.2950,[304]5.2876,[305]5.2848,[306]5.2740,[307]5.2769,[308]5.2776,[309]5.2648,[310]5.2621,[311]5.2579,[312]5.2592,[313]5.2536,[314]5.2522,[315]5.2396,[316]5.2361,[317]5.2236,[318]5.2077,[319]5.2178,[320]5.2289,[321]5.2333,[322]5.2301,[323]5.2244,[324]5.2226,[325]5.2319,[326]5.2336,[327]5.2343,[328]5.2379,[329]5.2426,[330]5.2446,[331]5.2548,[332]5.2511,[333]5.2589,[334]5.2544,[335]5.2495,[336]5.2518,[337]5.2507,[338]5.2503,[339]5.2460,[340]5.2434,[341]5.2501,[342]5.2534,[343
]5.2578,[344]5.2583,[345]5.2597,[346]5.2581,[347]5.2620,[348]5.2656,[349]5.2677,[350]5.2659,[351]5.2674,[352]5.2677,[353]5.2628,[354]5.2634,[355]5.2684,[356]5.2713,[357]5.2683,[358]5.2761,[359]5.2783,[360]5.2749,[361]5.2747,[362]5.2813,[363]5.2920,[364]5.2969,[365]5.3006,[366]5.3024,[367]5.3111,[368]5.3090,[369]5.3105,[370]5.3126,[371]5.3086,[372]5.3133,[373]5.3173,[374]5.3155,[375]5.3151,[376]5.3206,[377]5.3171,[378]5.3197,[379]5.3236,[380]5.3168,[381]5.3140,[382]5.3098,[383]5.3080,[384]5.3080,[385]5.3068,[386]5.3057,[387]5.3054,[388]5.3025,[389]5.2988,[390]5.2936,[391]5.2880,[392]5.2844,[393]5.2841,[394]5.2872,[395]5.2864,[396]5.2812,[397]5.2879,[398]5.2922,[399]5.2994,[400]5.2985,[401]5.2992,[402]5.3002,[403]5.3026,[404]5.3081,[405]5.2931,[406]5.2890,[407]5.2878,[408]5.2889,[409]5.3000,[410]5.3092,[411]5.3186,[412]5.3326,[413]5.3428,[414]5.3491,[415]5.3550,[416]5.3624,[417]5.3719,[418]5.3740,[419]5.3788,[420]5.3865,[421]5.3960,[422]5.3994,[423]5.4049,[424]5.4137,[425]5.4214,[426]5.4276,[427]5.4316,[428]5.4389,[429]5.4427,[430]5.4488,[431]5.4614,[432]5.4646,[433]5.4638,[434]5.4604,[435]5.4616,[436]5.4644,[437]5.4727,[438]5.4801,[439]5.4774,[440]5.4765,[441]5.4720,[442]5.4709,[443]5.4721,[444]5.4739,[445]5.4731,[446]5.4750,[447]5.4774,[448]5.4805,[449]5.4789,[450]5.4800,[451]5.4771,[452]5.4617,[453]5.4525,[454]5.4469,[455]5.4473,[456]5.4512,[457]5.4524,[458]5.4506,[459]5.4501,[460]5.4573,[461]5.4533,[462]5.4495,[463]5.4477,[464]5.4473,[465]5.4452,[466]5.4377,[467]5.4366,[468]5.4347,[469]5.4357,[470]5.4346,[471]5.4296,[472]5.4303,[473]5.4257,[474]5.4247,[475]5.4179,[476]5.4152,[477]5.4071,[478]5.4045,[479]5.4049,[480]5.4075,[481]5.4078,[482]5.4032,[483]5.3992,[484]5.4000,[485]5.3932,[486]5.3868,[487]5.3860,[488]5.3837,[489]5.3786,[490]5.3753,[491]5.3719,[492]5.3651,[493]5.3621,[494]5.3605,[495]5.3584,[496]5.3546,[497]5.3485,[498]5.3458,[499]5.3424,[500]5.3344,[501]5.3273,[502]5.3263,[503]5.3252,[504]5.3175,[505]5.3174,[506]5.3180,[507]5.3127,[508]5.3091,[509]5.3096,
[510]5.3118,[511]5.3160,[512]5.3199,[513]5.3222,[514]5.3277,[515]5.3237,[516]5.3227,[517]5.3225,[518]5.3226,[519]5.3247,[520]5.3260,[521]5.3270,[522]5.3283,[523]5.3290,[524]5.3345,[525]5.3372,[526]5.3377,[527]5.3393,[528]5.3339,[529]5.3348,[530]5.3311,[531]5.3306,[532]5.3354,[533]5.3382,[534]5.3363,[535]5.3384,[536]5.3341,[537]5.3323,[538]5.3373,[539]5.3381,[540]5.3398,[541]5.3396,[542]5.3409,[543]5.3431,[544]5.3444,[545]5.3433,[546]5.3435,[547]5.3403,[548]5.3362,[549]5.3363,[550]5.3343,[551]5.3318,[552]5.3299,[553]5.3270,[554]5.3247,[555]5.3228,[556]5.3221,[557]5.3237,[558]5.3205,[559]5.3207,[560]5.3194,[561]5.3195,[562]5.3168,[563]5.3166,[564]5.3209,[565]5.3219,[566]5.3225,[567]5.3206,[568]5.3216,[569]5.3201,[570]5.3228,[571]5.3241,[572]5.3251,[573]5.3254,[574]5.3226,[575]5.3207,[576]5.3201,[577]5.3185,[578]5.3166,[579]5.3164,[580]5.3112,[581]5.3082,[582]5.3083,[583]5.3092,[584]5.3098,[585]5.3040,[586]5.2987,[587]5.2987,[588]5.3031,[589]5.3080,[590]5.3109,[591]5.3125,[592]5.3114,[593]5.3075,[594]5.3089,[595]5.3073,[596]5.3114,[597]5.3095,[598]5.3062,[599]5.3088,[600]5.3079,[601]5.3068,[602]5.3067,[603]5.3094,[604]5.3099,[605]5.3124,[606]5.3137,[607]5.3123,[608]5.3095,[609]5.3104,[610]5.3144,[611]5.3130,[612]5.3151,[613]5.3122,[614]5.3083,[615]5.3025,[616]5.3051,[617]5.3002,[618]5.2960,[619]5.2916,[620]5.2808,[621]5.2758,[622]5.2741,[623]5.2754,[624]5.2758,[625]5.2766,[626]5.2763,[627]5.2790,[628]5.2798,[629]5.2802,[630]5.2832,[631]5.2876,[632]5.2923,[633]5.2911,[634]5.2940,[635]5.2936,[636]5.2901,[637]5.2864,[638]5.2885,[639]5.2854,[640]5.2859,[641]5.2863,[642]5.2913,[643]5.2930,[644]5.2947,[645]5.2933,[646]5.2967,[647]5.2916,[648]5.2927,[649]5.2929,[650]5.2959,[651]5.3000,[652]5.3004,[653]5.3042,[654]5.2988,[655]5.2981,

llama_print_timings: load time = 6350.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1667989.72 ms / 335360 tokens ( 4.97 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1699934.63 ms

Labels: Less than 4 bits (efforts related to viable quantized models using <4 bits), enhancement (new feature or request), generation quality (quality of model output)