Description

Summary

Use 16 x 8 "super-blocks" for quantization: one `fp16` scale per "super-block" and 16 quantized scales, one for each block of 8 model weights. This is particularly useful for 2- and 3-bit quantization, but it also outperforms the existing 4-bit quantization schemes `Q4_0` and `Q4_2`.
Details
The naming of existing llama.cpp quantizations follows the scheme `QX_Y`, where `X` is the number of bits used for the quants and `Y` is 0, 1, 2, or 3. When `Y` is even (0 or 2), model weights `x` are computed from the quants `q` as `x = d * q`. When `Y` is odd, `x = m + d * q` is used. If we look at the integer part of `Y/2` (`[Y/2]`), then the number of weights in a quantization block is 32 (`Q4_0`, `Q4_1`, `Q5_0`) when `[Y/2] = 0`, and 16 (`Q4_2`, `Q4_3`) when `[Y/2] = 1`. From the latest perplexity results one can see that quantization using blocks of 16 weights performs better than quantization using blocks of 32 weights. The logical conclusion is to look into using blocks of 8 weights. Following the existing naming convention, quantization of type `x = d * q` for blocks of 8 weights would be `QX_4`, and quantization of type `x = m + d * q` would be `QX_5`. The problem with going to blocks of 8 weights using the same strategy as in `Q4_2` and `Q4_3` is that the bits needed to store the scale `d` (or the scale `d` and offset `m`) become comparable to the number of bits used for the quants `q`. For instance, using `fp16` for the scale in a block of 8 weights requires 16 bits, while the quants need only 32 bits for 4-bit quantization, so effectively 6 bits per weight (bpw).
So, after this long introduction, here is an idea for how one can use quantization blocks of 8 weights while keeping bpw reasonable: use "super-blocks" that combine `N` quantization blocks. The scale in each block of 8 weights is stored as an `int8_t`, and there is a single `fp16` scale that converts the quantized scales to their final value. E.g., for 4-bit quantization:
```c
#define QK4_4 128
typedef struct {
    int8_t scales[QK4_4/8]; // quantized scales per 8 weights
    uint8_t qs[QK4_4/2];    // nibbles / quants of the "super-block"
    ggml_fp16_t d;          // "super-block" scale
} block_q4_4;
```
In the above, `N = 16`, i.e., there are 16 blocks of 8 weights, each having its own 8-bit quantized scale. This ends up using 5.125 bpw (4 + 1.125).
To further clarify the idea, here is a simple scalar implementation of the de-quantization for `Q4_4`:
```c
static void dequantize_row_q4_4(const void * restrict vx, float * restrict y, int k) {
    assert(k % QK4_4 == 0);
    const int nb = k / QK4_4;
    const block_q4_4 * restrict x = vx;
    uint32_t u;
    for (int i = 0; i < nb; i++) {
        const float d_all = GGML_FP16_TO_FP32(x[i].d);
        const uint8_t * q = x[i].qs;
        for (int n = 0; n < QK4_4/8; ++n) {
            memcpy(&u, q, 4);
            const uint32_t u1 = (u >> 0) & 0x0f0f0f0f;
            const uint32_t u2 = (u >> 4) & 0x0f0f0f0f;
            const int8_t * v1 = (const int8_t*)&u1;
            const int8_t * v2 = (const int8_t*)&u2;
            float d = d_all * x[i].scales[n];
            y[0] = d * (v1[0] - 8);
            y[1] = d * (v2[0] - 8);
            y[2] = d * (v1[1] - 8);
            y[3] = d * (v2[1] - 8);
            y[4] = d * (v1[2] - 8);
            y[5] = d * (v2[2] - 8);
            y[6] = d * (v1[3] - 8);
            y[7] = d * (v2[3] - 8);
            q += 4;
            y += 8;
        }
    }
}
```
Perplexity results
I have done some experiments with this idea for 2-, 3- and 4-bit quantization, and the following table summarizes the perplexity results. All calculations keep the output tensor as `fp16`, which adds about 200 MB to the size of the quantized model (compared to the `output.weight` tensor also being quantized):
| Model | Measure | Q2_4 | Q3_4 | Q4_4 |
|---|---|---|---|---|
| 7B | perplexity | 8.3618 | 6.3559 | 6.1378 |
| 7B | file size | 2.65G | 3.45G | 4.2G |
| 13B | perplexity | 6.7409 | 5.5110 | 5.2981 |
| 13B | file size | 4.95G | 6.45G | 8.0G |
A few observations from these experiments and the existing 4- and 5-bit results:

- At 4 and 5 bits, quantization of type `x = m + d * q` (`QX_1`, `QX_3`) performs better than `x = d * q` (`QX_0`, `QX_2`, and the `QX_4` proposed here). This trend is reversed for 2- and 3-bit quantization; especially at 2 bits, `Q2_1` and `Q2_3` give basically useless results.
- There has been some work done for 2- and 3-bit quantization on this branch. The `Q2_4` quantization proposed here gives much lower perplexity than what is reported there for `Q2_2` (my own experiment with `Q2_2` gives a 7B perplexity of 10.6271 and a 13B perplexity of 8.3552; the 30B `Q2_2` perplexity of 6.9507 reported there is higher than the 13B `Q2_4` perplexity found here).
- At 2-bit quantization, the difference between a quantized and a non-quantized output tensor is significant (e.g., a quantized output tensor results in a 7B perplexity of 9.0087 vs 8.3618 from the table above). At 3-bit quantization the difference is much smaller (e.g., 6.4433 vs 6.3559 for 7B).
- `Q4_4` is better than `Q4_0` and `Q4_2`, but the difference is much smaller than at 2- and 3-bit quantization.
- I have tried `N = 8, 16, 32` (i.e., "super-blocks" of 64, 128, 256 weights). Perplexity results remain effectively the same, while the extra bits per weight (extra as in addition to the `X` quantization bits) change from 1.25 to 1.125 to 1.0625. Tensor sizes are divisible by 256 for all layers in the 7B and 13B models, so one could use a super-block size of 256 instead of the 128 used here (this saves ~0.1G for the 13B model).
Here are the perplexity runs reported above:
Q2_4, 7B
main: seed = 1682671488
llama.cpp: loading model from ../models/7B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 4504.40 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.49 seconds per pass - ETA 16 minutes
[1]6.2670,[2]7.2397,[3]7.8484,[4]8.7113,[5]8.5541,[6]8.5304,[7]8.6772,[8]8.7266,[9]9.2108,[10]9.5711,[11]9.9331,[12]10.0286,[13]10.0075,[14]10.2435,[15]10.5844,[16]10.0359,[17]9.8055,[18]9.8147,[19]9.2709,[20]9.2379,[21]9.0826,[22]8.9060,[23]8.8751,[24]8.7826,[25]8.7974,[26]8.5861,[27]8.3354,[28]8.2662,[29]8.1377,[30]7.9415,[31]7.9138,[32]7.9309,[33]7.8544,[34]7.9014,[35]7.9400,[36]8.0232,[37]8.0318,[38]8.0524,[39]8.1139,[40]8.1847,[41]8.2215,[42]8.2709,[43]8.1977,[44]8.2695,[45]8.2579,[46]8.2205,[47]8.2469,[48]8.1920,[49]8.1907,[50]8.1222,[51]8.1247,[52]8.1002,[53]8.1535,[54]8.1312,[55]8.0808,[56]8.1320,[57]8.1640,[58]8.1951,[59]8.2061,[60]8.2674,[61]8.2525,[62]8.3396,[63]8.3781,[64]8.3872,[65]8.4551,[66]8.4626,[67]8.4924,[68]8.5125,[69]8.5512,[70]8.5979,[71]8.6292,[72]8.6747,[73]8.7532,[74]8.7463,[75]8.7566,[76]8.7672,[77]8.7906,[78]8.7662,[79]8.7960,[80]8.7820,[81]8.8039,[82]8.8159,[83]8.7322,[84]8.7181,[85]8.7079,[86]8.6686,[87]8.6050,[88]8.5635,[89]8.5359,[90]8.5171,[91]8.5563,[92]8.5568,[93]8.5615,[94]8.5647,[95]8.6043,[96]8.6056,[97]8.6059,[98]8.5980,[99]8.5716,[100]8.5662,[101]8.5971,[102]8.5883,[103]8.6169,[104]8.6284,[105]8.6296,[106]8.6542,[107]8.6555,[108]8.6706,[109]8.6619,[110]8.6569,[111]8.6789,[112]8.7064,[113]8.7181,[114]8.7197,[115]8.7335,[116]8.7271,[117]8.7349,[118]8.7694,[119]8.7968,[120]8.8443,[121]8.8727,[122]8.9000,[123]8.9478,[124]8.9703,[125]8.9524,[126]9.0055,[127]9.0459,[128]9.0799,[129]9.0528,[130]9.0613,[131]9.0543,[132]9.0445,[133]9.0327,[134]9.0539,[135]9.0482,[136]9.0383,[137]9.0256,[138]9.0123,[139]9.0012,[140]9.0007,[141]8.9808,[142]8.9746,[143]8.9590,[144]8.9421,[145]8.9386,[146]8.9224,[147]8.9316,[148]8.9294,[149]8.9273,[150]8.9260,[151]8.9272,[152]8.9072,[153]8.8795,[154]8.8652,[155]8.8713,[156]8.8626,[157]8.8816,[158]8.8812,[159]8.8950,[160]8.8969,[161]8.9131,[162]8.8704,[163]8.8516,[164]8.8103,[165]8.7617,[166]8.7196,[167]8.6616,[168]8.6151,[169]8.5920,[170]8.5716,[171]8.5289,[172]8.4991,[173]8.4753,[174]8.4339,[175]8.4022,[17
6]8.3800,[177]8.3518,[178]8.3213,[179]8.2960,[180]8.2789,[181]8.2471,[182]8.2158,[183]8.1923,[184]8.1901,[185]8.1759,[186]8.1755,[187]8.1800,[188]8.1783,[189]8.2022,[190]8.2040,[191]8.2313,[192]8.2490,[193]8.2748,[194]8.2906,[195]8.3192,[196]8.3382,[197]8.3638,[198]8.3830,[199]8.3825,[200]8.3868,[201]8.3827,[202]8.4192,[203]8.4279,[204]8.4391,[205]8.4521,[206]8.4600,[207]8.4536,[208]8.4630,[209]8.4689,[210]8.4726,[211]8.4873,[212]8.4963,[213]8.5092,[214]8.5160,[215]8.5206,[216]8.5372,[217]8.5580,[218]8.5740,[219]8.5719,[220]8.5640,[221]8.5553,[222]8.5496,[223]8.5326,[224]8.5205,[225]8.5155,[226]8.5383,[227]8.5536,[228]8.5615,[229]8.5653,[230]8.5606,[231]8.5816,[232]8.5697,[233]8.5431,[234]8.5208,[235]8.5105,[236]8.4995,[237]8.4836,[238]8.4880,[239]8.4663,[240]8.4513,[241]8.4579,[242]8.4639,[243]8.4602,[244]8.4461,[245]8.4448,[246]8.4290,[247]8.4139,[248]8.4028,[249]8.3996,[250]8.4048,[251]8.3959,[252]8.3912,[253]8.3790,[254]8.3743,[255]8.3575,[256]8.3323,[257]8.3143,[258]8.3039,[259]8.3025,[260]8.2925,[261]8.2885,[262]8.2801,[263]8.2747,[264]8.2566,[265]8.2547,[266]8.2506,[267]8.2400,[268]8.2504,[269]8.2489,[270]8.2463,[271]8.2539,[272]8.2614,[273]8.2570,[274]8.2603,[275]8.2742,[276]8.2809,[277]8.3035,[278]8.3175,[279]8.3288,[280]8.3324,[281]8.3439,[282]8.3494,[283]8.3672,[284]8.3763,[285]8.3869,[286]8.4031,[287]8.4049,[288]8.4160,[289]8.4034,[290]8.3824,[291]8.3640,[292]8.3433,[293]8.3263,[294]8.3285,[295]8.3263,[296]8.3322,[297]8.3320,[298]8.3405,[299]8.3354,[300]8.3234,[301]8.3182,[302]8.3099,[303]8.2992,[304]8.2859,[305]8.2847,[306]8.2684,[307]8.2696,[308]8.2736,[309]8.2506,[310]8.2432,[311]8.2368,[312]8.2386,[313]8.2294,[314]8.2276,[315]8.2045,[316]8.2053,[317]8.1839,[318]8.1579,[319]8.1772,[320]8.1937,[321]8.2004,[322]8.1927,[323]8.1903,[324]8.1923,[325]8.2089,[326]8.2077,[327]8.2130,[328]8.2180,[329]8.2283,[330]8.2368,[331]8.2554,[332]8.2503,[333]8.2620,[334]8.2542,[335]8.2433,[336]8.2464,[337]8.2399,[338]8.2423,[339]8.2345,[340]8.2289,[341]8.2386,[342]8.2404
,[343]8.2479,[344]8.2476,[345]8.2446,[346]8.2377,[347]8.2412,[348]8.2458,[349]8.2465,[350]8.2403,[351]8.2399,[352]8.2413,[353]8.2316,[354]8.2341,[355]8.2429,[356]8.2474,[357]8.2407,[358]8.2533,[359]8.2570,[360]8.2481,[361]8.2455,[362]8.2539,[363]8.2650,[364]8.2735,[365]8.2813,[366]8.2828,[367]8.2931,[368]8.2887,[369]8.2886,[370]8.2892,[371]8.2801,[372]8.2848,[373]8.2920,[374]8.2879,[375]8.2865,[376]8.2957,[377]8.2873,[378]8.2901,[379]8.2977,[380]8.2848,[381]8.2801,[382]8.2740,[383]8.2709,[384]8.2678,[385]8.2664,[386]8.2672,[387]8.2645,[388]8.2579,[389]8.2485,[390]8.2391,[391]8.2277,[392]8.2256,[393]8.2288,[394]8.2330,[395]8.2318,[396]8.2207,[397]8.2295,[398]8.2333,[399]8.2437,[400]8.2453,[401]8.2475,[402]8.2493,[403]8.2497,[404]8.2573,[405]8.2502,[406]8.2458,[407]8.2465,[408]8.2466,[409]8.2624,[410]8.2776,[411]8.2933,[412]8.3162,[413]8.3303,[414]8.3414,[415]8.3478,[416]8.3594,[417]8.3760,[418]8.3834,[419]8.3934,[420]8.4054,[421]8.4215,[422]8.4264,[423]8.4389,[424]8.4543,[425]8.4667,[426]8.4751,[427]8.4789,[428]8.4899,[429]8.4956,[430]8.5069,[431]8.5263,[432]8.5289,[433]8.5255,[434]8.5161,[435]8.5146,[436]8.5163,[437]8.5286,[438]8.5396,[439]8.5341,[440]8.5309,[441]8.5231,[442]8.5200,[443]8.5216,[444]8.5225,[445]8.5190,[446]8.5207,[447]8.5240,[448]8.5283,[449]8.5239,[450]8.5232,[451]8.5163,[452]8.5112,[453]8.5021,[454]8.4972,[455]8.4964,[456]8.5014,[457]8.5044,[458]8.5017,[459]8.5020,[460]8.5125,[461]8.5089,[462]8.5070,[463]8.5138,[464]8.5136,[465]8.5099,[466]8.5021,[467]8.5045,[468]8.5073,[469]8.5106,[470]8.5116,[471]8.5045,[472]8.5100,[473]8.5005,[474]8.5033,[475]8.5011,[476]8.5043,[477]8.4954,[478]8.4967,[479]8.5104,[480]8.5171,[481]8.5200,[482]8.5139,[483]8.5089,[484]8.5131,[485]8.5129,[486]8.5049,[487]8.5066,[488]8.5061,[489]8.4980,[490]8.4952,[491]8.4915,[492]8.4829,[493]8.4796,[494]8.4758,[495]8.4773,[496]8.4722,[497]8.4668,[498]8.4663,[499]8.4568,[500]8.4459,[501]8.4388,[502]8.4402,[503]8.4385,[504]8.4277,[505]8.4303,[506]8.4317,[507]8.4305,[508]8.4251,[509]8.
4240,[510]8.4299,[511]8.4356,[512]8.4377,[513]8.4391,[514]8.4473,[515]8.4395,[516]8.4386,[517]8.4401,[518]8.4385,[519]8.4423,[520]8.4454,[521]8.4476,[522]8.4521,[523]8.4524,[524]8.4590,[525]8.4639,[526]8.4657,[527]8.4686,[528]8.4647,[529]8.4674,[530]8.4580,[531]8.4541,[532]8.4608,[533]8.4632,[534]8.4588,[535]8.4638,[536]8.4558,[537]8.4510,[538]8.4575,[539]8.4578,[540]8.4657,[541]8.4691,[542]8.4701,[543]8.4711,[544]8.4726,[545]8.4708,[546]8.4720,[547]8.4648,[548]8.4550,[549]8.4551,[550]8.4508,[551]8.4454,[552]8.4420,[553]8.4362,[554]8.4316,[555]8.4256,[556]8.4259,[557]8.4311,[558]8.4275,[559]8.4277,[560]8.4264,[561]8.4258,[562]8.4236,[563]8.4254,[564]8.4323,[565]8.4359,[566]8.4354,[567]8.4333,[568]8.4318,[569]8.4280,[570]8.4301,[571]8.4303,[572]8.4308,[573]8.4289,[574]8.4256,[575]8.4269,[576]8.4264,[577]8.4243,[578]8.4222,[579]8.4236,[580]8.4137,[581]8.4081,[582]8.4046,[583]8.4039,[584]8.4026,[585]8.3949,[586]8.3877,[587]8.3879,[588]8.3944,[589]8.4027,[590]8.4066,[591]8.4067,[592]8.4038,[593]8.3969,[594]8.3972,[595]8.3932,[596]8.3984,[597]8.3938,[598]8.3913,[599]8.3928,[600]8.3922,[601]8.3893,[602]8.3954,[603]8.3990,[604]8.4015,[605]8.4037,[606]8.4054,[607]8.4049,[608]8.3980,[609]8.3974,[610]8.4014,[611]8.3989,[612]8.4027,[613]8.3976,[614]8.3919,[615]8.3800,[616]8.3859,[617]8.3765,[618]8.3683,[619]8.3586,[620]8.3366,[621]8.3254,[622]8.3233,[623]8.3250,[624]8.3239,[625]8.3229,[626]8.3212,[627]8.3259,[628]8.3252,[629]8.3237,[630]8.3274,[631]8.3340,[632]8.3404,[633]8.3379,[634]8.3423,[635]8.3431,[636]8.3403,[637]8.3381,[638]8.3427,[639]8.3394,[640]8.3394,[641]8.3390,[642]8.3474,[643]8.3494,[644]8.3498,[645]8.3465,[646]8.3537,[647]8.3518,[648]8.3533,[649]8.3524,[650]8.3579,[651]8.3656,[652]8.3674,[653]8.3719,[654]8.3636,[655]8.3618,
llama_print_timings: load time = 2570.97 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 906622.46 ms / 335360 tokens ( 2.70 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 933921.98 ms
Q3_4, 7B
main: seed = 1682612164
llama.cpp: loading model from junk.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 5390.48 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
13.33 seconds per pass - ETA 2 hours 25 minutes
[1]4.5663,[2]4.9408,[3]5.8361,[4]6.5915,[5]6.6755,[6]6.6236,[7]6.8095,[8]6.9148,[9]7.2707,[10]7.5192,[11]7.7399,[12]7.7851,[13]7.7344,[14]7.8086,[15]8.0569,[16]7.6602,[17]7.5283,[18]7.4705,[19]7.0917,[20]7.0792,[21]6.9849,[22]6.8041,[23]6.7713,[24]6.6866,[25]6.6918,[26]6.5324,[27]6.3450,[28]6.2465,[29]6.1521,[30]5.9987,[31]5.9758,[32]5.9998,[33]5.9411,[34]5.9724,[35]5.9995,[36]6.0390,[37]6.0437,[38]6.0577,[39]6.0941,[40]6.1552,[41]6.1714,[42]6.2174,[43]6.1721,[44]6.2245,[45]6.2294,[46]6.2025,[47]6.2265,[48]6.1973,[49]6.2013,[50]6.1552,[51]6.1511,[52]6.1385,[53]6.1806,[54]6.1639,[55]6.1354,[56]6.1712,[57]6.1937,[58]6.2181,[59]6.2339,[60]6.2785,[61]6.2698,[62]6.3279,[63]6.3618,[64]6.3756,[65]6.4226,[66]6.4309,[67]6.4517,[68]6.4697,[69]6.4930,[70]6.5255,[71]6.5479,[72]6.5798,[73]6.6429,[74]6.6461,[75]6.6590,[76]6.6751,[77]6.6892,[78]6.6736,[79]6.7003,[80]6.6948,[81]6.7049,[82]6.7107,[83]6.6550,[84]6.6395,[85]6.6251,[86]6.6019,[87]6.5435,[88]6.5150,[89]6.4959,[90]6.4806,[91]6.5057,[92]6.4994,[93]6.4989,[94]6.4935,[95]6.5215,[96]6.5199,[97]6.5145,[98]6.5099,[99]6.4929,[100]6.4931,[101]6.5206,[102]6.5163,[103]6.5351,[104]6.5432,[105]6.5422,[106]6.5577,[107]6.5536,[108]6.5665,[109]6.5615,[110]6.5573,[111]6.5786,[112]6.6020,[113]6.6021,[114]6.5969,[115]6.6038,[116]6.5944,[117]6.6001,[118]6.6283,[119]6.6498,[120]6.6840,[121]6.6992,[122]6.7240,[123]6.7622,[124]6.7804,[125]6.7694,[126]6.8078,[127]6.8453,[128]6.8757,[129]6.8576,[130]6.8657,[131]6.8604,[132]6.8510,[133]6.8362,[134]6.8459,[135]6.8407,[136]6.8283,[137]6.8197,[138]6.8047,[139]6.7936,[140]6.7900,[141]6.7636,[142]6.7598,[143]6.7322,[144]6.7116,[145]6.7051,[146]6.6920,[147]6.6975,[148]6.6982,[149]6.6923,[150]6.6885,[151]6.6909,[152]6.6807,[153]6.6650,[154]6.6550,[155]6.6620,[156]6.6565,[157]6.6743,[158]6.6784,[159]6.6818,[160]6.6830,[161]6.6951,[162]6.6634,[163]6.6521,[164]6.6259,[165]6.5926,[166]6.5632,[167]6.5231,[168]6.4907,[169]6.4767,[170]6.4657,[171]6.4369,[172]6.4188,[173]6.4011,[174]6.3693,[175]6.3458,[176]6.3
342,[177]6.3129,[178]6.2889,[179]6.2715,[180]6.2615,[181]6.2389,[182]6.2196,[183]6.2037,[184]6.2019,[185]6.1936,[186]6.1946,[187]6.2012,[188]6.1977,[189]6.2168,[190]6.2182,[191]6.2409,[192]6.2573,[193]6.2741,[194]6.2860,[195]6.3075,[196]6.3233,[197]6.3455,[198]6.3612,[199]6.3641,[200]6.3691,[201]6.3647,[202]6.3864,[203]6.3947,[204]6.3980,[205]6.4091,[206]6.4162,[207]6.4121,[208]6.4209,[209]6.4261,[210]6.4310,[211]6.4415,[212]6.4490,[213]6.4591,[214]6.4633,[215]6.4680,[216]6.4832,[217]6.5014,[218]6.5150,[219]6.5162,[220]6.5118,[221]6.5051,[222]6.5023,[223]6.4909,[224]6.4835,[225]6.4790,[226]6.5006,[227]6.5076,[228]6.5134,[229]6.5182,[230]6.5144,[231]6.5315,[232]6.5184,[233]6.5008,[234]6.4849,[235]6.4693,[236]6.4610,[237]6.4500,[238]6.4531,[239]6.4369,[240]6.4258,[241]6.4290,[242]6.4326,[243]6.4305,[244]6.4181,[245]6.4153,[246]6.4041,[247]6.3923,[248]6.3846,[249]6.3822,[250]6.3861,[251]6.3801,[252]6.3766,[253]6.3666,[254]6.3631,[255]6.3511,[256]6.3323,[257]6.3202,[258]6.3114,[259]6.3098,[260]6.3011,[261]6.2972,[262]6.2914,[263]6.2865,[264]6.2683,[265]6.2681,[266]6.2659,[267]6.2593,[268]6.2693,[269]6.2675,[270]6.2675,[271]6.2760,[272]6.2799,[273]6.2786,[274]6.2805,[275]6.2891,[276]6.2950,[277]6.3109,[278]6.3215,[279]6.3307,[280]6.3332,[281]6.3427,[282]6.3475,[283]6.3624,[284]6.3708,[285]6.3798,[286]6.3937,[287]6.3928,[288]6.3992,[289]6.3897,[290]6.3734,[291]6.3572,[292]6.3419,[293]6.3274,[294]6.3303,[295]6.3298,[296]6.3349,[297]6.3338,[298]6.3372,[299]6.3346,[300]6.3234,[301]6.3228,[302]6.3155,[303]6.3066,[304]6.2975,[305]6.2948,[306]6.2815,[307]6.2830,[308]6.2856,[309]6.2690,[310]6.2629,[311]6.2567,[312]6.2591,[313]6.2540,[314]6.2524,[315]6.2360,[316]6.2325,[317]6.2155,[318]6.1942,[319]6.2070,[320]6.2197,[321]6.2236,[322]6.2183,[323]6.2123,[324]6.2093,[325]6.2197,[326]6.2195,[327]6.2218,[328]6.2254,[329]6.2315,[330]6.2353,[331]6.2479,[332]6.2454,[333]6.2528,[334]6.2471,[335]6.2406,[336]6.2445,[337]6.2410,[338]6.2406,[339]6.2349,[340]6.2300,[341]6.2388,[342]6.2411,[343
]6.2469,[344]6.2470,[345]6.2466,[346]6.2438,[347]6.2479,[348]6.2522,[349]6.2543,[350]6.2511,[351]6.2515,[352]6.2519,[353]6.2461,[354]6.2467,[355]6.2520,[356]6.2548,[357]6.2507,[358]6.2604,[359]6.2628,[360]6.2577,[361]6.2571,[362]6.2634,[363]6.2747,[364]6.2805,[365]6.2861,[366]6.2869,[367]6.2959,[368]6.2936,[369]6.2947,[370]6.2959,[371]6.2899,[372]6.2945,[373]6.3000,[374]6.2986,[375]6.2982,[376]6.3058,[377]6.3004,[378]6.3028,[379]6.3093,[380]6.3012,[381]6.2973,[382]6.2922,[383]6.2908,[384]6.2898,[385]6.2888,[386]6.2885,[387]6.2885,[388]6.2839,[389]6.2783,[390]6.2714,[391]6.2634,[392]6.2598,[393]6.2582,[394]6.2612,[395]6.2601,[396]6.2523,[397]6.2594,[398]6.2624,[399]6.2696,[400]6.2688,[401]6.2714,[402]6.2727,[403]6.2746,[404]6.2816,[405]6.2732,[406]6.2692,[407]6.2692,[408]6.2711,[409]6.2829,[410]6.2948,[411]6.3068,[412]6.3233,[413]6.3350,[414]6.3431,[415]6.3488,[416]6.3577,[417]6.3705,[418]6.3745,[419]6.3817,[420]6.3908,[421]6.4034,[422]6.4078,[423]6.4150,[424]6.4265,[425]6.4360,[426]6.4423,[427]6.4469,[428]6.4554,[429]6.4599,[430]6.4688,[431]6.4829,[432]6.4860,[433]6.4848,[434]6.4798,[435]6.4800,[436]6.4816,[437]6.4912,[438]6.4991,[439]6.4958,[440]6.4950,[441]6.4895,[442]6.4880,[443]6.4888,[444]6.4892,[445]6.4875,[446]6.4892,[447]6.4916,[448]6.4960,[449]6.4937,[450]6.4942,[451]6.4897,[452]6.4792,[453]6.4706,[454]6.4647,[455]6.4659,[456]6.4703,[457]6.4721,[458]6.4701,[459]6.4703,[460]6.4789,[461]6.4761,[462]6.4740,[463]6.4784,[464]6.4771,[465]6.4748,[466]6.4669,[467]6.4671,[468]6.4666,[469]6.4686,[470]6.4690,[471]6.4642,[472]6.4688,[473]6.4631,[474]6.4643,[475]6.4586,[476]6.4605,[477]6.4535,[478]6.4529,[479]6.4602,[480]6.4651,[481]6.4669,[482]6.4626,[483]6.4583,[484]6.4605,[485]6.4590,[486]6.4533,[487]6.4536,[488]6.4514,[489]6.4462,[490]6.4439,[491]6.4405,[492]6.4345,[493]6.4318,[494]6.4300,[495]6.4294,[496]6.4262,[497]6.4207,[498]6.4187,[499]6.4142,[500]6.4045,[501]6.3977,[502]6.3982,[503]6.3975,[504]6.3887,[505]6.3914,[506]6.3922,[507]6.3870,[508]6.3831,[509]6.3822,
[510]6.3861,[511]6.3908,[512]6.3944,[513]6.3960,[514]6.4024,[515]6.3970,[516]6.3961,[517]6.3972,[518]6.3967,[519]6.3997,[520]6.4024,[521]6.4039,[522]6.4071,[523]6.4077,[524]6.4132,[525]6.4167,[526]6.4175,[527]6.4192,[528]6.4137,[529]6.4147,[530]6.4096,[531]6.4080,[532]6.4132,[533]6.4158,[534]6.4138,[535]6.4163,[536]6.4104,[537]6.4078,[538]6.4132,[539]6.4142,[540]6.4182,[541]6.4187,[542]6.4199,[543]6.4211,[544]6.4221,[545]6.4199,[546]6.4205,[547]6.4159,[548]6.4104,[549]6.4102,[550]6.4072,[551]6.4035,[552]6.4016,[553]6.3975,[554]6.3949,[555]6.3915,[556]6.3912,[557]6.3940,[558]6.3902,[559]6.3897,[560]6.3893,[561]6.3894,[562]6.3868,[563]6.3866,[564]6.3912,[565]6.3934,[566]6.3934,[567]6.3908,[568]6.3910,[569]6.3895,[570]6.3927,[571]6.3928,[572]6.3939,[573]6.3937,[574]6.3899,[575]6.3895,[576]6.3898,[577]6.3881,[578]6.3859,[579]6.3864,[580]6.3795,[581]6.3757,[582]6.3744,[583]6.3751,[584]6.3752,[585]6.3680,[586]6.3610,[587]6.3616,[588]6.3663,[589]6.3722,[590]6.3754,[591]6.3775,[592]6.3755,[593]6.3719,[594]6.3725,[595]6.3698,[596]6.3737,[597]6.3711,[598]6.3683,[599]6.3702,[600]6.3694,[601]6.3680,[602]6.3704,[603]6.3732,[604]6.3745,[605]6.3776,[606]6.3799,[607]6.3788,[608]6.3751,[609]6.3755,[610]6.3790,[611]6.3770,[612]6.3797,[613]6.3758,[614]6.3703,[615]6.3627,[616]6.3654,[617]6.3589,[618]6.3537,[619]6.3478,[620]6.3331,[621]6.3260,[622]6.3239,[623]6.3254,[624]6.3260,[625]6.3260,[626]6.3248,[627]6.3273,[628]6.3271,[629]6.3268,[630]6.3300,[631]6.3358,[632]6.3411,[633]6.3395,[634]6.3428,[635]6.3431,[636]6.3407,[637]6.3375,[638]6.3405,[639]6.3373,[640]6.3381,[641]6.3380,[642]6.3446,[643]6.3467,[644]6.3480,[645]6.3463,[646]6.3508,[647]6.3474,[648]6.3484,[649]6.3487,[650]6.3527,[651]6.3582,[652]6.3594,[653]6.3633,[654]6.3566,[655]6.3559,
llama_print_timings: load time = 13794.33 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 4708961.29 ms / 335360 tokens ( 14.04 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 4740083.96 ms
Q4_4, 7B
main: seed = 1682662628
llama.cpp: loading model from ../models/7B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6079.65 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.66 seconds per pass - ETA 18 minutes
[1]4.4671,[2]4.9332,[3]5.7966,[4]6.3903,[5]6.4889,[6]6.4445,[7]6.6354,[8]6.7448,[9]7.1008,[10]7.3327,[11]7.5469,[12]7.5702,[13]7.4939,[14]7.5431,[15]7.8020,[16]7.4115,[17]7.2997,[18]7.2633,[19]6.8989,[20]6.8895,[21]6.7942,[22]6.6200,[23]6.5946,[24]6.4943,[25]6.4910,[26]6.3306,[27]6.1518,[28]6.0550,[29]5.9628,[30]5.7997,[31]5.7669,[32]5.7880,[33]5.7247,[34]5.7606,[35]5.7797,[36]5.8213,[37]5.8266,[38]5.8396,[39]5.8747,[40]5.9274,[41]5.9369,[42]5.9771,[43]5.9365,[44]5.9939,[45]5.9977,[46]5.9743,[47]5.9951,[48]5.9683,[49]5.9721,[50]5.9325,[51]5.9288,[52]5.9188,[53]5.9631,[54]5.9476,[55]5.9211,[56]5.9503,[57]5.9739,[58]5.9943,[59]6.0098,[60]6.0524,[61]6.0460,[62]6.1021,[63]6.1367,[64]6.1507,[65]6.1955,[66]6.2021,[67]6.2187,[68]6.2364,[69]6.2616,[70]6.2919,[71]6.3106,[72]6.3415,[73]6.4011,[74]6.4065,[75]6.4199,[76]6.4314,[77]6.4421,[78]6.4259,[79]6.4530,[80]6.4448,[81]6.4560,[82]6.4602,[83]6.4082,[84]6.3919,[85]6.3798,[86]6.3574,[87]6.2929,[88]6.2639,[89]6.2441,[90]6.2288,[91]6.2508,[92]6.2444,[93]6.2448,[94]6.2434,[95]6.2717,[96]6.2711,[97]6.2662,[98]6.2604,[99]6.2463,[100]6.2476,[101]6.2717,[102]6.2658,[103]6.2852,[104]6.2931,[105]6.2930,[106]6.3089,[107]6.3065,[108]6.3193,[109]6.3124,[110]6.3074,[111]6.3307,[112]6.3508,[113]6.3523,[114]6.3484,[115]6.3549,[116]6.3466,[117]6.3512,[118]6.3800,[119]6.3996,[120]6.4352,[121]6.4512,[122]6.4768,[123]6.5140,[124]6.5320,[125]6.5220,[126]6.5619,[127]6.5989,[128]6.6290,[129]6.6127,[130]6.6231,[131]6.6199,[132]6.6103,[133]6.5962,[134]6.6053,[135]6.6013,[136]6.5899,[137]6.5826,[138]6.5660,[139]6.5555,[140]6.5498,[141]6.5198,[142]6.5166,[143]6.4876,[144]6.4677,[145]6.4586,[146]6.4460,[147]6.4515,[148]6.4512,[149]6.4450,[150]6.4413,[151]6.4425,[152]6.4321,[153]6.4147,[154]6.4056,[155]6.4131,[156]6.4087,[157]6.4255,[158]6.4298,[159]6.4351,[160]6.4369,[161]6.4483,[162]6.4189,[163]6.4069,[164]6.3823,[165]6.3515,[166]6.3244,[167]6.2874,[168]6.2551,[169]6.2412,[170]6.2300,[171]6.2027,[172]6.1858,[173]6.1688,[174]6.1387,[175]6.1174,[176]6.1
069,[177]6.0861,[178]6.0630,[179]6.0461,[180]6.0368,[181]6.0151,[182]5.9970,[183]5.9832,[184]5.9832,[185]5.9755,[186]5.9762,[187]5.9828,[188]5.9785,[189]5.9951,[190]5.9961,[191]6.0184,[192]6.0350,[193]6.0522,[194]6.0635,[195]6.0853,[196]6.1015,[197]6.1229,[198]6.1381,[199]6.1415,[200]6.1468,[201]6.1411,[202]6.1608,[203]6.1688,[204]6.1675,[205]6.1781,[206]6.1847,[207]6.1806,[208]6.1893,[209]6.1939,[210]6.1997,[211]6.2106,[212]6.2183,[213]6.2291,[214]6.2316,[215]6.2350,[216]6.2496,[217]6.2681,[218]6.2813,[219]6.2813,[220]6.2777,[221]6.2732,[222]6.2704,[223]6.2607,[224]6.2532,[225]6.2496,[226]6.2706,[227]6.2796,[228]6.2846,[229]6.2911,[230]6.2883,[231]6.3053,[232]6.2927,[233]6.2763,[234]6.2615,[235]6.2437,[236]6.2364,[237]6.2263,[238]6.2291,[239]6.2136,[240]6.2034,[241]6.2061,[242]6.2098,[243]6.2076,[244]6.1963,[245]6.1933,[246]6.1819,[247]6.1698,[248]6.1622,[249]6.1601,[250]6.1644,[251]6.1573,[252]6.1532,[253]6.1435,[254]6.1388,[255]6.1269,[256]6.1089,[257]6.0971,[258]6.0891,[259]6.0872,[260]6.0795,[261]6.0754,[262]6.0699,[263]6.0643,[264]6.0427,[265]6.0420,[266]6.0407,[267]6.0339,[268]6.0434,[269]6.0410,[270]6.0421,[271]6.0497,[272]6.0532,[273]6.0534,[274]6.0557,[275]6.0640,[276]6.0700,[277]6.0856,[278]6.0958,[279]6.1049,[280]6.1076,[281]6.1168,[282]6.1228,[283]6.1381,[284]6.1461,[285]6.1548,[286]6.1684,[287]6.1687,[288]6.1747,[289]6.1662,[290]6.1505,[291]6.1353,[292]6.1199,[293]6.1070,[294]6.1090,[295]6.1082,[296]6.1124,[297]6.1110,[298]6.1144,[299]6.1116,[300]6.1005,[301]6.1005,[302]6.0925,[303]6.0832,[304]6.0749,[305]6.0724,[306]6.0599,[307]6.0621,[308]6.0658,[309]6.0495,[310]6.0435,[311]6.0375,[312]6.0401,[313]6.0346,[314]6.0328,[315]6.0165,[316]6.0113,[317]5.9953,[318]5.9745,[319]5.9864,[320]5.9991,[321]6.0034,[322]5.9992,[323]5.9924,[324]5.9893,[325]5.9993,[326]5.9989,[327]6.0011,[328]6.0052,[329]6.0116,[330]6.0142,[331]6.0267,[332]6.0234,[333]6.0306,[334]6.0251,[335]6.0182,[336]6.0215,[337]6.0188,[338]6.0183,[339]6.0132,[340]6.0088,[341]6.0169,[342]6.0194,[343
]6.0240,[344]6.0238,[345]6.0243,[346]6.0216,[347]6.0255,[348]6.0282,[349]6.0300,[350]6.0266,[351]6.0272,[352]6.0275,[353]6.0218,[354]6.0218,[355]6.0268,[356]6.0295,[357]6.0262,[358]6.0353,[359]6.0384,[360]6.0350,[361]6.0349,[362]6.0416,[363]6.0531,[364]6.0599,[365]6.0656,[366]6.0665,[367]6.0749,[368]6.0722,[369]6.0729,[370]6.0744,[371]6.0687,[372]6.0733,[373]6.0784,[374]6.0768,[375]6.0770,[376]6.0837,[377]6.0790,[378]6.0816,[379]6.0876,[380]6.0798,[381]6.0762,[382]6.0711,[383]6.0704,[384]6.0698,[385]6.0690,[386]6.0684,[387]6.0682,[388]6.0644,[389]6.0591,[390]6.0524,[391]6.0446,[392]6.0405,[393]6.0391,[394]6.0420,[395]6.0406,[396]6.0330,[397]6.0401,[398]6.0440,[399]6.0523,[400]6.0522,[401]6.0539,[402]6.0546,[403]6.0566,[404]6.0632,[405]6.0534,[406]6.0498,[407]6.0490,[408]6.0505,[409]6.0626,[410]6.0736,[411]6.0848,[412]6.1006,[413]6.1118,[414]6.1195,[415]6.1248,[416]6.1322,[417]6.1444,[418]6.1483,[419]6.1556,[420]6.1642,[421]6.1757,[422]6.1804,[423]6.1873,[424]6.1986,[425]6.2072,[426]6.2136,[427]6.2181,[428]6.2262,[429]6.2315,[430]6.2398,[431]6.2542,[432]6.2585,[433]6.2580,[434]6.2538,[435]6.2546,[436]6.2568,[437]6.2665,[438]6.2738,[439]6.2713,[440]6.2704,[441]6.2652,[442]6.2637,[443]6.2651,[444]6.2653,[445]6.2633,[446]6.2660,[447]6.2689,[448]6.2734,[449]6.2704,[450]6.2717,[451]6.2676,[452]6.2548,[453]6.2464,[454]6.2409,[455]6.2419,[456]6.2463,[457]6.2483,[458]6.2461,[459]6.2467,[460]6.2551,[461]6.2525,[462]6.2511,[463]6.2558,[464]6.2547,[465]6.2519,[466]6.2441,[467]6.2441,[468]6.2439,[469]6.2458,[470]6.2462,[471]6.2412,[472]6.2457,[473]6.2402,[474]6.2412,[475]6.2353,[476]6.2371,[477]6.2300,[478]6.2289,[479]6.2348,[480]6.2394,[481]6.2412,[482]6.2367,[483]6.2323,[484]6.2343,[485]6.2332,[486]6.2278,[487]6.2278,[488]6.2258,[489]6.2209,[490]6.2185,[491]6.2154,[492]6.2094,[493]6.2065,[494]6.2051,[495]6.2051,[496]6.2017,[497]6.1960,[498]6.1943,[499]6.1898,[500]6.1803,[501]6.1738,[502]6.1741,[503]6.1733,[504]6.1644,[505]6.1673,[506]6.1681,[507]6.1625,[508]6.1586,[509]6.1579,
[510]6.1617,[511]6.1661,[512]6.1694,[513]6.1715,[514]6.1779,[515]6.1723,[516]6.1713,[517]6.1722,[518]6.1721,[519]6.1750,[520]6.1777,[521]6.1792,[522]6.1821,[523]6.1827,[524]6.1884,[525]6.1919,[526]6.1931,[527]6.1949,[528]6.1897,[529]6.1904,[530]6.1854,[531]6.1840,[532]6.1885,[533]6.1907,[534]6.1894,[535]6.1918,[536]6.1864,[537]6.1840,[538]6.1889,[539]6.1900,[540]6.1936,[541]6.1938,[542]6.1950,[543]6.1967,[544]6.1976,[545]6.1955,[546]6.1963,[547]6.1920,[548]6.1871,[549]6.1872,[550]6.1841,[551]6.1805,[552]6.1783,[553]6.1745,[554]6.1723,[555]6.1694,[556]6.1691,[557]6.1713,[558]6.1675,[559]6.1669,[560]6.1667,[561]6.1668,[562]6.1641,[563]6.1641,[564]6.1682,[565]6.1701,[566]6.1698,[567]6.1678,[568]6.1683,[569]6.1668,[570]6.1696,[571]6.1702,[572]6.1711,[573]6.1712,[574]6.1677,[575]6.1675,[576]6.1675,[577]6.1662,[578]6.1642,[579]6.1649,[580]6.1582,[581]6.1544,[582]6.1534,[583]6.1543,[584]6.1544,[585]6.1467,[586]6.1399,[587]6.1404,[588]6.1449,[589]6.1505,[590]6.1536,[591]6.1558,[592]6.1545,[593]6.1514,[594]6.1523,[595]6.1500,[596]6.1535,[597]6.1513,[598]6.1484,[599]6.1506,[600]6.1502,[601]6.1486,[602]6.1500,[603]6.1533,[604]6.1542,[605]6.1574,[606]6.1593,[607]6.1577,[608]6.1546,[609]6.1551,[610]6.1587,[611]6.1569,[612]6.1595,[613]6.1557,[614]6.1506,[615]6.1432,[616]6.1462,[617]6.1402,[618]6.1353,[619]6.1297,[620]6.1158,[621]6.1088,[622]6.1073,[623]6.1087,[624]6.1091,[625]6.1091,[626]6.1078,[627]6.1098,[628]6.1099,[629]6.1096,[630]6.1128,[631]6.1183,[632]6.1237,[633]6.1221,[634]6.1256,[635]6.1265,[636]6.1232,[637]6.1200,[638]6.1227,[639]6.1197,[640]6.1206,[641]6.1210,[642]6.1278,[643]6.1300,[644]6.1312,[645]6.1292,[646]6.1331,[647]6.1294,[648]6.1302,[649]6.1303,[650]6.1343,[651]6.1398,[652]6.1408,[653]6.1448,[654]6.1384,[655]6.1378,
llama_print_timings: load time = 2868.41 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 986629.03 ms / 335360 tokens ( 2.94 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1016443.58 ms
Q2_4, 13B
main: seed = 1682672513
llama.cpp: loading model from ../models/13B/q24.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 15 (mostly Q2_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 7149.75 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.46 seconds per pass - ETA 26 minutes
[1]4.7585,[2]5.3449,[3]6.1921,[4]6.9859,[5]7.0915,[6]6.9582,[7]7.1905,[8]7.3194,[9]7.6847,[10]7.9856,[11]8.2037,[12]8.2346,[13]8.2531,[14]8.4211,[15]8.6654,[16]8.1885,[17]8.0491,[18]8.0571,[19]7.6317,[20]7.5630,[21]7.4568,[22]7.2688,[23]7.2103,[24]7.1009,[25]7.0970,[26]6.8966,[27]6.6696,[28]6.5677,[29]6.4672,[30]6.2927,[31]6.2522,[32]6.2650,[33]6.2214,[34]6.2911,[35]6.3188,[36]6.3626,[37]6.3676,[38]6.3648,[39]6.4049,[40]6.4696,[41]6.5031,[42]6.5451,[43]6.4929,[44]6.5349,[45]6.5355,[46]6.4898,[47]6.5203,[48]6.4929,[49]6.5029,[50]6.4658,[51]6.4713,[52]6.4578,[53]6.5058,[54]6.4917,[55]6.4656,[56]6.4997,[57]6.5208,[58]6.5559,[59]6.5796,[60]6.6247,[61]6.6097,[62]6.6783,[63]6.7118,[64]6.7189,[65]6.7631,[66]6.7637,[67]6.7817,[68]6.7975,[69]6.8369,[70]6.8757,[71]6.9031,[72]6.9442,[73]7.0056,[74]7.0067,[75]7.0174,[76]7.0390,[77]7.0566,[78]7.0475,[79]7.0756,[80]7.0690,[81]7.0858,[82]7.0833,[83]7.0255,[84]7.0156,[85]7.0123,[86]6.9888,[87]6.9278,[88]6.8928,[89]6.8682,[90]6.8598,[91]6.8908,[92]6.8824,[93]6.8846,[94]6.8808,[95]6.9110,[96]6.9089,[97]6.9063,[98]6.8987,[99]6.8890,[100]6.8806,[101]6.9075,[102]6.8944,[103]6.9125,[104]6.9135,[105]6.9150,[106]6.9329,[107]6.9309,[108]6.9469,[109]6.9416,[110]6.9362,[111]6.9575,[112]6.9752,[113]6.9789,[114]6.9764,[115]6.9810,[116]6.9712,[117]6.9763,[118]7.0062,[119]7.0290,[120]7.0601,[121]7.0781,[122]7.1003,[123]7.1417,[124]7.1628,[125]7.1536,[126]7.1928,[127]7.2300,[128]7.2598,[129]7.2410,[130]7.2510,[131]7.2454,[132]7.2380,[133]7.2288,[134]7.2421,[135]7.2391,[136]7.2287,[137]7.2241,[138]7.2107,[139]7.2015,[140]7.2012,[141]7.1776,[142]7.1732,[143]7.1527,[144]7.1378,[145]7.1344,[146]7.1172,[147]7.1279,[148]7.1340,[149]7.1299,[150]7.1284,[151]7.1312,[152]7.1194,[153]7.1053,[154]7.0951,[155]7.1006,[156]7.0991,[157]7.1170,[158]7.1222,[159]7.1258,[160]7.1297,[161]7.1436,[162]7.1072,[163]7.0966,[164]7.0683,[165]7.0330,[166]6.9992,[167]6.9570,[168]6.9234,[169]6.9088,[170]6.8939,[171]6.8657,[172]6.8461,[173]6.8297,[174]6.7949,[175]6.7696,[176]6.7
531,[177]6.7300,[178]6.7037,[179]6.6849,[180]6.6748,[181]6.6528,[182]6.6307,[183]6.6164,[184]6.6141,[185]6.6069,[186]6.6110,[187]6.6160,[188]6.6144,[189]6.6356,[190]6.6369,[191]6.6560,[192]6.6695,[193]6.6896,[194]6.7040,[195]6.7267,[196]6.7426,[197]6.7660,[198]6.7803,[199]6.7816,[200]6.7834,[201]6.7782,[202]6.8005,[203]6.8093,[204]6.8139,[205]6.8266,[206]6.8314,[207]6.8280,[208]6.8349,[209]6.8376,[210]6.8433,[211]6.8518,[212]6.8573,[213]6.8660,[214]6.8694,[215]6.8724,[216]6.8841,[217]6.9016,[218]6.9167,[219]6.9157,[220]6.9105,[221]6.9024,[222]6.9009,[223]6.8909,[224]6.8810,[225]6.8771,[226]6.8996,[227]6.9145,[228]6.9243,[229]6.9329,[230]6.9299,[231]6.9460,[232]6.9357,[233]6.9161,[234]6.8990,[235]6.8808,[236]6.8716,[237]6.8604,[238]6.8644,[239]6.8478,[240]6.8351,[241]6.8392,[242]6.8429,[243]6.8411,[244]6.8283,[245]6.8257,[246]6.8136,[247]6.8008,[248]6.7913,[249]6.7872,[250]6.7898,[251]6.7811,[252]6.7757,[253]6.7633,[254]6.7598,[255]6.7463,[256]6.7255,[257]6.7133,[258]6.7034,[259]6.7016,[260]6.6927,[261]6.6882,[262]6.6811,[263]6.6731,[264]6.6577,[265]6.6581,[266]6.6546,[267]6.6455,[268]6.6556,[269]6.6559,[270]6.6559,[271]6.6630,[272]6.6677,[273]6.6665,[274]6.6678,[275]6.6755,[276]6.6830,[277]6.7003,[278]6.7100,[279]6.7188,[280]6.7223,[281]6.7342,[282]6.7384,[283]6.7524,[284]6.7615,[285]6.7707,[286]6.7866,[287]6.7841,[288]6.7917,[289]6.7842,[290]6.7667,[291]6.7510,[292]6.7324,[293]6.7158,[294]6.7170,[295]6.7174,[296]6.7228,[297]6.7217,[298]6.7237,[299]6.7202,[300]6.7097,[301]6.7079,[302]6.6990,[303]6.6904,[304]6.6805,[305]6.6760,[306]6.6626,[307]6.6652,[308]6.6659,[309]6.6506,[310]6.6450,[311]6.6401,[312]6.6418,[313]6.6342,[314]6.6339,[315]6.6170,[316]6.6160,[317]6.6004,[318]6.5804,[319]6.5945,[320]6.6077,[321]6.6118,[322]6.6055,[323]6.5989,[324]6.5969,[325]6.6085,[326]6.6091,[327]6.6111,[328]6.6139,[329]6.6186,[330]6.6225,[331]6.6357,[332]6.6312,[333]6.6398,[334]6.6321,[335]6.6254,[336]6.6294,[337]6.6266,[338]6.6272,[339]6.6223,[340]6.6194,[341]6.6274,[342]6.6307,[343
]6.6368,[344]6.6358,[345]6.6355,[346]6.6318,[347]6.6356,[348]6.6404,[349]6.6428,[350]6.6401,[351]6.6418,[352]6.6438,[353]6.6379,[354]6.6392,[355]6.6448,[356]6.6477,[357]6.6436,[358]6.6529,[359]6.6557,[360]6.6506,[361]6.6486,[362]6.6565,[363]6.6678,[364]6.6738,[365]6.6800,[366]6.6819,[367]6.6929,[368]6.6892,[369]6.6907,[370]6.6922,[371]6.6857,[372]6.6918,[373]6.6976,[374]6.6955,[375]6.6940,[376]6.7022,[377]6.6962,[378]6.6974,[379]6.7035,[380]6.6939,[381]6.6905,[382]6.6856,[383]6.6836,[384]6.6840,[385]6.6822,[386]6.6813,[387]6.6816,[388]6.6758,[389]6.6701,[390]6.6638,[391]6.6557,[392]6.6529,[393]6.6534,[394]6.6558,[395]6.6539,[396]6.6467,[397]6.6557,[398]6.6598,[399]6.6702,[400]6.6694,[401]6.6701,[402]6.6706,[403]6.6732,[404]6.6793,[405]6.6683,[406]6.6646,[407]6.6647,[408]6.6652,[409]6.6781,[410]6.6897,[411]6.7017,[412]6.7187,[413]6.7319,[414]6.7406,[415]6.7476,[416]6.7561,[417]6.7672,[418]6.7697,[419]6.7761,[420]6.7854,[421]6.7970,[422]6.8007,[423]6.8079,[424]6.8207,[425]6.8308,[426]6.8387,[427]6.8423,[428]6.8514,[429]6.8557,[430]6.8635,[431]6.8784,[432]6.8802,[433]6.8788,[434]6.8730,[435]6.8730,[436]6.8755,[437]6.8857,[438]6.8954,[439]6.8912,[440]6.8895,[441]6.8831,[442]6.8802,[443]6.8809,[444]6.8829,[445]6.8802,[446]6.8811,[447]6.8832,[448]6.8871,[449]6.8847,[450]6.8841,[451]6.8796,[452]6.8733,[453]6.8643,[454]6.8579,[455]6.8578,[456]6.8623,[457]6.8644,[458]6.8620,[459]6.8618,[460]6.8698,[461]6.8655,[462]6.8631,[463]6.8662,[464]6.8657,[465]6.8642,[466]6.8564,[467]6.8586,[468]6.8591,[469]6.8619,[470]6.8626,[471]6.8578,[472]6.8629,[473]6.8565,[474]6.8588,[475]6.8554,[476]6.8561,[477]6.8481,[478]6.8469,[479]6.8548,[480]6.8607,[481]6.8621,[482]6.8569,[483]6.8536,[484]6.8568,[485]6.8558,[486]6.8488,[487]6.8492,[488]6.8468,[489]6.8411,[490]6.8390,[491]6.8360,[492]6.8294,[493]6.8257,[494]6.8236,[495]6.8228,[496]6.8191,[497]6.8134,[498]6.8116,[499]6.8061,[500]6.7967,[501]6.7880,[502]6.7892,[503]6.7873,[504]6.7777,[505]6.7790,[506]6.7806,[507]6.7771,[508]6.7734,[509]6.7717,
[510]6.7750,[511]6.7811,[512]6.7847,[513]6.7874,[514]6.7947,[515]6.7885,[516]6.7874,[517]6.7884,[518]6.7876,[519]6.7900,[520]6.7920,[521]6.7937,[522]6.7953,[523]6.7954,[524]6.8017,[525]6.8047,[526]6.8057,[527]6.8079,[528]6.8026,[529]6.8051,[530]6.7994,[531]6.7978,[532]6.8046,[533]6.8084,[534]6.8061,[535]6.8101,[536]6.8043,[537]6.8016,[538]6.8073,[539]6.8080,[540]6.8123,[541]6.8142,[542]6.8147,[543]6.8169,[544]6.8180,[545]6.8166,[546]6.8171,[547]6.8123,[548]6.8059,[549]6.8058,[550]6.8032,[551]6.7988,[552]6.7972,[553]6.7924,[554]6.7894,[555]6.7866,[556]6.7858,[557]6.7887,[558]6.7852,[559]6.7856,[560]6.7835,[561]6.7842,[562]6.7818,[563]6.7808,[564]6.7860,[565]6.7882,[566]6.7879,[567]6.7853,[568]6.7853,[569]6.7822,[570]6.7854,[571]6.7860,[572]6.7865,[573]6.7865,[574]6.7829,[575]6.7816,[576]6.7811,[577]6.7778,[578]6.7755,[579]6.7751,[580]6.7677,[581]6.7637,[582]6.7634,[583]6.7635,[584]6.7630,[585]6.7563,[586]6.7496,[587]6.7503,[588]6.7555,[589]6.7622,[590]6.7651,[591]6.7657,[592]6.7647,[593]6.7611,[594]6.7620,[595]6.7595,[596]6.7636,[597]6.7606,[598]6.7572,[599]6.7593,[600]6.7579,[601]6.7563,[602]6.7597,[603]6.7628,[604]6.7645,[605]6.7670,[606]6.7681,[607]6.7671,[608]6.7631,[609]6.7631,[610]6.7687,[611]6.7669,[612]6.7689,[613]6.7652,[614]6.7594,[615]6.7506,[616]6.7538,[617]6.7459,[618]6.7394,[619]6.7332,[620]6.7181,[621]6.7112,[622]6.7089,[623]6.7105,[624]6.7107,[625]6.7115,[626]6.7104,[627]6.7136,[628]6.7136,[629]6.7137,[630]6.7170,[631]6.7235,[632]6.7290,[633]6.7270,[634]6.7302,[635]6.7294,[636]6.7265,[637]6.7236,[638]6.7265,[639]6.7229,[640]6.7237,[641]6.7239,[642]6.7309,[643]6.7328,[644]6.7346,[645]6.7328,[646]6.7373,[647]6.7336,[648]6.7350,[649]6.7353,[650]6.7393,[651]6.7443,[652]6.7446,[653]6.7485,[654]6.7419,[655]6.7409,
llama_print_timings: load time = 4488.53 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1516595.65 ms / 335360 tokens ( 4.52 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1547168.54 ms
Q3_4, 13B
main: seed = 1682656187
llama.cpp: loading model from ../models/13B/q34.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q3_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 8681.78 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.75 seconds per pass - ETA 30 minutes
[1]3.9691,[2]4.3609,[3]5.1508,[4]5.6357,[5]5.8245,[6]5.7682,[7]5.8930,[8]6.0160,[9]6.2782,[10]6.4992,[11]6.7043,[12]6.7592,[13]6.7085,[14]6.8204,[15]7.0162,[16]6.6686,[17]6.5717,[18]6.5474,[19]6.2354,[20]6.1972,[21]6.1250,[22]5.9506,[23]5.9279,[24]5.8369,[25]5.8490,[26]5.6949,[27]5.5136,[28]5.4184,[29]5.3375,[30]5.1977,[31]5.1640,[32]5.1797,[33]5.1311,[34]5.1711,[35]5.1947,[36]5.2175,[37]5.2144,[38]5.2114,[39]5.2423,[40]5.2872,[41]5.3119,[42]5.3487,[43]5.3108,[44]5.3543,[45]5.3571,[46]5.3300,[47]5.3585,[48]5.3381,[49]5.3477,[50]5.3142,[51]5.3187,[52]5.3117,[53]5.3565,[54]5.3451,[55]5.3237,[56]5.3481,[57]5.3650,[58]5.3886,[59]5.4044,[60]5.4383,[61]5.4305,[62]5.4858,[63]5.5112,[64]5.5220,[65]5.5596,[66]5.5598,[67]5.5773,[68]5.5892,[69]5.6185,[70]5.6480,[71]5.6698,[72]5.7054,[73]5.7564,[74]5.7630,[75]5.7734,[76]5.7888,[77]5.8019,[78]5.7873,[79]5.8141,[80]5.8085,[81]5.8180,[82]5.8151,[83]5.7677,[84]5.7567,[85]5.7513,[86]5.7341,[87]5.6712,[88]5.6305,[89]5.6089,[90]5.5978,[91]5.6195,[92]5.6135,[93]5.6146,[94]5.6130,[95]5.6419,[96]5.6394,[97]5.6359,[98]5.6316,[99]5.6235,[100]5.6211,[101]5.6452,[102]5.6405,[103]5.6557,[104]5.6603,[105]5.6634,[106]5.6775,[107]5.6761,[108]5.6907,[109]5.6889,[110]5.6830,[111]5.7020,[112]5.7191,[113]5.7178,[114]5.7158,[115]5.7198,[116]5.7080,[117]5.7090,[118]5.7330,[119]5.7506,[120]5.7808,[121]5.7966,[122]5.8185,[123]5.8563,[124]5.8753,[125]5.8695,[126]5.9059,[127]5.9397,[128]5.9679,[129]5.9551,[130]5.9628,[131]5.9577,[132]5.9536,[133]5.9413,[134]5.9490,[135]5.9496,[136]5.9397,[137]5.9349,[138]5.9208,[139]5.9126,[140]5.9112,[141]5.8840,[142]5.8817,[143]5.8558,[144]5.8403,[145]5.8328,[146]5.8204,[147]5.8257,[148]5.8279,[149]5.8249,[150]5.8232,[151]5.8275,[152]5.8208,[153]5.8108,[154]5.8054,[155]5.8122,[156]5.8106,[157]5.8269,[158]5.8289,[159]5.8305,[160]5.8342,[161]5.8453,[162]5.8188,[163]5.8082,[164]5.7861,[165]5.7595,[166]5.7354,[167]5.7028,[168]5.6739,[169]5.6606,[170]5.6517,[171]5.6296,[172]5.6167,[173]5.6035,[174]5.5753,[175]5.5553,[176]5.5
420,[177]5.5254,[178]5.5038,[179]5.4904,[180]5.4827,[181]5.4657,[182]5.4485,[183]5.4360,[184]5.4348,[185]5.4271,[186]5.4282,[187]5.4332,[188]5.4297,[189]5.4473,[190]5.4471,[191]5.4648,[192]5.4785,[193]5.4948,[194]5.5065,[195]5.5265,[196]5.5385,[197]5.5577,[198]5.5713,[199]5.5728,[200]5.5747,[201]5.5688,[202]5.5830,[203]5.5901,[204]5.5849,[205]5.5946,[206]5.5999,[207]5.5954,[208]5.6013,[209]5.6056,[210]5.6114,[211]5.6216,[212]5.6280,[213]5.6369,[214]5.6402,[215]5.6434,[216]5.6548,[217]5.6719,[218]5.6855,[219]5.6862,[220]5.6827,[221]5.6775,[222]5.6773,[223]5.6701,[224]5.6627,[225]5.6593,[226]5.6796,[227]5.6871,[228]5.6945,[229]5.7014,[230]5.6982,[231]5.7139,[232]5.7031,[233]5.6877,[234]5.6733,[235]5.6521,[236]5.6465,[237]5.6370,[238]5.6397,[239]5.6280,[240]5.6186,[241]5.6219,[242]5.6239,[243]5.6226,[244]5.6122,[245]5.6090,[246]5.5987,[247]5.5889,[248]5.5823,[249]5.5786,[250]5.5821,[251]5.5736,[252]5.5693,[253]5.5598,[254]5.5562,[255]5.5464,[256]5.5294,[257]5.5194,[258]5.5122,[259]5.5121,[260]5.5040,[261]5.4993,[262]5.4950,[263]5.4898,[264]5.4684,[265]5.4679,[266]5.4647,[267]5.4583,[268]5.4651,[269]5.4649,[270]5.4662,[271]5.4723,[272]5.4756,[273]5.4771,[274]5.4788,[275]5.4854,[276]5.4918,[277]5.5050,[278]5.5138,[279]5.5224,[280]5.5257,[281]5.5354,[282]5.5409,[283]5.5541,[284]5.5634,[285]5.5703,[286]5.5834,[287]5.5802,[288]5.5859,[289]5.5799,[290]5.5657,[291]5.5528,[292]5.5389,[293]5.5267,[294]5.5278,[295]5.5279,[296]5.5330,[297]5.5319,[298]5.5333,[299]5.5313,[300]5.5220,[301]5.5223,[302]5.5154,[303]5.5067,[304]5.4995,[305]5.4974,[306]5.4861,[307]5.4896,[308]5.4904,[309]5.4765,[310]5.4726,[311]5.4684,[312]5.4703,[313]5.4646,[314]5.4629,[315]5.4493,[316]5.4465,[317]5.4335,[318]5.4164,[319]5.4272,[320]5.4385,[321]5.4432,[322]5.4399,[323]5.4338,[324]5.4321,[325]5.4421,[326]5.4438,[327]5.4446,[328]5.4478,[329]5.4528,[330]5.4549,[331]5.4651,[332]5.4612,[333]5.4687,[334]5.4637,[335]5.4582,[336]5.4602,[337]5.4589,[338]5.4584,[339]5.4544,[340]5.4515,[341]5.4582,[342]5.4613,[343
]5.4661,[344]5.4667,[345]5.4682,[346]5.4668,[347]5.4704,[348]5.4741,[349]5.4761,[350]5.4743,[351]5.4755,[352]5.4756,[353]5.4707,[354]5.4717,[355]5.4764,[356]5.4792,[357]5.4759,[358]5.4840,[359]5.4864,[360]5.4827,[361]5.4826,[362]5.4892,[363]5.5000,[364]5.5057,[365]5.5100,[366]5.5115,[367]5.5204,[368]5.5175,[369]5.5187,[370]5.5204,[371]5.5161,[372]5.5205,[373]5.5250,[374]5.5229,[375]5.5222,[376]5.5284,[377]5.5247,[378]5.5269,[379]5.5311,[380]5.5240,[381]5.5207,[382]5.5167,[383]5.5148,[384]5.5146,[385]5.5136,[386]5.5127,[387]5.5122,[388]5.5088,[389]5.5049,[390]5.4993,[391]5.4931,[392]5.4895,[393]5.4891,[394]5.4920,[395]5.4913,[396]5.4857,[397]5.4919,[398]5.4961,[399]5.5033,[400]5.5021,[401]5.5026,[402]5.5038,[403]5.5060,[404]5.5116,[405]5.4965,[406]5.4924,[407]5.4922,[408]5.4934,[409]5.5046,[410]5.5136,[411]5.5238,[412]5.5381,[413]5.5484,[414]5.5549,[415]5.5611,[416]5.5685,[417]5.5786,[418]5.5810,[419]5.5860,[420]5.5939,[421]5.6041,[422]5.6074,[423]5.6130,[424]5.6227,[425]5.6304,[426]5.6365,[427]5.6407,[428]5.6480,[429]5.6515,[430]5.6580,[431]5.6710,[432]5.6739,[433]5.6728,[434]5.6689,[435]5.6702,[436]5.6729,[437]5.6814,[438]5.6891,[439]5.6861,[440]5.6852,[441]5.6803,[442]5.6787,[443]5.6799,[444]5.6813,[445]5.6802,[446]5.6823,[447]5.6846,[448]5.6880,[449]5.6865,[450]5.6875,[451]5.6846,[452]5.6697,[453]5.6600,[454]5.6546,[455]5.6552,[456]5.6593,[457]5.6607,[458]5.6587,[459]5.6584,[460]5.6658,[461]5.6619,[462]5.6586,[463]5.6573,[464]5.6571,[465]5.6549,[466]5.6477,[467]5.6467,[468]5.6446,[469]5.6459,[470]5.6450,[471]5.6401,[472]5.6416,[473]5.6368,[474]5.6356,[475]5.6290,[476]5.6277,[477]5.6199,[478]5.6175,[479]5.6193,[480]5.6221,[481]5.6226,[482]5.6179,[483]5.6137,[484]5.6148,[485]5.6092,[486]5.6025,[487]5.6015,[488]5.5988,[489]5.5934,[490]5.5903,[491]5.5869,[492]5.5803,[493]5.5774,[494]5.5757,[495]5.5735,[496]5.5693,[497]5.5634,[498]5.5607,[499]5.5572,[500]5.5488,[501]5.5419,[502]5.5411,[503]5.5401,[504]5.5326,[505]5.5329,[506]5.5339,[507]5.5286,[508]5.5248,[509]5.5250,
[510]5.5271,[511]5.5315,[512]5.5355,[513]5.5382,[514]5.5438,[515]5.5399,[516]5.5387,[517]5.5387,[518]5.5384,[519]5.5406,[520]5.5419,[521]5.5431,[522]5.5447,[523]5.5454,[524]5.5508,[525]5.5537,[526]5.5542,[527]5.5559,[528]5.5503,[529]5.5514,[530]5.5475,[531]5.5468,[532]5.5516,[533]5.5544,[534]5.5528,[535]5.5551,[536]5.5504,[537]5.5484,[538]5.5534,[539]5.5542,[540]5.5561,[541]5.5560,[542]5.5574,[543]5.5592,[544]5.5606,[545]5.5593,[546]5.5596,[547]5.5561,[548]5.5520,[549]5.5521,[550]5.5498,[551]5.5469,[552]5.5450,[553]5.5416,[554]5.5393,[555]5.5371,[556]5.5361,[557]5.5376,[558]5.5341,[559]5.5346,[560]5.5337,[561]5.5339,[562]5.5311,[563]5.5311,[564]5.5352,[565]5.5365,[566]5.5368,[567]5.5350,[568]5.5358,[569]5.5341,[570]5.5368,[571]5.5380,[572]5.5386,[573]5.5391,[574]5.5360,[575]5.5347,[576]5.5345,[577]5.5326,[578]5.5307,[579]5.5309,[580]5.5253,[581]5.5223,[582]5.5225,[583]5.5233,[584]5.5234,[585]5.5177,[586]5.5119,[587]5.5122,[588]5.5165,[589]5.5217,[590]5.5246,[591]5.5262,[592]5.5250,[593]5.5212,[594]5.5226,[595]5.5208,[596]5.5247,[597]5.5228,[598]5.5199,[599]5.5227,[600]5.5216,[601]5.5204,[602]5.5213,[603]5.5241,[604]5.5249,[605]5.5277,[606]5.5292,[607]5.5277,[608]5.5247,[609]5.5255,[610]5.5296,[611]5.5285,[612]5.5303,[613]5.5274,[614]5.5234,[615]5.5171,[616]5.5197,[617]5.5144,[618]5.5095,[619]5.5049,[620]5.4935,[621]5.4880,[622]5.4859,[623]5.4872,[624]5.4875,[625]5.4881,[626]5.4876,[627]5.4904,[628]5.4910,[629]5.4916,[630]5.4944,[631]5.4990,[632]5.5038,[633]5.5026,[634]5.5056,[635]5.5053,[636]5.5018,[637]5.4981,[638]5.5003,[639]5.4969,[640]5.4974,[641]5.4977,[642]5.5030,[643]5.5049,[644]5.5073,[645]5.5057,[646]5.5094,[647]5.5046,[648]5.5059,[649]5.5062,[650]5.5095,[651]5.5135,[652]5.5138,[653]5.5176,[654]5.5119,[655]5.5110,
llama_print_timings: load time = 5957.47 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1701223.56 ms / 335360 tokens ( 5.07 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1733067.74 ms
Q4_4, 13B
main: seed = 1682790225
llama.cpp: loading model from ../models/13B/q44.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 14 (mostly Q4_4)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 10213.81 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 400.00 MB
system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.72 seconds per pass - ETA 29 minutes
[1]3.7371,[2]4.2097,[3]5.0064,[4]5.3764,[5]5.5586,[6]5.4949,[7]5.6305,[8]5.7337,[9]5.9985,[10]6.2164,[11]6.4018,[12]6.4542,[13]6.4223,[14]6.5122,[15]6.7109,[16]6.3997,[17]6.3208,[18]6.2951,[19]6.0050,[20]5.9874,[21]5.9108,[22]5.7379,[23]5.7112,[24]5.6188,[25]5.6312,[26]5.4855,[27]5.3108,[28]5.2153,[29]5.1408,[30]5.0041,[31]4.9644,[32]4.9804,[33]4.9376,[34]4.9779,[35]4.9952,[36]5.0181,[37]5.0117,[38]5.0095,[39]5.0362,[40]5.0770,[41]5.1000,[42]5.1361,[43]5.0994,[44]5.1415,[45]5.1435,[46]5.1173,[47]5.1457,[48]5.1302,[49]5.1325,[50]5.1018,[51]5.1096,[52]5.1021,[53]5.1478,[54]5.1379,[55]5.1189,[56]5.1389,[57]5.1573,[58]5.1792,[59]5.1969,[60]5.2336,[61]5.2273,[62]5.2826,[63]5.3079,[64]5.3183,[65]5.3554,[66]5.3534,[67]5.3714,[68]5.3824,[69]5.4101,[70]5.4406,[71]5.4614,[72]5.4958,[73]5.5438,[74]5.5508,[75]5.5605,[76]5.5748,[77]5.5862,[78]5.5737,[79]5.5997,[80]5.5943,[81]5.6018,[82]5.5978,[83]5.5517,[84]5.5406,[85]5.5338,[86]5.5187,[87]5.4533,[88]5.4109,[89]5.3896,[90]5.3801,[91]5.4009,[92]5.3968,[93]5.3979,[94]5.3971,[95]5.4232,[96]5.4202,[97]5.4171,[98]5.4136,[99]5.4063,[100]5.4035,[101]5.4261,[102]5.4222,[103]5.4380,[104]5.4423,[105]5.4440,[106]5.4578,[107]5.4561,[108]5.4715,[109]5.4708,[110]5.4650,[111]5.4836,[112]5.4995,[113]5.4999,[114]5.4986,[115]5.5033,[116]5.4916,[117]5.4909,[118]5.5141,[119]5.5324,[120]5.5620,[121]5.5776,[122]5.5990,[123]5.6352,[124]5.6525,[125]5.6471,[126]5.6825,[127]5.7151,[128]5.7425,[129]5.7312,[130]5.7397,[131]5.7352,[132]5.7318,[133]5.7202,[134]5.7284,[135]5.7279,[136]5.7190,[137]5.7155,[138]5.7017,[139]5.6938,[140]5.6922,[141]5.6652,[142]5.6613,[143]5.6361,[144]5.6211,[145]5.6124,[146]5.6018,[147]5.6069,[148]5.6100,[149]5.6069,[150]5.6060,[151]5.6105,[152]5.6049,[153]5.5951,[154]5.5893,[155]5.5957,[156]5.5933,[157]5.6083,[158]5.6106,[159]5.6112,[160]5.6147,[161]5.6256,[162]5.6003,[163]5.5907,[164]5.5704,[165]5.5452,[166]5.5225,[167]5.4905,[168]5.4638,[169]5.4505,[170]5.4419,[171]5.4213,[172]5.4093,[173]5.3965,[174]5.3701,[175]5.3500,[176]5.3
366,[177]5.3201,[178]5.3003,[179]5.2871,[180]5.2796,[181]5.2638,[182]5.2477,[183]5.2357,[184]5.2347,[185]5.2275,[186]5.2280,[187]5.2337,[188]5.2312,[189]5.2476,[190]5.2479,[191]5.2649,[192]5.2784,[193]5.2932,[194]5.3039,[195]5.3229,[196]5.3343,[197]5.3533,[198]5.3668,[199]5.3688,[200]5.3693,[201]5.3628,[202]5.3753,[203]5.3811,[204]5.3766,[205]5.3852,[206]5.3904,[207]5.3867,[208]5.3925,[209]5.3957,[210]5.4014,[211]5.4117,[212]5.4178,[213]5.4265,[214]5.4290,[215]5.4325,[216]5.4442,[217]5.4606,[218]5.4739,[219]5.4737,[220]5.4708,[221]5.4662,[222]5.4661,[223]5.4595,[224]5.4525,[225]5.4492,[226]5.4689,[227]5.4743,[228]5.4817,[229]5.4885,[230]5.4849,[231]5.5003,[232]5.4900,[233]5.4753,[234]5.4608,[235]5.4390,[236]5.4339,[237]5.4251,[238]5.4284,[239]5.4171,[240]5.4082,[241]5.4115,[242]5.4126,[243]5.4118,[244]5.4020,[245]5.3983,[246]5.3883,[247]5.3786,[248]5.3726,[249]5.3694,[250]5.3727,[251]5.3645,[252]5.3597,[253]5.3509,[254]5.3468,[255]5.3376,[256]5.3213,[257]5.3114,[258]5.3047,[259]5.3035,[260]5.2952,[261]5.2901,[262]5.2863,[263]5.2813,[264]5.2584,[265]5.2582,[266]5.2553,[267]5.2490,[268]5.2555,[269]5.2550,[270]5.2558,[271]5.2620,[272]5.2649,[273]5.2665,[274]5.2676,[275]5.2736,[276]5.2795,[277]5.2917,[278]5.3000,[279]5.3081,[280]5.3118,[281]5.3217,[282]5.3270,[283]5.3396,[284]5.3483,[285]5.3564,[286]5.3687,[287]5.3654,[288]5.3710,[289]5.3649,[290]5.3511,[291]5.3385,[292]5.3253,[293]5.3133,[294]5.3137,[295]5.3138,[296]5.3186,[297]5.3176,[298]5.3198,[299]5.3175,[300]5.3089,[301]5.3093,[302]5.3030,[303]5.2950,[304]5.2876,[305]5.2848,[306]5.2740,[307]5.2769,[308]5.2776,[309]5.2648,[310]5.2621,[311]5.2579,[312]5.2592,[313]5.2536,[314]5.2522,[315]5.2396,[316]5.2361,[317]5.2236,[318]5.2077,[319]5.2178,[320]5.2289,[321]5.2333,[322]5.2301,[323]5.2244,[324]5.2226,[325]5.2319,[326]5.2336,[327]5.2343,[328]5.2379,[329]5.2426,[330]5.2446,[331]5.2548,[332]5.2511,[333]5.2589,[334]5.2544,[335]5.2495,[336]5.2518,[337]5.2507,[338]5.2503,[339]5.2460,[340]5.2434,[341]5.2501,[342]5.2534,[343
]5.2578,[344]5.2583,[345]5.2597,[346]5.2581,[347]5.2620,[348]5.2656,[349]5.2677,[350]5.2659,[351]5.2674,[352]5.2677,[353]5.2628,[354]5.2634,[355]5.2684,[356]5.2713,[357]5.2683,[358]5.2761,[359]5.2783,[360]5.2749,[361]5.2747,[362]5.2813,[363]5.2920,[364]5.2969,[365]5.3006,[366]5.3024,[367]5.3111,[368]5.3090,[369]5.3105,[370]5.3126,[371]5.3086,[372]5.3133,[373]5.3173,[374]5.3155,[375]5.3151,[376]5.3206,[377]5.3171,[378]5.3197,[379]5.3236,[380]5.3168,[381]5.3140,[382]5.3098,[383]5.3080,[384]5.3080,[385]5.3068,[386]5.3057,[387]5.3054,[388]5.3025,[389]5.2988,[390]5.2936,[391]5.2880,[392]5.2844,[393]5.2841,[394]5.2872,[395]5.2864,[396]5.2812,[397]5.2879,[398]5.2922,[399]5.2994,[400]5.2985,[401]5.2992,[402]5.3002,[403]5.3026,[404]5.3081,[405]5.2931,[406]5.2890,[407]5.2878,[408]5.2889,[409]5.3000,[410]5.3092,[411]5.3186,[412]5.3326,[413]5.3428,[414]5.3491,[415]5.3550,[416]5.3624,[417]5.3719,[418]5.3740,[419]5.3788,[420]5.3865,[421]5.3960,[422]5.3994,[423]5.4049,[424]5.4137,[425]5.4214,[426]5.4276,[427]5.4316,[428]5.4389,[429]5.4427,[430]5.4488,[431]5.4614,[432]5.4646,[433]5.4638,[434]5.4604,[435]5.4616,[436]5.4644,[437]5.4727,[438]5.4801,[439]5.4774,[440]5.4765,[441]5.4720,[442]5.4709,[443]5.4721,[444]5.4739,[445]5.4731,[446]5.4750,[447]5.4774,[448]5.4805,[449]5.4789,[450]5.4800,[451]5.4771,[452]5.4617,[453]5.4525,[454]5.4469,[455]5.4473,[456]5.4512,[457]5.4524,[458]5.4506,[459]5.4501,[460]5.4573,[461]5.4533,[462]5.4495,[463]5.4477,[464]5.4473,[465]5.4452,[466]5.4377,[467]5.4366,[468]5.4347,[469]5.4357,[470]5.4346,[471]5.4296,[472]5.4303,[473]5.4257,[474]5.4247,[475]5.4179,[476]5.4152,[477]5.4071,[478]5.4045,[479]5.4049,[480]5.4075,[481]5.4078,[482]5.4032,[483]5.3992,[484]5.4000,[485]5.3932,[486]5.3868,[487]5.3860,[488]5.3837,[489]5.3786,[490]5.3753,[491]5.3719,[492]5.3651,[493]5.3621,[494]5.3605,[495]5.3584,[496]5.3546,[497]5.3485,[498]5.3458,[499]5.3424,[500]5.3344,[501]5.3273,[502]5.3263,[503]5.3252,[504]5.3175,[505]5.3174,[506]5.3180,[507]5.3127,[508]5.3091,[509]5.3096,
[510]5.3118,[511]5.3160,[512]5.3199,[513]5.3222,[514]5.3277,[515]5.3237,[516]5.3227,[517]5.3225,[518]5.3226,[519]5.3247,[520]5.3260,[521]5.3270,[522]5.3283,[523]5.3290,[524]5.3345,[525]5.3372,[526]5.3377,[527]5.3393,[528]5.3339,[529]5.3348,[530]5.3311,[531]5.3306,[532]5.3354,[533]5.3382,[534]5.3363,[535]5.3384,[536]5.3341,[537]5.3323,[538]5.3373,[539]5.3381,[540]5.3398,[541]5.3396,[542]5.3409,[543]5.3431,[544]5.3444,[545]5.3433,[546]5.3435,[547]5.3403,[548]5.3362,[549]5.3363,[550]5.3343,[551]5.3318,[552]5.3299,[553]5.3270,[554]5.3247,[555]5.3228,[556]5.3221,[557]5.3237,[558]5.3205,[559]5.3207,[560]5.3194,[561]5.3195,[562]5.3168,[563]5.3166,[564]5.3209,[565]5.3219,[566]5.3225,[567]5.3206,[568]5.3216,[569]5.3201,[570]5.3228,[571]5.3241,[572]5.3251,[573]5.3254,[574]5.3226,[575]5.3207,[576]5.3201,[577]5.3185,[578]5.3166,[579]5.3164,[580]5.3112,[581]5.3082,[582]5.3083,[583]5.3092,[584]5.3098,[585]5.3040,[586]5.2987,[587]5.2987,[588]5.3031,[589]5.3080,[590]5.3109,[591]5.3125,[592]5.3114,[593]5.3075,[594]5.3089,[595]5.3073,[596]5.3114,[597]5.3095,[598]5.3062,[599]5.3088,[600]5.3079,[601]5.3068,[602]5.3067,[603]5.3094,[604]5.3099,[605]5.3124,[606]5.3137,[607]5.3123,[608]5.3095,[609]5.3104,[610]5.3144,[611]5.3130,[612]5.3151,[613]5.3122,[614]5.3083,[615]5.3025,[616]5.3051,[617]5.3002,[618]5.2960,[619]5.2916,[620]5.2808,[621]5.2758,[622]5.2741,[623]5.2754,[624]5.2758,[625]5.2766,[626]5.2763,[627]5.2790,[628]5.2798,[629]5.2802,[630]5.2832,[631]5.2876,[632]5.2923,[633]5.2911,[634]5.2940,[635]5.2936,[636]5.2901,[637]5.2864,[638]5.2885,[639]5.2854,[640]5.2859,[641]5.2863,[642]5.2913,[643]5.2930,[644]5.2947,[645]5.2933,[646]5.2967,[647]5.2916,[648]5.2927,[649]5.2929,[650]5.2959,[651]5.3000,[652]5.3004,[653]5.3042,[654]5.2988,[655]5.2981,
llama_print_timings: load time = 6350.49 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1667989.72 ms / 335360 tokens ( 4.97 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1699934.63 ms
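The runs above (final-chunk perplexities on 13B: Q2_4 6.7409, Q3_4 5.5110, Q4_4 5.2981) all use the super-block scheme from the description. As a rough illustration of how dequantization works under that scheme, here is a minimal sketch for the 4-bit case. The exact `block_q4_4` layout beyond the `scales` array is an assumption (a plain `float` stands in for the `fp16` super-scale, `qs` for the packed 4-bit quants, and the `q - 8` centering mirrors the existing `Q4_0` convention); it is not the actual implementation.

```c
#include <stdint.h>

#define QK4_4 128  // weights per super-block (16 blocks of 8)

// Hypothetical layout following the description: one super-block scale,
// one int8_t scale per 8 weights, and 4-bit quants packed two per byte.
typedef struct {
    float   d;                // super-block scale (fp16 in the real format)
    int8_t  scales[QK4_4/8];  // quantized scales, one per 8 weights
    uint8_t qs[QK4_4/2];      // 4-bit quants, two per byte
} block_q4_4;

// x = d * scales[i/8] * q, with q assumed centered in [-8, 7]
// as in Q4_0 (nibble value minus 8).
static void dequantize_q4_4(const block_q4_4 *b, float *x) {
    for (int i = 0; i < QK4_4; ++i) {
        const uint8_t byte = b->qs[i/2];
        const int q = (i & 1) ? (byte >> 4) : (byte & 0x0F);
        x[i] = b->d * (float)b->scales[i/8] * (float)(q - 8);
    }
}
```

Storage per super-block under these assumptions: 16 bits for the `fp16` super-scale, 128 bits for the 16 `int8_t` scales, and 512 bits for the quants, i.e. 656 bits per 128 weights.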