Skip to content

Variable bit rate quantization #1256

Closed
Closed
@ikawrakow

Description

@ikawrakow

Variable bit rate is commonly used in audio and video compression, so why not try on LLMs?

My guess is that a locally adaptive variable bit rate would require a major change to ggml. So, then, the least one can try is to see if using different number of bits in the different network layers would be beneficial.

As a first step, I simply changed llama.cpp to not quantize one of the tensor types in addition to output.weight (which is already known to have a significant impact on generation quality) and calculated perplexity for Q2_4 quantization (see issue #1240). Picked 2-bit quantization because there the difference between a quantized and not quantized tensor will be largest, so it would be easiest to see the effect. The following table summarizes the results (PPL improvement is perplexity with fp16 output.weight - perplexity with fp16 output weight + indicated tensor, table is sorted in decreasing order of impact)

Tensor type PPL improvement
feed_forward.w2 1.0832
attention.wv 0.7819
feed_forward.w3 0.5658
feed_forward.w1 0.3917
attention.wo 0.3902
attention.wq 0.1250
attention.wk 0.1090
tok_embeddings < 0.05

Interesting to note that the tok_embeddings tensor, which has been considered worthy of a dedicated quantization type LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 where it is kept as fp16, has basically no influence on generation quality even when quantized with 2 bits.

Based on these findings, I ran 2- and 3-bit perplexity calculations where the top 2 tensors feed_forward.w2 and attention.wv are quantized using Q5_1 instead of Q2_4 or Q3_4. Here are the results:

Model Quantization file size Perplexity
7B Q2_4 + Q5_1 3.16G 6.8160
7B Q3_4 + Q5_1 3.7G 6.0996
13B Q2_4 + Q5_1 6.1G 5.7880
13B Q3_4 + Q5_1 7.1G 5.3715

Interesting to note that the mixed Q3_4 + Q5_1 quantization has a lower perplexity than any 4-bit quantization listed on the main page for the 7B model despite the smaller quantized model size.

I have not explored only using Q5_1 for a subset of the feed_forward.w2 and attention.wv tensors. The quantization rmse for these two tensor types increases with layer depth, so perhaps it would be sufficient to use a higher bit rate for only the last few layers, thus reducing quantized model size compared to what is given in the above table.

Here are the complete runs for the above table. There is no new quantization type, just a quick hack where I added

            if (tensor.name == "output.weight" ||
                tensor.name.find("attention.wv.weight") != std::string::npos ||
                tensor.name.find("feed_forward.w2.weight") != std::string::npos) {
                new_type = GGML_TYPE_Q5_1;
            }    

just after this line in llama.cpp.

7B, Q2_4 + Q5_1 main: seed = 1682863345 llama.cpp: loading model from ../build/junk.bin llama_model_load_internal: format = ggjt v1 (latest) llama_model_load_internal: n_vocab = 32000 llama_model_load_internal: n_ctx = 512 llama_model_load_internal: n_embd = 4096 llama_model_load_internal: n_mult = 256 llama_model_load_internal: n_head = 32 llama_model_load_internal: n_layer = 32 llama_model_load_internal: n_rot = 128 llama_model_load_internal: ftype = 15 (mostly Q2_4) llama_model_load_internal: n_ff = 11008 llama_model_load_internal: n_parts = 1 llama_model_load_internal: model size = 7B llama_model_load_internal: ggml ctx size = 59.11 KB llama_model_load_internal: mem required = 5026.65 MB (+ 1026.00 MB per state) llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.61 seconds per pass - ETA 17 minutes
[1]4.8689,[2]5.5380,[3]6.3512,[4]7.0241,[5]7.0748,[6]7.0472,[7]7.2764,[8]7.3672,[9]7.7450,[10]8.0356,[11]8.2671,[12]8.3551,[13]8.2873,[14]8.4518,[15]8.7368,[16]8.2773,[17]8.1054,[18]8.0547,[19]7.6308,[20]7.5978,[21]7.4992,[22]7.3066,[23]7.2867,[24]7.1873,[25]7.1983,[26]7.0220,[27]6.8122,[28]6.7195,[29]6.6220,[30]6.4602,[31]6.4277,[32]6.4450,[33]6.3845,[34]6.4217,[35]6.4431,[36]6.4962,[37]6.4978,[38]6.5093,[39]6.5542,[40]6.6065,[41]6.6159,[42]6.6562,[43]6.6069,[44]6.6684,[45]6.6746,[46]6.6483,[47]6.6718,[48]6.6386,[49]6.6381,[50]6.5934,[51]6.5892,[52]6.5701,[53]6.6233,[54]6.6083,[55]6.5750,[56]6.6126,[57]6.6321,[58]6.6554,[59]6.6693,[60]6.7166,[61]6.7055,[62]6.7665,[63]6.8012,[64]6.8124,[65]6.8597,[66]6.8646,[67]6.8819,[68]6.8975,[69]6.9258,[70]6.9585,[71]6.9829,[72]7.0148,[73]7.0743,[74]7.0740,[75]7.0859,[76]7.0966,[77]7.1070,[78]7.0913,[79]7.1232,[80]7.1149,[81]7.1345,[82]7.1442,[83]7.0841,[84]7.0665,[85]7.0551,[86]7.0270,[87]6.9683,[88]6.9379,[89]6.9188,[90]6.9039,[91]6.9317,[92]6.9235,[93]6.9269,[94]6.9251,[95]6.9596,[96]6.9609,[97]6.9612,[98]6.9529,[99]6.9338,[100]6.9302,[101]6.9593,[102]6.9523,[103]6.9737,[104]6.9851,[105]6.9858,[106]7.0021,[107]6.9985,[108]7.0140,[109]7.0090,[110]7.0046,[111]7.0250,[112]7.0456,[113]7.0524,[114]7.0490,[115]7.0560,[116]7.0459,[117]7.0517,[118]7.0811,[119]7.1028,[120]7.1413,[121]7.1622,[122]7.1874,[123]7.2290,[124]7.2475,[125]7.2360,[126]7.2746,[127]7.3097,[128]7.3423,[129]7.3207,[130]7.3302,[131]7.3224,[132]7.3129,[133]7.3003,[134]7.3145,[135]7.3089,[136]7.2968,[137]7.2890,[138]7.2756,[139]7.2651,[140]7.2619,[141]7.2379,[142]7.2336,[143]7.2057,[144]7.1847,[145]7.1773,[146]7.1654,[147]7.1718,[148]7.1704,[149]7.1673,[150]7.1636,[151]7.1664,[152]7.1540,[153]7.1358,[154]7.1244,[155]7.1304,[156]7.1259,[157]7.1421,[158]7.1439,[159]7.1515,[160]7.1542,[161]7.1698,[162]7.1374,[163]7.1246,[164]7.0971,[165]7.0609,[166]7.0288,[167]6.9864,[168]6.9518,[169]6.9381,[170]6.9242,[171]6.8931,[172]6.8715,[173]6.8532,[174]6.8210,[175]6.7957,[176]6.7799,[177]6.7599,[178]6.7353,[179]6.7163,[180]6.7042,[181]6.6793,[182]6.6581,[183]6.6411,[184]6.6389,[185]6.6304,[186]6.6321,[187]6.6376,[188]6.6348,[189]6.6532,[190]6.6555,[191]6.6786,[192]6.6938,[193]6.7137,[194]6.7257,[195]6.7503,[196]6.7668,[197]6.7887,[198]6.8067,[199]6.8101,[200]6.8141,[201]6.8096,[202]6.8342,[203]6.8424,[204]6.8470,[205]6.8581,[206]6.8657,[207]6.8619,[208]6.8726,[209]6.8780,[210]6.8816,[211]6.8934,[212]6.9011,[213]6.9121,[214]6.9185,[215]6.9231,[216]6.9367,[217]6.9563,[218]6.9708,[219]6.9702,[220]6.9650,[221]6.9601,[222]6.9569,[223]6.9457,[224]6.9389,[225]6.9346,[226]6.9554,[227]6.9661,[228]6.9727,[229]6.9772,[230]6.9739,[231]6.9931,[232]6.9831,[233]6.9635,[234]6.9467,[235]6.9311,[236]6.9226,[237]6.9113,[238]6.9156,[239]6.8994,[240]6.8880,[241]6.8921,[242]6.8965,[243]6.8932,[244]6.8800,[245]6.8776,[246]6.8658,[247]6.8522,[248]6.8445,[249]6.8409,[250]6.8471,[251]6.8399,[252]6.8348,[253]6.8244,[254]6.8182,[255]6.8047,[256]6.7833,[257]6.7706,[258]6.7620,[259]6.7591,[260]6.7503,[261]6.7464,[262]6.7397,[263]6.7344,[264]6.7181,[265]6.7174,[266]6.7143,[267]6.7061,[268]6.7162,[269]6.7142,[270]6.7137,[271]6.7213,[272]6.7260,[273]6.7251,[274]6.7284,[275]6.7389,[276]6.7449,[277]6.7640,[278]6.7748,[279]6.7843,[280]6.7867,[281]6.7962,[282]6.8019,[283]6.8184,[284]6.8264,[285]6.8361,[286]6.8503,[287]6.8506,[288]6.8577,[289]6.8476,[290]6.8306,[291]6.8143,[292]6.7983,[293]6.7837,[294]6.7858,[295]6.7836,[296]6.7889,[297]6.7877,[298]6.7916,[299]6.7881,[300]6.7765,[301]6.7751,[302]6.7663,[303]6.7569,[304]6.7478,[305]6.7439,[306]6.7292,[307]6.7309,[308]6.7356,[309]6.7174,[310]6.7112,[311]6.7044,[312]6.7065,[313]6.6999,[314]6.6979,[315]6.6804,[316]6.6768,[317]6.6583,[318]6.6363,[319]6.6513,[320]6.6646,[321]6.6703,[322]6.6661,[323]6.6598,[324]6.6586,[325]6.6711,[326]6.6704,[327]6.6734,[328]6.6757,[329]6.6834,[330]6.6872,[331]6.7003,[332]6.6962,[333]6.7041,[334]6.6979,[335]6.6901,[336]6.6922,[337]6.6888,[338]6.6896,[339]6.6829,[340]6.6784,[341]6.6871,[342]6.6901,[343]6.6962,[344]6.6964,[345]6.6953,[346]6.6908,[347]6.6946,[348]6.6986,[349]6.6998,[350]6.6957,[351]6.6962,[352]6.6968,[353]6.6902,[354]6.6924,[355]6.6989,[356]6.7021,[357]6.6984,[358]6.7090,[359]6.7119,[360]6.7066,[361]6.7052,[362]6.7144,[363]6.7251,[364]6.7318,[365]6.7372,[366]6.7380,[367]6.7469,[368]6.7429,[369]6.7430,[370]6.7444,[371]6.7381,[372]6.7424,[373]6.7475,[374]6.7446,[375]6.7432,[376]6.7509,[377]6.7460,[378]6.7483,[379]6.7546,[380]6.7455,[381]6.7411,[382]6.7370,[383]6.7360,[384]6.7360,[385]6.7342,[386]6.7340,[387]6.7332,[388]6.7288,[389]6.7223,[390]6.7152,[391]6.7065,[392]6.7026,[393]6.7020,[394]6.7049,[395]6.7036,[396]6.6953,[397]6.7020,[398]6.7058,[399]6.7139,[400]6.7138,[401]6.7148,[402]6.7157,[403]6.7171,[404]6.7237,[405]6.7152,[406]6.7121,[407]6.7122,[408]6.7134,[409]6.7250,[410]6.7371,[411]6.7497,[412]6.7668,[413]6.7790,[414]6.7873,[415]6.7931,[416]6.8026,[417]6.8166,[418]6.8216,[419]6.8294,[420]6.8394,[421]6.8512,[422]6.8556,[423]6.8647,[424]6.8765,[425]6.8857,[426]6.8922,[427]6.8970,[428]6.9061,[429]6.9121,[430]6.9217,[431]6.9381,[432]6.9418,[433]6.9406,[434]6.9354,[435]6.9354,[436]6.9371,[437]6.9470,[438]6.9553,[439]6.9516,[440]6.9503,[441]6.9448,[442]6.9418,[443]6.9432,[444]6.9441,[445]6.9416,[446]6.9440,[447]6.9471,[448]6.9513,[449]6.9487,[450]6.9482,[451]6.9434,[452]6.9351,[453]6.9262,[454]6.9219,[455]6.9226,[456]6.9276,[457]6.9302,[458]6.9276,[459]6.9285,[460]6.9373,[461]6.9347,[462]6.9338,[463]6.9396,[464]6.9390,[465]6.9360,[466]6.9289,[467]6.9300,[468]6.9310,[469]6.9335,[470]6.9342,[471]6.9290,[472]6.9336,[473]6.9272,[474]6.9292,[475]6.9246,[476]6.9274,[477]6.9200,[478]6.9199,[479]6.9293,[480]6.9351,[481]6.9373,[482]6.9325,[483]6.9273,[484]6.9295,[485]6.9287,[486]6.9227,[487]6.9233,[488]6.9219,[489]6.9161,[490]6.9132,[491]6.9109,[492]6.9048,[493]6.9019,[494]6.9008,[495]6.9013,[496]6.8970,[497]6.8923,[498]6.8908,[499]6.8846,[500]6.8749,[501]6.8686,[502]6.8689,[503]6.8673,[504]6.8577,[505]6.8599,[506]6.8605,[507]6.8559,[508]6.8514,[509]6.8506,[510]6.8548,[511]6.8601,[512]6.8632,[513]6.8650,[514]6.8722,[515]6.8664,[516]6.8648,[517]6.8661,[518]6.8655,[519]6.8687,[520]6.8713,[521]6.8724,[522]6.8757,[523]6.8762,[524]6.8826,[525]6.8860,[526]6.8879,[527]6.8899,[528]6.8851,[529]6.8861,[530]6.8799,[531]6.8775,[532]6.8826,[533]6.8846,[534]6.8821,[535]6.8854,[536]6.8789,[537]6.8761,[538]6.8816,[539]6.8823,[540]6.8880,[541]6.8893,[542]6.8899,[543]6.8916,[544]6.8923,[545]6.8906,[546]6.8919,[547]6.8868,[548]6.8803,[549]6.8806,[550]6.8770,[551]6.8732,[552]6.8708,[553]6.8665,[554]6.8638,[555]6.8593,[556]6.8589,[557]6.8627,[558]6.8587,[559]6.8583,[560]6.8578,[561]6.8580,[562]6.8555,[563]6.8559,[564]6.8605,[565]6.8629,[566]6.8624,[567]6.8599,[568]6.8593,[569]6.8572,[570]6.8601,[571]6.8603,[572]6.8609,[573]6.8598,[574]6.8565,[575]6.8570,[576]6.8572,[577]6.8550,[578]6.8537,[579]6.8547,[580]6.8475,[581]6.8430,[582]6.8412,[583]6.8415,[584]6.8412,[585]6.8333,[586]6.8264,[587]6.8268,[588]6.8321,[589]6.8388,[590]6.8420,[591]6.8438,[592]6.8424,[593]6.8377,[594]6.8379,[595]6.8352,[596]6.8387,[597]6.8353,[598]6.8323,[599]6.8340,[600]6.8332,[601]6.8313,[602]6.8341,[603]6.8370,[604]6.8381,[605]6.8409,[606]6.8429,[607]6.8417,[608]6.8373,[609]6.8375,[610]6.8410,[611]6.8390,[612]6.8421,[613]6.8380,[614]6.8332,[615]6.8241,[616]6.8281,[617]6.8211,[618]6.8153,[619]6.8084,[620]6.7923,[621]6.7843,[622]6.7822,[623]6.7841,[624]6.7839,[625]6.7838,[626]6.7822,[627]6.7851,[628]6.7848,[629]6.7844,[630]6.7879,[631]6.7937,[632]6.7994,[633]6.7975,[634]6.8018,[635]6.8029,[636]6.8007,[637]6.7977,[638]6.8010,[639]6.7979,[640]6.7991,[641]6.7990,[642]6.8064,[643]6.8086,[644]6.8094,[645]6.8069,[646]6.8121,[647]6.8090,[648]6.8099,[649]6.8094,[650]6.8144,[651]6.8197,[652]6.8211,[653]6.8252,[654]6.8175,[655]6.8160,

llama_print_timings: load time = 2658.59 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 976009.34 ms / 335360 tokens ( 2.91 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1004906.69 ms

7B, Q3_4 + Q5_1 main: seed = 1682864367 llama.cpp: loading model from ../build/junk1.bin llama_model_load_internal: format = ggjt v1 (latest) llama_model_load_internal: n_vocab = 32000 llama_model_load_internal: n_ctx = 512 llama_model_load_internal: n_embd = 4096 llama_model_load_internal: n_mult = 256 llama_model_load_internal: n_head = 32 llama_model_load_internal: n_layer = 32 llama_model_load_internal: n_rot = 128 llama_model_load_internal: ftype = 10 (mostly Q3_4) llama_model_load_internal: n_ff = 11008 llama_model_load_internal: n_parts = 1 llama_model_load_internal: model size = 7B llama_model_load_internal: ggml ctx size = 59.11 KB llama_model_load_internal: mem required = 5578.28 MB (+ 1026.00 MB per state) llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
1.69 seconds per pass - ETA 18 minutes
[1]4.3772,[2]4.7902,[3]5.6644,[4]6.2926,[5]6.4094,[6]6.3682,[7]6.5670,[8]6.6689,[9]7.0093,[10]7.2537,[11]7.4667,[12]7.5133,[13]7.4469,[14]7.5018,[15]7.7581,[16]7.3727,[17]7.2507,[18]7.1917,[19]6.8302,[20]6.8186,[21]6.7251,[22]6.5571,[23]6.5252,[24]6.4336,[25]6.4301,[26]6.2650,[27]6.0943,[28]5.9983,[29]5.9072,[30]5.7545,[31]5.7284,[32]5.7503,[33]5.6977,[34]5.7261,[35]5.7495,[36]5.7812,[37]5.7814,[38]5.7930,[39]5.8255,[40]5.8782,[41]5.8900,[42]5.9274,[43]5.8861,[44]5.9426,[45]5.9455,[46]5.9212,[47]5.9424,[48]5.9169,[49]5.9180,[50]5.8766,[51]5.8735,[52]5.8628,[53]5.9067,[54]5.8907,[55]5.8671,[56]5.8970,[57]5.9166,[58]5.9371,[59]5.9534,[60]5.9951,[61]5.9885,[62]6.0481,[63]6.0787,[64]6.0918,[65]6.1345,[66]6.1431,[67]6.1623,[68]6.1757,[69]6.1978,[70]6.2297,[71]6.2521,[72]6.2830,[73]6.3437,[74]6.3470,[75]6.3602,[76]6.3750,[77]6.3875,[78]6.3715,[79]6.4006,[80]6.3955,[81]6.4061,[82]6.4121,[83]6.3619,[84]6.3428,[85]6.3290,[86]6.3068,[87]6.2470,[88]6.2224,[89]6.2023,[90]6.1887,[91]6.2120,[92]6.2071,[93]6.2058,[94]6.2024,[95]6.2297,[96]6.2300,[97]6.2245,[98]6.2201,[99]6.2056,[100]6.2044,[101]6.2303,[102]6.2271,[103]6.2465,[104]6.2540,[105]6.2524,[106]6.2693,[107]6.2680,[108]6.2817,[109]6.2775,[110]6.2736,[111]6.2939,[112]6.3155,[113]6.3180,[114]6.3130,[115]6.3189,[116]6.3090,[117]6.3133,[118]6.3412,[119]6.3632,[120]6.3973,[121]6.4124,[122]6.4367,[123]6.4732,[124]6.4919,[125]6.4823,[126]6.5207,[127]6.5572,[128]6.5871,[129]6.5713,[130]6.5796,[131]6.5760,[132]6.5691,[133]6.5555,[134]6.5644,[135]6.5596,[136]6.5491,[137]6.5408,[138]6.5234,[139]6.5126,[140]6.5101,[141]6.4830,[142]6.4794,[143]6.4498,[144]6.4283,[145]6.4197,[146]6.4070,[147]6.4104,[148]6.4101,[149]6.4047,[150]6.4007,[151]6.4037,[152]6.3940,[153]6.3789,[154]6.3707,[155]6.3774,[156]6.3727,[157]6.3896,[158]6.3943,[159]6.3981,[160]6.4011,[161]6.4143,[162]6.3863,[163]6.3744,[164]6.3508,[165]6.3203,[166]6.2935,[167]6.2558,[168]6.2253,[169]6.2117,[170]6.2011,[171]6.1750,[172]6.1585,[173]6.1421,[174]6.1119,[175]6.0898,[176]6.0779,[177]6.0581,[178]6.0349,[179]6.0179,[180]6.0087,[181]5.9869,[182]5.9695,[183]5.9546,[184]5.9524,[185]5.9448,[186]5.9454,[187]5.9516,[188]5.9485,[189]5.9654,[190]5.9665,[191]5.9878,[192]6.0035,[193]6.0195,[194]6.0306,[195]6.0520,[196]6.0674,[197]6.0883,[198]6.1032,[199]6.1060,[200]6.1102,[201]6.1052,[202]6.1251,[203]6.1329,[204]6.1330,[205]6.1432,[206]6.1498,[207]6.1461,[208]6.1546,[209]6.1593,[210]6.1638,[211]6.1749,[212]6.1826,[213]6.1933,[214]6.1958,[215]6.1980,[216]6.2120,[217]6.2295,[218]6.2431,[219]6.2435,[220]6.2400,[221]6.2343,[222]6.2324,[223]6.2227,[224]6.2156,[225]6.2119,[226]6.2319,[227]6.2393,[228]6.2448,[229]6.2501,[230]6.2464,[231]6.2637,[232]6.2520,[233]6.2350,[234]6.2204,[235]6.2017,[236]6.1944,[237]6.1850,[238]6.1878,[239]6.1730,[240]6.1623,[241]6.1645,[242]6.1681,[243]6.1662,[244]6.1551,[245]6.1519,[246]6.1415,[247]6.1301,[248]6.1230,[249]6.1208,[250]6.1253,[251]6.1191,[252]6.1150,[253]6.1047,[254]6.0999,[255]6.0884,[256]6.0706,[257]6.0587,[258]6.0507,[259]6.0489,[260]6.0406,[261]6.0367,[262]6.0311,[263]6.0259,[264]6.0068,[265]6.0064,[266]6.0045,[267]5.9985,[268]6.0072,[269]6.0053,[270]6.0054,[271]6.0129,[272]6.0167,[273]6.0167,[274]6.0187,[275]6.0269,[276]6.0326,[277]6.0485,[278]6.0587,[279]6.0675,[280]6.0697,[281]6.0794,[282]6.0850,[283]6.0996,[284]6.1077,[285]6.1164,[286]6.1293,[287]6.1286,[288]6.1348,[289]6.1263,[290]6.1109,[291]6.0960,[292]6.0814,[293]6.0681,[294]6.0702,[295]6.0689,[296]6.0739,[297]6.0731,[298]6.0757,[299]6.0734,[300]6.0626,[301]6.0623,[302]6.0550,[303]6.0461,[304]6.0376,[305]6.0340,[306]6.0213,[307]6.0235,[308]6.0259,[309]6.0104,[310]6.0047,[311]5.9983,[312]6.0006,[313]5.9954,[314]5.9937,[315]5.9781,[316]5.9735,[317]5.9570,[318]5.9369,[319]5.9493,[320]5.9613,[321]5.9657,[322]5.9617,[323]5.9553,[324]5.9524,[325]5.9627,[326]5.9630,[327]5.9652,[328]5.9687,[329]5.9742,[330]5.9772,[331]5.9890,[332]5.9866,[333]5.9933,[334]5.9878,[335]5.9821,[336]5.9857,[337]5.9832,[338]5.9830,[339]5.9774,[340]5.9734,[341]5.9818,[342]5.9841,[343]5.9892,[344]5.9893,[345]5.9891,[346]5.9866,[347]5.9903,[348]5.9941,[349]5.9966,[350]5.9933,[351]5.9943,[352]5.9944,[353]5.9882,[354]5.9888,[355]5.9939,[356]5.9971,[357]5.9933,[358]6.0026,[359]6.0051,[360]6.0016,[361]6.0011,[362]6.0076,[363]6.0186,[364]6.0244,[365]6.0288,[366]6.0300,[367]6.0383,[368]6.0357,[369]6.0366,[370]6.0381,[371]6.0325,[372]6.0374,[373]6.0423,[374]6.0407,[375]6.0407,[376]6.0479,[377]6.0431,[378]6.0457,[379]6.0517,[380]6.0436,[381]6.0401,[382]6.0350,[383]6.0341,[384]6.0336,[385]6.0324,[386]6.0319,[387]6.0321,[388]6.0285,[389]6.0233,[390]6.0163,[391]6.0085,[392]6.0043,[393]6.0027,[394]6.0054,[395]6.0045,[396]5.9971,[397]6.0038,[398]6.0070,[399]6.0144,[400]6.0143,[401]6.0162,[402]6.0175,[403]6.0195,[404]6.0260,[405]6.0171,[406]6.0138,[407]6.0137,[408]6.0151,[409]6.0260,[410]6.0373,[411]6.0488,[412]6.0648,[413]6.0758,[414]6.0837,[415]6.0893,[416]6.0976,[417]6.1097,[418]6.1132,[419]6.1196,[420]6.1286,[421]6.1404,[422]6.1437,[423]6.1507,[424]6.1612,[425]6.1700,[426]6.1762,[427]6.1805,[428]6.1889,[429]6.1939,[430]6.2021,[431]6.2158,[432]6.2192,[433]6.2186,[434]6.2143,[435]6.2147,[436]6.2173,[437]6.2269,[438]6.2344,[439]6.2311,[440]6.2303,[441]6.2254,[442]6.2239,[443]6.2248,[444]6.2251,[445]6.2235,[446]6.2256,[447]6.2287,[448]6.2331,[449]6.2309,[450]6.2317,[451]6.2277,[452]6.2160,[453]6.2077,[454]6.2022,[455]6.2032,[456]6.2077,[457]6.2097,[458]6.2074,[459]6.2078,[460]6.2164,[461]6.2138,[462]6.2124,[463]6.2159,[464]6.2147,[465]6.2119,[466]6.2042,[467]6.2045,[468]6.2042,[469]6.2063,[470]6.2065,[471]6.2018,[472]6.2062,[473]6.2008,[474]6.2022,[475]6.1968,[476]6.1979,[477]6.1909,[478]6.1900,[479]6.1963,[480]6.2009,[481]6.2029,[482]6.1985,[483]6.1947,[484]6.1965,[485]6.1950,[486]6.1894,[487]6.1893,[488]6.1870,[489]6.1822,[490]6.1798,[491]6.1769,[492]6.1713,[493]6.1685,[494]6.1666,[495]6.1657,[496]6.1621,[497]6.1566,[498]6.1552,[499]6.1510,[500]6.1418,[501]6.1353,[502]6.1357,[503]6.1348,[504]6.1260,[505]6.1281,[506]6.1291,[507]6.1237,[508]6.1196,[509]6.1191,[510]6.1228,[511]6.1273,[512]6.1307,[513]6.1326,[514]6.1387,[515]6.1333,[516]6.1324,[517]6.1335,[518]6.1329,[519]6.1359,[520]6.1383,[521]6.1395,[522]6.1422,[523]6.1429,[524]6.1484,[525]6.1514,[526]6.1522,[527]6.1539,[528]6.1487,[529]6.1497,[530]6.1448,[531]6.1435,[532]6.1480,[533]6.1501,[534]6.1479,[535]6.1502,[536]6.1447,[537]6.1427,[538]6.1480,[539]6.1493,[540]6.1532,[541]6.1536,[542]6.1547,[543]6.1562,[544]6.1572,[545]6.1552,[546]6.1558,[547]6.1516,[548]6.1467,[549]6.1467,[550]6.1435,[551]6.1400,[552]6.1379,[553]6.1341,[554]6.1320,[555]6.1288,[556]6.1284,[557]6.1311,[558]6.1276,[559]6.1275,[560]6.1274,[561]6.1276,[562]6.1254,[563]6.1250,[564]6.1294,[565]6.1314,[566]6.1311,[567]6.1289,[568]6.1295,[569]6.1282,[570]6.1314,[571]6.1318,[572]6.1329,[573]6.1328,[574]6.1292,[575]6.1285,[576]6.1285,[577]6.1270,[578]6.1253,[579]6.1258,[580]6.1193,[581]6.1157,[582]6.1148,[583]6.1157,[584]6.1159,[585]6.1085,[586]6.1019,[587]6.1027,[588]6.1075,[589]6.1127,[590]6.1157,[591]6.1177,[592]6.1163,[593]6.1129,[594]6.1139,[595]6.1116,[596]6.1149,[597]6.1126,[598]6.1096,[599]6.1119,[600]6.1112,[601]6.1097,[602]6.1113,[603]6.1138,[604]6.1148,[605]6.1181,[606]6.1204,[607]6.1190,[608]6.1156,[609]6.1163,[610]6.1198,[611]6.1182,[612]6.1210,[613]6.1173,[614]6.1121,[615]6.1047,[616]6.1072,[617]6.1011,[618]6.0962,[619]6.0906,[620]6.0769,[621]6.0702,[622]6.0685,[623]6.0700,[624]6.0706,[625]6.0708,[626]6.0697,[627]6.0722,[628]6.0721,[629]6.0717,[630]6.0749,[631]6.0804,[632]6.0860,[633]6.0846,[634]6.0879,[635]6.0883,[636]6.0857,[637]6.0821,[638]6.0845,[639]6.0814,[640]6.0823,[641]6.0823,[642]6.0888,[643]6.0912,[644]6.0926,[645]6.0911,[646]6.0953,[647]6.0917,[648]6.0926,[649]6.0928,[650]6.0968,[651]6.1019,[652]6.1030,[653]6.1068,[654]6.1003,[655]6.0996,

llama_print_timings: load time = 2722.23 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1024717.17 ms / 335360 tokens ( 3.06 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1052144.49 ms

13B, Q2_4 + Q5_1 main: seed = 1682869318 llama.cpp: loading model from ../build/junk3.bin llama_model_load_internal: format = ggjt v1 (latest) llama_model_load_internal: n_vocab = 32000 llama_model_load_internal: n_ctx = 512 llama_model_load_internal: n_embd = 5120 llama_model_load_internal: n_mult = 256 llama_model_load_internal: n_head = 40 llama_model_load_internal: n_layer = 40 llama_model_load_internal: n_rot = 128 llama_model_load_internal: ftype = 15 (mostly Q2_4) llama_model_load_internal: n_ff = 13824 llama_model_load_internal: n_parts = 1 llama_model_load_internal: model size = 13B llama_model_load_internal: ggml ctx size = 73.73 KB llama_model_load_internal: mem required = 8284.13 MB (+ 1608.00 MB per state) llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.66 seconds per pass - ETA 29 minutes
[1]3.9721,[2]4.4565,[3]5.2597,[4]5.9033,[5]6.0338,[6]5.9622,[7]6.1601,[8]6.2656,[9]6.5236,[10]6.7829,[11]7.0013,[12]7.0990,[13]7.0476,[14]7.1859,[15]7.4172,[16]7.0334,[17]6.9135,[18]6.8786,[19]6.5400,[20]6.5070,[21]6.4154,[22]6.2361,[23]6.1969,[24]6.0978,[25]6.1050,[26]5.9363,[27]5.7476,[28]5.6483,[29]5.5673,[30]5.4279,[31]5.3936,[32]5.4050,[33]5.3676,[34]5.4138,[35]5.4320,[36]5.4581,[37]5.4515,[38]5.4411,[39]5.4719,[40]5.5259,[41]5.5538,[42]5.5976,[43]5.5570,[44]5.6013,[45]5.6034,[46]5.5709,[47]5.5914,[48]5.5665,[49]5.5732,[50]5.5425,[51]5.5525,[52]5.5445,[53]5.5919,[54]5.5826,[55]5.5603,[56]5.5833,[57]5.6051,[58]5.6318,[59]5.6509,[60]5.6875,[61]5.6795,[62]5.7390,[63]5.7665,[64]5.7749,[65]5.8128,[66]5.8121,[67]5.8288,[68]5.8395,[69]5.8698,[70]5.9008,[71]5.9260,[72]5.9637,[73]6.0136,[74]6.0186,[75]6.0287,[76]6.0452,[77]6.0630,[78]6.0533,[79]6.0811,[80]6.0773,[81]6.0940,[82]6.0916,[83]6.0430,[84]6.0330,[85]6.0263,[86]6.0075,[87]5.9484,[88]5.9136,[89]5.8907,[90]5.8817,[91]5.9066,[92]5.9045,[93]5.9073,[94]5.9044,[95]5.9318,[96]5.9287,[97]5.9266,[98]5.9230,[99]5.9158,[100]5.9112,[101]5.9382,[102]5.9320,[103]5.9477,[104]5.9492,[105]5.9504,[106]5.9664,[107]5.9629,[108]5.9788,[109]5.9759,[110]5.9701,[111]5.9873,[112]6.0048,[113]6.0053,[114]6.0034,[115]6.0066,[116]5.9956,[117]5.9974,[118]6.0221,[119]6.0409,[120]6.0689,[121]6.0859,[122]6.1071,[123]6.1462,[124]6.1624,[125]6.1543,[126]6.1905,[127]6.2246,[128]6.2549,[129]6.2404,[130]6.2488,[131]6.2446,[132]6.2396,[133]6.2282,[134]6.2393,[135]6.2381,[136]6.2292,[137]6.2257,[138]6.2107,[139]6.2011,[140]6.2007,[141]6.1759,[142]6.1720,[143]6.1485,[144]6.1310,[145]6.1232,[146]6.1096,[147]6.1134,[148]6.1171,[149]6.1134,[150]6.1137,[151]6.1188,[152]6.1085,[153]6.0971,[154]6.0912,[155]6.0971,[156]6.0959,[157]6.1130,[158]6.1163,[159]6.1179,[160]6.1222,[161]6.1343,[162]6.1058,[163]6.0948,[164]6.0712,[165]6.0433,[166]6.0157,[167]5.9800,[168]5.9506,[169]5.9372,[170]5.9281,[171]5.9059,[172]5.8906,[173]5.8760,[174]5.8466,[175]5.8240,[176]5.8097,[177]5.7904,[178]5.7676,[179]5.7527,[180]5.7448,[181]5.7262,[182]5.7087,[183]5.6951,[184]5.6934,[185]5.6880,[186]5.6896,[187]5.6955,[188]5.6950,[189]5.7125,[190]5.7137,[191]5.7318,[192]5.7454,[193]5.7611,[194]5.7725,[195]5.7933,[196]5.8061,[197]5.8252,[198]5.8379,[199]5.8412,[200]5.8429,[201]5.8373,[202]5.8553,[203]5.8624,[204]5.8614,[205]5.8712,[206]5.8764,[207]5.8730,[208]5.8796,[209]5.8831,[210]5.8888,[211]5.8998,[212]5.9059,[213]5.9151,[214]5.9187,[215]5.9211,[216]5.9327,[217]5.9480,[218]5.9628,[219]5.9616,[220]5.9575,[221]5.9532,[222]5.9522,[223]5.9451,[224]5.9380,[225]5.9342,[226]5.9546,[227]5.9632,[228]5.9704,[229]5.9768,[230]5.9741,[231]5.9897,[232]5.9793,[233]5.9619,[234]5.9460,[235]5.9269,[236]5.9199,[237]5.9101,[238]5.9136,[239]5.9005,[240]5.8905,[241]5.8933,[242]5.8951,[243]5.8940,[244]5.8841,[245]5.8807,[246]5.8702,[247]5.8600,[248]5.8524,[249]5.8490,[250]5.8530,[251]5.8444,[252]5.8398,[253]5.8301,[254]5.8258,[255]5.8145,[256]5.7966,[257]5.7849,[258]5.7773,[259]5.7751,[260]5.7663,[261]5.7617,[262]5.7569,[263]5.7506,[264]5.7336,[265]5.7324,[266]5.7288,[267]5.7218,[268]5.7296,[269]5.7292,[270]5.7288,[271]5.7352,[272]5.7392,[273]5.7404,[274]5.7412,[275]5.7480,[276]5.7552,[277]5.7693,[278]5.7775,[279]5.7851,[280]5.7886,[281]5.7987,[282]5.8037,[283]5.8164,[284]5.8253,[285]5.8337,[286]5.8472,[287]5.8436,[288]5.8492,[289]5.8424,[290]5.8276,[291]5.8144,[292]5.8000,[293]5.7865,[294]5.7876,[295]5.7878,[296]5.7926,[297]5.7915,[298]5.7945,[299]5.7927,[300]5.7833,[301]5.7830,[302]5.7765,[303]5.7683,[304]5.7593,[305]5.7559,[306]5.7445,[307]5.7467,[308]5.7477,[309]5.7330,[310]5.7289,[311]5.7237,[312]5.7254,[313]5.7193,[314]5.7175,[315]5.7028,[316]5.7002,[317]5.6864,[318]5.6684,[319]5.6795,[320]5.6913,[321]5.6957,[322]5.6918,[323]5.6866,[324]5.6844,[325]5.6958,[326]5.6973,[327]5.6980,[328]5.7006,[329]5.7052,[330]5.7076,[331]5.7186,[332]5.7145,[333]5.7218,[334]5.7162,[335]5.7112,[336]5.7145,[337]5.7124,[338]5.7122,[339]5.7075,[340]5.7049,[341]5.7114,[342]5.7142,[343]5.7184,[344]5.7186,[345]5.7190,[346]5.7163,[347]5.7199,[348]5.7239,[349]5.7266,[350]5.7247,[351]5.7257,[352]5.7258,[353]5.7197,[354]5.7213,[355]5.7262,[356]5.7298,[357]5.7267,[358]5.7352,[359]5.7368,[360]5.7326,[361]5.7319,[362]5.7393,[363]5.7502,[364]5.7557,[365]5.7603,[366]5.7622,[367]5.7718,[368]5.7690,[369]5.7704,[370]5.7726,[371]5.7684,[372]5.7738,[373]5.7777,[374]5.7762,[375]5.7754,[376]5.7821,[377]5.7776,[378]5.7799,[379]5.7845,[380]5.7770,[381]5.7744,[382]5.7703,[383]5.7691,[384]5.7695,[385]5.7677,[386]5.7658,[387]5.7652,[388]5.7614,[389]5.7571,[390]5.7516,[391]5.7449,[392]5.7416,[393]5.7411,[394]5.7436,[395]5.7422,[396]5.7362,[397]5.7439,[398]5.7481,[399]5.7549,[400]5.7535,[401]5.7535,[402]5.7548,[403]5.7576,[404]5.7632,[405]5.7513,[406]5.7473,[407]5.7472,[408]5.7478,[409]5.7592,[410]5.7692,[411]5.7790,[412]5.7942,[413]5.8061,[414]5.8126,[415]5.8184,[416]5.8260,[417]5.8361,[418]5.8395,[419]5.8448,[420]5.8530,[421]5.8638,[422]5.8671,[423]5.8735,[424]5.8834,[425]5.8915,[426]5.8979,[427]5.9019,[428]5.9094,[429]5.9138,[430]5.9201,[431]5.9333,[432]5.9357,[433]5.9345,[434]5.9307,[435]5.9313,[436]5.9338,[437]5.9427,[438]5.9507,[439]5.9471,[440]5.9469,[441]5.9426,[442]5.9404,[443]5.9406,[444]5.9423,[445]5.9409,[446]5.9426,[447]5.9445,[448]5.9481,[449]5.9461,[450]5.9467,[451]5.9429,[452]5.9315,[453]5.9228,[454]5.9169,[455]5.9165,[456]5.9207,[457]5.9221,[458]5.9201,[459]5.9199,[460]5.9272,[461]5.9232,[462]5.9207,[463]5.9215,[464]5.9202,[465]5.9184,[466]5.9109,[467]5.9116,[468]5.9107,[469]5.9121,[470]5.9116,[471]5.9063,[472]5.9090,[473]5.9043,[474]5.9051,[475]5.8991,[476]5.8992,[477]5.8916,[478]5.8895,[479]5.8934,[480]5.8974,[481]5.8984,[482]5.8939,[483]5.8903,[484]5.8919,[485]5.8881,[486]5.8823,[487]5.8820,[488]5.8798,[489]5.8753,[490]5.8725,[491]5.8696,[492]5.8638,[493]5.8606,[494]5.8583,[495]5.8567,[496]5.8530,[497]5.8476,[498]5.8459,[499]5.8416,[500]5.8336,[501]5.8267,[502]5.8268,[503]5.8254,[504]5.8169,[505]5.8170,[506]5.8175,[507]5.8126,[508]5.8087,[509]5.8087,[510]5.8112,[511]5.8161,[512]5.8199,[513]5.8225,[514]5.8285,[515]5.8238,[516]5.8229,[517]5.8235,[518]5.8235,[519]5.8256,[520]5.8277,[521]5.8293,[522]5.8309,[523]5.8315,[524]5.8376,[525]5.8404,[526]5.8412,[527]5.8427,[528]5.8372,[529]5.8381,[530]5.8332,[531]5.8322,[532]5.8378,[533]5.8404,[534]5.8386,[535]5.8415,[536]5.8367,[537]5.8347,[538]5.8397,[539]5.8405,[540]5.8437,[541]5.8446,[542]5.8459,[543]5.8479,[544]5.8487,[545]5.8477,[546]5.8480,[547]5.8440,[548]5.8391,[549]5.8387,[550]5.8373,[551]5.8341,[552]5.8323,[553]5.8286,[554]5.8264,[555]5.8236,[556]5.8227,[557]5.8249,[558]5.8214,[559]5.8214,[560]5.8201,[561]5.8207,[562]5.8182,[563]5.8181,[564]5.8226,[565]5.8238,[566]5.8239,[567]5.8217,[568]5.8222,[569]5.8203,[570]5.8232,[571]5.8238,[572]5.8245,[573]5.8244,[574]5.8218,[575]5.8204,[576]5.8199,[577]5.8177,[578]5.8157,[579]5.8151,[580]5.8094,[581]5.8061,[582]5.8056,[583]5.8061,[584]5.8063,[585]5.8000,[586]5.7940,[587]5.7944,[588]5.7986,[589]5.8039,[590]5.8066,[591]5.8080,[592]5.8069,[593]5.8037,[594]5.8046,[595]5.8026,[596]5.8064,[597]5.8039,[598]5.8005,[599]5.8030,[600]5.8020,[601]5.8005,[602]5.8020,[603]5.8043,[604]5.8051,[605]5.8076,[606]5.8089,[607]5.8077,[608]5.8045,[609]5.8050,[610]5.8095,[611]5.8080,[612]5.8100,[613]5.8066,[614]5.8025,[615]5.7953,[616]5.7980,[617]5.7922,[618]5.7870,[619]5.7819,[620]5.7695,[621]5.7639,[622]5.7617,[623]5.7633,[624]5.7634,[625]5.7641,[626]5.7634,[627]5.7661,[628]5.7669,[629]5.7676,[630]5.7704,[631]5.7748,[632]5.7803,[633]5.7787,[634]5.7821,[635]5.7818,[636]5.7784,[637]5.7751,[638]5.7773,[639]5.7740,[640]5.7749,[641]5.7755,[642]5.7812,[643]5.7830,[644]5.7850,[645]5.7839,[646]5.7875,[647]5.7830,[648]5.7841,[649]5.7844,[650]5.7874,[651]5.7912,[652]5.7914,[653]5.7952,[654]5.7891,[655]5.7880,

llama_print_timings: load time = 3757.20 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1660958.51 ms / 335360 tokens ( 4.95 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1690237.58 ms

13B, Q3_4 + Q5_1 main: seed = 1682871034 llama.cpp: loading model from ../build/junk2.bin llama_model_load_internal: format = ggjt v1 (latest) llama_model_load_internal: n_vocab = 32000 llama_model_load_internal: n_ctx = 512 llama_model_load_internal: n_embd = 5120 llama_model_load_internal: n_mult = 256 llama_model_load_internal: n_head = 40 llama_model_load_internal: n_layer = 40 llama_model_load_internal: n_rot = 128 llama_model_load_internal: ftype = 10 (mostly Q3_4) llama_model_load_internal: n_ff = 13824 llama_model_load_internal: n_parts = 1 llama_model_load_internal: model size = 13B llama_model_load_internal: ggml ctx size = 73.73 KB llama_model_load_internal: mem required = 9353.66 MB (+ 1608.00 MB per state) llama_init_from_file: kv self size = 400.00 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
perplexity : calculating perplexity over 655 chunks, batch_size=512
2.78 seconds per pass - ETA 30 minutes
[1]3.8638,[2]4.2438,[3]5.0285,[4]5.4691,[5]5.6500,[6]5.5749,[7]5.7279,[8]5.8483,[9]6.0969,[10]6.3309,[11]6.5275,[12]6.5851,[13]6.5289,[14]6.6307,[15]6.8287,[16]6.5065,[17]6.4160,[18]6.3973,[19]6.1030,[20]6.0757,[21]6.0022,[22]5.8315,[23]5.8015,[24]5.7102,[25]5.7213,[26]5.5727,[27]5.3968,[28]5.2983,[29]5.2181,[30]5.0845,[31]5.0477,[32]5.0637,[33]5.0127,[34]5.0546,[35]5.0779,[36]5.0973,[37]5.0901,[38]5.0859,[39]5.1136,[40]5.1553,[41]5.1784,[42]5.2145,[43]5.1772,[44]5.2185,[45]5.2210,[46]5.1948,[47]5.2217,[48]5.2018,[49]5.2068,[50]5.1752,[51]5.1815,[52]5.1747,[53]5.2203,[54]5.2098,[55]5.1910,[56]5.2134,[57]5.2298,[58]5.2522,[59]5.2691,[60]5.3031,[61]5.2955,[62]5.3497,[63]5.3745,[64]5.3859,[65]5.4235,[66]5.4215,[67]5.4387,[68]5.4507,[69]5.4795,[70]5.5091,[71]5.5318,[72]5.5665,[73]5.6155,[74]5.6227,[75]5.6330,[76]5.6482,[77]5.6602,[78]5.6456,[79]5.6718,[80]5.6661,[81]5.6766,[82]5.6737,[83]5.6292,[84]5.6184,[85]5.6138,[86]5.5979,[87]5.5348,[88]5.4920,[89]5.4704,[90]5.4603,[91]5.4818,[92]5.4778,[93]5.4791,[94]5.4792,[95]5.5064,[96]5.5025,[97]5.4993,[98]5.4960,[99]5.4891,[100]5.4861,[101]5.5101,[102]5.5053,[103]5.5205,[104]5.5248,[105]5.5265,[106]5.5405,[107]5.5396,[108]5.5541,[109]5.5528,[110]5.5473,[111]5.5658,[112]5.5820,[113]5.5806,[114]5.5798,[115]5.5831,[116]5.5710,[117]5.5707,[118]5.5948,[119]5.6120,[120]5.6409,[121]5.6560,[122]5.6767,[123]5.7139,[124]5.7325,[125]5.7269,[126]5.7621,[127]5.7945,[128]5.8224,[129]5.8098,[130]5.8178,[131]5.8138,[132]5.8098,[133]5.7973,[134]5.8061,[135]5.8063,[136]5.7966,[137]5.7923,[138]5.7782,[139]5.7695,[140]5.7679,[141]5.7406,[142]5.7374,[143]5.7121,[144]5.6965,[145]5.6883,[146]5.6761,[147]5.6809,[148]5.6838,[149]5.6807,[150]5.6801,[151]5.6846,[152]5.6780,[153]5.6682,[154]5.6629,[155]5.6696,[156]5.6674,[157]5.6831,[158]5.6844,[159]5.6853,[160]5.6893,[161]5.7012,[162]5.6757,[163]5.6658,[164]5.6443,[165]5.6185,[166]5.5949,[167]5.5631,[168]5.5362,[169]5.5232,[170]5.5140,[171]5.4926,[172]5.4804,[173]5.4673,[174]5.4398,[175]5.4193,[176]5.4058,[177]5.3890,[178]5.3683,[179]5.3554,[180]5.3479,[181]5.3314,[182]5.3149,[183]5.3022,[184]5.3012,[185]5.2933,[186]5.2949,[187]5.3005,[188]5.2977,[189]5.3145,[190]5.3146,[191]5.3317,[192]5.3451,[193]5.3599,[194]5.3711,[195]5.3902,[196]5.4024,[197]5.4218,[198]5.4356,[199]5.4370,[200]5.4385,[201]5.4321,[202]5.4455,[203]5.4518,[204]5.4465,[205]5.4555,[206]5.4606,[207]5.4563,[208]5.4615,[209]5.4655,[210]5.4714,[211]5.4821,[212]5.4887,[213]5.4977,[214]5.5005,[215]5.5040,[216]5.5156,[217]5.5323,[218]5.5463,[219]5.5465,[220]5.5432,[221]5.5386,[222]5.5383,[223]5.5318,[224]5.5251,[225]5.5218,[226]5.5412,[227]5.5471,[228]5.5546,[229]5.5616,[230]5.5581,[231]5.5732,[232]5.5628,[233]5.5476,[234]5.5337,[235]5.5123,[236]5.5070,[237]5.4979,[238]5.5009,[239]5.4895,[240]5.4804,[241]5.4834,[242]5.4855,[243]5.4848,[244]5.4748,[245]5.4715,[246]5.4611,[247]5.4513,[248]5.4448,[249]5.4415,[250]5.4449,[251]5.4366,[252]5.4321,[253]5.4228,[254]5.4190,[255]5.4096,[256]5.3932,[257]5.3831,[258]5.3762,[259]5.3759,[260]5.3675,[261]5.3624,[262]5.3579,[263]5.3528,[264]5.3296,[265]5.3294,[266]5.3264,[267]5.3203,[268]5.3272,[269]5.3268,[270]5.3278,[271]5.3341,[272]5.3373,[273]5.3387,[274]5.3400,[275]5.3461,[276]5.3526,[277]5.3652,[278]5.3739,[279]5.3822,[280]5.3861,[281]5.3953,[282]5.4006,[283]5.4136,[284]5.4226,[285]5.4297,[286]5.4423,[287]5.4388,[288]5.4440,[289]5.4380,[290]5.4237,[291]5.4105,[292]5.3974,[293]5.3855,[294]5.3862,[295]5.3865,[296]5.3913,[297]5.3903,[298]5.3920,[299]5.3896,[300]5.3805,[301]5.3810,[302]5.3746,[303]5.3659,[304]5.3587,[305]5.3564,[306]5.3456,[307]5.3488,[308]5.3494,[309]5.3358,[310]5.3325,[311]5.3280,[312]5.3295,[313]5.3237,[314]5.3223,[315]5.3092,[316]5.3057,[317]5.2930,[318]5.2766,[319]5.2871,[320]5.2986,[321]5.3035,[322]5.3005,[323]5.2960,[324]5.2941,[325]5.3036,[326]5.3052,[327]5.3061,[328]5.3092,[329]5.3144,[330]5.3167,[331]5.3270,[332]5.3231,[333]5.3307,[334]5.3259,[335]5.3205,[336]5.3228,[337]5.3216,[338]5.3212,[339]5.3170,[340]5.3142,[341]5.3207,[342]5.3238,[343]5.3283,[344]5.3288,[345]5.3303,[346]5.3287,[347]5.3323,[348]5.3360,[349]5.3380,[350]5.3362,[351]5.3372,[352]5.3372,[353]5.3320,[354]5.3332,[355]5.3378,[356]5.3408,[357]5.3376,[358]5.3456,[359]5.3477,[360]5.3443,[361]5.3442,[362]5.3513,[363]5.3619,[364]5.3670,[365]5.3709,[366]5.3726,[367]5.3815,[368]5.3790,[369]5.3802,[370]5.3821,[371]5.3781,[372]5.3829,[373]5.3872,[374]5.3851,[375]5.3848,[376]5.3909,[377]5.3874,[378]5.3900,[379]5.3941,[380]5.3870,[381]5.3839,[382]5.3800,[383]5.3783,[384]5.3780,[385]5.3769,[386]5.3759,[387]5.3759,[388]5.3730,[389]5.3694,[390]5.3640,[391]5.3582,[392]5.3547,[393]5.3541,[394]5.3571,[395]5.3565,[396]5.3512,[397]5.3575,[398]5.3618,[399]5.3687,[400]5.3677,[401]5.3684,[402]5.3694,[403]5.3719,[404]5.3775,[405]5.3626,[406]5.3584,[407]5.3576,[408]5.3585,[409]5.3694,[410]5.3786,[411]5.3883,[412]5.4022,[413]5.4125,[414]5.4185,[415]5.4245,[416]5.4314,[417]5.4411,[418]5.4435,[419]5.4482,[420]5.4558,[421]5.4656,[422]5.4689,[423]5.4741,[424]5.4831,[425]5.4906,[426]5.4966,[427]5.5008,[428]5.5078,[429]5.5114,[430]5.5176,[431]5.5305,[432]5.5337,[433]5.5328,[434]5.5292,[435]5.5305,[436]5.5332,[437]5.5416,[438]5.5489,[439]5.5459,[440]5.5453,[441]5.5407,[442]5.5392,[443]5.5403,[444]5.5420,[445]5.5411,[446]5.5430,[447]5.5452,[448]5.5483,[449]5.5468,[450]5.5479,[451]5.5449,[452]5.5295,[453]5.5197,[454]5.5145,[455]5.5148,[456]5.5189,[457]5.5200,[458]5.5180,[459]5.5180,[460]5.5252,[461]5.5215,[462]5.5181,[463]5.5165,[464]5.5162,[465]5.5143,[466]5.5069,[467]5.5058,[468]5.5037,[469]5.5049,[470]5.5041,[471]5.4993,[472]5.5002,[473]5.4956,[474]5.4943,[475]5.4876,[476]5.4859,[477]5.4775,[478]5.4748,[479]5.4757,[480]5.4784,[481]5.4786,[482]5.4740,[483]5.4698,[484]5.4706,[485]5.4645,[486]5.4580,[487]5.4569,[488]5.4541,[489]5.4487,[490]5.4458,[491]5.4424,[492]5.4359,[493]5.4332,[494]5.4314,[495]5.4290,[496]5.4250,[497]5.4188,[498]5.4162,[499]5.4126,[500]5.4045,[501]5.3978,[502]5.3970,[503]5.3960,[504]5.3887,[505]5.3887,[506]5.3894,[507]5.3838,[508]5.3802,[509]5.3806,[510]5.3827,[511]5.3868,[512]5.3907,[513]5.3932,[514]5.3987,[515]5.3948,[516]5.3937,[517]5.3936,[518]5.3936,[519]5.3960,[520]5.3972,[521]5.3983,[522]5.4000,[523]5.4006,[524]5.4058,[525]5.4084,[526]5.4089,[527]5.4106,[528]5.4052,[529]5.4058,[530]5.4021,[531]5.4018,[532]5.4066,[533]5.4091,[534]5.4073,[535]5.4099,[536]5.4055,[537]5.4036,[538]5.4085,[539]5.4093,[540]5.4112,[541]5.4111,[542]5.4125,[543]5.4145,[544]5.4157,[545]5.4145,[546]5.4147,[547]5.4114,[548]5.4073,[549]5.4074,[550]5.4054,[551]5.4028,[552]5.4010,[553]5.3980,[554]5.3957,[555]5.3938,[556]5.3928,[557]5.3943,[558]5.3909,[559]5.3915,[560]5.3903,[561]5.3906,[562]5.3881,[563]5.3879,[564]5.3921,[565]5.3932,[566]5.3936,[567]5.3919,[568]5.3927,[569]5.3912,[570]5.3937,[571]5.3951,[572]5.3961,[573]5.3968,[574]5.3937,[575]5.3920,[576]5.3915,[577]5.3898,[578]5.3881,[579]5.3883,[580]5.3830,[581]5.3801,[582]5.3801,[583]5.3810,[584]5.3814,[585]5.3756,[586]5.3698,[587]5.3703,[588]5.3746,[589]5.3796,[590]5.3827,[591]5.3843,[592]5.3831,[593]5.3792,[594]5.3805,[595]5.3790,[596]5.3829,[597]5.3809,[598]5.3778,[599]5.3806,[600]5.3795,[601]5.3784,[602]5.3789,[603]5.3818,[604]5.3826,[605]5.3855,[606]5.3871,[607]5.3854,[608]5.3826,[609]5.3834,[610]5.3875,[611]5.3863,[612]5.3887,[613]5.3859,[614]5.3819,[615]5.3760,[616]5.3784,[617]5.3734,[618]5.3688,[619]5.3644,[620]5.3533,[621]5.3482,[622]5.3463,[623]5.3477,[624]5.3481,[625]5.3489,[626]5.3484,[627]5.3511,[628]5.3519,[629]5.3525,[630]5.3553,[631]5.3598,[632]5.3644,[633]5.3633,[634]5.3663,[635]5.3660,[636]5.3623,[637]5.3585,[638]5.3606,[639]5.3574,[640]5.3579,[641]5.3584,[642]5.3637,[643]5.3654,[644]5.3676,[645]5.3662,[646]5.3699,[647]5.3651,[648]5.3664,[649]5.3667,[650]5.3695,[651]5.3737,[652]5.3742,[653]5.3779,[654]5.3724,[655]5.3715,

llama_print_timings: load time = 3902.02 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: prompt eval time = 1737619.18 ms / 335360 tokens ( 5.18 ms per token)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per run)
llama_print_timings: total time = 1764928.09 ms

Metadata

Metadata

Assignees

No one assigned

    Labels

    Less than 4 bitsEfforts related to viable quantized models using <4 bitsenhancementNew feature or requestgeneration qualityQuality of model output

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions