
Commit ac79fbc

mgroeber9110 authored and tinglou committed
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (ggml-org#11496)
1 parent 09634eb commit ac79fbc


1 file changed: +1 -1 lines changed


src/llama-vocab.cpp

Lines changed: 1 addition & 1 deletion
@@ -1692,7 +1692,7 @@ void llama_vocab::impl::load(llama_model_loader & ml, const LLM_KV & kv) {
 GGML_ASSERT(!ids.empty() && "model vocab missing newline token");
 linefeed_id = ids[0];
 } else {
-const std::vector<int> ids = tokenize("\xC4\x8A", false); // U+010A
+const std::vector<int> ids = tokenize("\n", false);
 
 //GGML_ASSERT(!ids.empty() && "model vocab missing newline token");
 if (ids.empty()) {
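
For context on why U+010A appears at all: GPT-2 style byte-level BPE remaps each raw byte to a printable code point before vocabulary lookup, and under that mapping the LF byte 0x0A becomes U+010A (UTF-8 `\xC4\x8A`). The removed line passed this already-mapped form to `tokenize`; the new line passes the raw `"\n"` and presumably leaves the mapping to the tokenizer. The sketch below reproduces the mapping as it is commonly described for GPT-2. The helper names `gpt2_byte_to_codepoint` and `codepoint_to_utf8` are hypothetical illustrations and are not part of llama.cpp.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not llama.cpp API): GPT-2 style byte -> code point map.
// Bytes that are already "printable" map to themselves; every other byte is
// assigned 256 + k, where k counts the non-printable bytes that precede it.
static uint32_t gpt2_byte_to_codepoint(uint8_t b) {
    auto printable = [](int c) {
        return (c >= 0x21 && c <= 0x7E) ||   // '!' .. '~'
               (c >= 0xA1 && c <= 0xAC) ||   // U+00A1 .. U+00AC
               (c >= 0xAE && c <= 0xFF);     // U+00AE .. U+00FF
    };
    if (printable(b)) {
        return b;
    }
    uint32_t k = 0;
    for (int c = 0; c < b; ++c) {
        if (!printable(c)) {
            ++k;
        }
    }
    return 256 + k;
}

// Hypothetical helper: UTF-8 encode a code point below U+0800, which covers
// every value the mapping above can produce (the largest is 256 + 67).
static std::string codepoint_to_utf8(uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With these helpers, `codepoint_to_utf8(gpt2_byte_to_codepoint('\n'))` yields the two bytes `\xC4\x8A`, the same literal that the removed line hard-coded.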
