
server: terminate called after throwing an instance of 'std::runtime_error' #13780

Closed
@GrailFinder

Description

Name and Version

$ ./build/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 5486 (aa50ba46)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Hello. There is an error in the /completion endpoint.

Operating systems

Linux

GGML backends

CUDA

Hardware

ryzen 2700 / 3090ti

Models

any

Problem description & steps to reproduce

Reproduction steps (a C++ client sketch of the same request follows these steps):

  1. start the server with any model
./build/bin/llama-server -m phi-4-Q6_K.gguf -c 8192 -ngl 65
  2. make a completion request
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "System: You are helpfull assistant.\nAssistant:\nHey! How could I help?\nUser:\nTell me a joke.\nAssistant:\n",
    "temperature": 0.6,
    "n_predict": 200,
    "stop": ["User:\n", "Assistant:\n"],
    "stream": true
  }'
  3. see the error
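
For reference, here is a minimal sketch of the same streaming request made from a small C++ client instead of the shell. This assumes libcurl is available; it is not part of llama.cpp, and on_chunk is just an illustrative callback name. The JSON body mirrors the curl command above.

// Minimal sketch (assumes libcurl; not part of llama.cpp) of the same
// streaming POST to /completion. The server streams chunks until it aborts
// mid-generation, at which point the connection drops.
#include <curl/curl.h>
#include <cstdio>

static size_t on_chunk(char * data, size_t size, size_t nmemb, void *) {
    fwrite(data, size, nmemb, stdout);   // print each streamed chunk as it arrives
    return size * nmemb;
}

int main() {
    CURL * curl = curl_easy_init();
    if (!curl) return 1;

    // same request body as the curl command above
    const char * body =
        "{\"prompt\": \"System: You are helpfull assistant.\\nAssistant:\\nHey! How could I help?\\nUser:\\nTell me a joke.\\nAssistant:\\n\","
        " \"temperature\": 0.6, \"n_predict\": 200,"
        " \"stop\": [\"User:\\n\", \"Assistant:\\n\"], \"stream\": true}";

    struct curl_slist * hdrs = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/completion");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_chunk);

    if (curl_easy_perform(curl) != CURLE_OK) {
        fprintf(stderr, "request failed (the server dropped the connection)\n");
    }

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return 0;
}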

First Bad Commit

No response

Relevant log output

slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 26
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 26, n_tokens = 26, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 26, n_tokens = 26
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: 'Why don't scientists trust atoms?
Because they make up everything!
User' not found at start of 'Why don't scientists trust atoms?
Because they make up everything!
'
zsh: IOT instruction (core dumped)  ./build/bin/llama-server -m phi-4-Q6_K.gguf -c 8192 -ngl 65
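
For context on what the abort means: the message format suggests the server expects each streamed update to start with the previously sent text, and it throws std::runtime_error when it does not. Below is a minimal C++ sketch of that invariant, written for illustration only (diff_from_prefix is a hypothetical helper, not the actual llama.cpp code). In the log above, the trailing partial stop word "User" appears to have been trimmed from the new text, so the previously sent text is no longer a prefix of it.

// Illustrative sketch (assumption, not llama.cpp code) of the prefix-diff
// check the error message implies.
#include <iostream>
#include <stdexcept>
#include <string>

// hypothetical helper: returns the newly generated suffix, or throws if the
// previous text is not a prefix of the current text
static std::string diff_from_prefix(const std::string & prev, const std::string & cur) {
    if (cur.rfind(prev, 0) != 0) {
        throw std::runtime_error("Invalid diff: '" + prev + "' not found at start of '" + cur + "'");
    }
    return cur.substr(prev.size());
}

int main() {
    // previously streamed text, still ending with the partial stop word "User"
    std::string prev = "Why don't scientists trust atoms?\n"
                       "Because they make up everything!\n"
                       "User";
    // new text after the partial stop word was trimmed
    std::string cur  = "Why don't scientists trust atoms?\n"
                       "Because they make up everything!\n";
    try {
        std::cout << diff_from_prefix(prev, cur) << "\n";
    } catch (const std::runtime_error & e) {
        std::cout << "what():  " << e.what() << "\n";   // mirrors the log output above
    }
    return 0;
}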
