Name and Version
$ ./build/bin/llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 5486 (aa50ba46)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Hello. There is an error in the /completion endpoint.
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 2700 / RTX 3090 Ti
Models
Any (reproduced here with phi-4-Q6_K.gguf)
Problem description & steps to reproduce
Reproduction steps:
- Start the server with any model:
./build/bin/llama-server -m phi-4-Q6_K.gguf -c 8192 -ngl 65
- Make a completion request:
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "System: You are a helpful assistant.\nAssistant:\nHey! How could I help?\nUser:\nTell me a joke.\nAssistant:\n",
    "temperature": 0.6,
    "n_predict": 200,
    "stop": ["User:\n", "Assistant:\n"],
    "stream": true
  }'
- Observe the crash; the server aborts with the error shown in the log output below.
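The failure seems tied to streaming combined with stop strings: in the log below, the previously streamed text ends in "User" (a partial match of the "User:\n" stop string), which was then trimmed from the final text. To help narrow this down, the same request can be repeated with streaming disabled (an assumption on my side: the failing diff check only runs for streamed responses):

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "System: You are a helpful assistant.\nAssistant:\nHey! How could I help?\nUser:\nTell me a joke.\nAssistant:\n",
    "temperature": 0.6,
    "n_predict": 200,
    "stop": ["User:\n", "Assistant:\n"],
    "stream": false
  }'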
First Bad Commit
No response
Relevant log output
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 26
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 26, n_tokens = 26, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 26, n_tokens = 26
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid diff: 'Why don't scientists trust atoms?
Because they make up everything!
User' not found at start of 'Why don't scientists trust atoms?
Because they make up everything!
'
zsh: IOT instruction (core dumped) ./build/bin/llama-server -m phi-4-Q6_K.gguf -c 8192 -ngl 65
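For context on the exception: the message format matches a prefix-based diff used to compute streamed deltas. A minimal sketch of that kind of check (the function name and details here are assumptions, not taken from the llama.cpp tree):

#include <stdexcept>
#include <string>

// Sketch of the prefix-diff check implied by the log message: each streamed
// update computes the delta between the text sent so far ('last') and the
// current snapshot ('current'), and 'last' must be a prefix of 'current'.
static std::string string_diff(const std::string & last, const std::string & current) {
    if (last.empty()) {
        return current;
    }
    if (current.rfind(last, 0) != 0) {
        // Trimming a partially matched stop string ("User") from the final
        // snapshot breaks the prefix invariant; the resulting exception is
        // never caught, so the server terminates as in the log above.
        throw std::runtime_error("Invalid diff: '" + last + "' not found at start of '" + current + "'");
    }
    return current.substr(last.size());
}

int main() {
    const std::string last    = "Why don't scientists trust atoms?\nBecause they make up everything!\n\nUser";
    const std::string current = "Why don't scientists trust atoms?\nBecause they make up everything!\n";
    string_diff(last, current); // throws -> terminate, matching the crash above
}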