Description
Is your feature request related to a problem? Please describe.
I am using this library to benchmark Question Answering tasks. For that I want to use a technique called self-consistency, where multiple completions are sampled for the same prompt at a high temperature and the most frequent answer is taken as the final one.
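For context, a rough sketch of the client-side pattern I have in mind; the `sample_completion` callable and the `extract_answer` parser here are just placeholders, not part of any existing API:

```python
from collections import Counter

def extract_answer(completion: str) -> str:
    # Task-specific parsing; for illustration, take the last line of the completion.
    return completion.strip().splitlines()[-1]

def self_consistency(sample_completion, prompt: str, n: int = 10, temperature: float = 0.8) -> str:
    # Sample n completions at a high temperature, then majority-vote the parsed answers.
    answers = [extract_answer(sample_completion(prompt, temperature)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```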
Describe the solution you'd like
Add support for OpenAI's `n` parameter to this server, so that a single request can return multiple completions.
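To illustrate the ask, a request with `n` would look something like this; the localhost URL and route are just my assumption of the server's OpenAI-compatible endpoint:

```python
import requests

# One request, several sampled completions, mirroring OpenAI's `n` parameter.
resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed OpenAI-compatible route
    json={
        "prompt": "Q: What is the capital of France?\nA:",
        "temperature": 0.8,
        "max_tokens": 32,
        "n": 5,  # number of completions to generate for this prompt
    },
)
for choice in resp.json()["choices"]:
    print(choice["text"])
```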
I think the Hugging Face implementation of Llama already offers this feature (via `num_return_sequences` in `generate`).
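For reference, a sketch of how I understand this works in Hugging Face transformers; the model name is just an example, the relevant argument is `num_return_sequences`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example model, any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Q: What is the capital of France?\nA:", return_tensors="pt")
# With sampling enabled, num_return_sequences yields several completions from one call.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=32,
    num_return_sequences=5,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```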
Describe alternatives you've considered
Just make multiple calls to the LLM. But I suspect this would take considerably more compute, since each generation would be a separate full pass through the model.
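Concretely, the workaround would be something like the loop below (same assumed endpoint as above), where every request re-processes the prompt from scratch instead of sharing that work across samples:

```python
import requests

def sample_n_times(prompt: str, n: int = 5, temperature: float = 0.8) -> list[str]:
    # Workaround: n separate requests; the prompt is re-evaluated each time.
    completions = []
    for _ in range(n):
        resp = requests.post(
            "http://localhost:8000/v1/completions",  # assumed server route
            json={"prompt": prompt, "temperature": temperature, "max_tokens": 32},
        )
        completions.append(resp.json()["choices"][0]["text"])
    return completions
```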
Additional context
As far as I know, the multiple generations can be achieved using multiple beams during generation. I am not sure whether multiple beams are supported by llama.cpp; it would also help me to clear that up first.
If a feature like that is supported by llama.cpp, I may be able to implement the Python part myself and create a PR for it, but I would need someone to point me in the right direction first.