Include usage key in create_completion when streaming #1498

Open
@zhudotexe

Is your feature request related to a problem? Please describe.
Because create_completion can yield a text chunk made up of several tokens (for example, when a multi-byte Unicode character spans multiple tokens), the number of yields may not equal the number of tokens the model actually generated. To get accurate usage statistics for a streamed completion, one has to run the final text through the tokenizer again, even though create_completion already tracks the number of tokens generated.
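To illustrate the mismatch, here is a toy model of streaming, not the library's actual implementation. It assumes each token is a raw byte sequence and that text is only emitted once a complete UTF-8 character is available; the `stream_text` helper and the token split are hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def stream_text(token_bytes: Iterable[bytes]) -> Iterator[str]:
    """Yield text only when the buffered bytes form complete UTF-8 characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for tok in token_bytes:
        text = decoder.decode(tok)  # returns "" while a character is incomplete
        if text:
            yield text


# Hypothetical tokenization: the emoji U+1F600 (4 bytes) is split over 3 tokens.
tokens = [b"Hi", b" ", b"\xf0\x9f", b"\x98", b"\x80"]
chunks = list(stream_text(tokens))

print(len(tokens), "tokens generated")
print(len(chunks), "chunks yielded")
```

In this sketch five tokens produce only three yields, so counting yields undercounts the tokens generated.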

Describe the solution you'd like
When stream=True is passed to create_completion, the final chunk yielded should include the usage statistics under the 'usage' key.
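A minimal sketch of how a caller might consume the requested behavior. This shows the proposal, not something the library currently does: `fake_stream` is a hypothetical stand-in for create_completion(..., stream=True), the chunk shape is simplified, and the token counts are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a streamed completion (assumed chunk shape)."""
    yield {"choices": [{"text": "Hi", "finish_reason": None}]}
    yield {"choices": [{"text": " 😀", "finish_reason": None}]}
    # Proposed behavior: only the final chunk carries a 'usage' key.
    yield {
        "choices": [{"text": "", "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 3, "completion_tokens": 5, "total_tokens": 8},
    }


def collect(stream: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Join the streamed text and keep the usage dict if any chunk provides one."""
    parts = []
    usage = None
    for chunk in stream:
        parts.append(chunk["choices"][0]["text"])
        usage = chunk.get("usage", usage)
    return "".join(parts), usage


text, usage = collect(fake_stream())
print(text, usage)
```

Reading the key with `chunk.get` keeps the consumer compatible with streams that never send usage, in which case `usage` stays `None`.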

Describe alternatives you've considered

  • Saving full generated text and running it through the tokenizer again (seems wasteful)
  • Counting the number of yields and hoping we don't have any multi-byte characters (hacky and fragile)

Additional context
The OpenAI API recently added similar support to its streaming API via the stream_options key: https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options

Labels: enhancement
