Description
Bug description
Basically, the title.
We have context windows of roughly 80-90K tokens and we've observed a TTFT on the order of 20-30s, which would sometimes cause a timeout on the Spring AI side.
Has anyone experienced this before?
We are working with streaming, and the actual output generation in terms of tokens per second is stable and acceptable; the problem seems to be specifically the TTFT.
Environment
Spring AI 1.0.0-M1
Java 21
Spring Boot 3.3.0
Steps to reproduce
Send a prompt of ~85K input tokens to Claude 3 Sonnet in streaming mode. Observe the time it takes to generate the first token (20-30s, often ending in a timeout).
Expected behavior
Ideally the timeout would be longer (or configurable) on the framework side, and/or the first token would arrive faster. I wonder whether a TTFT this high is expected even when using streaming.
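One possible workaround might be to extend the response timeout of the underlying HTTP client. This is only a sketch: it assumes the Anthropic chat client picks up the auto-configured `WebClient.Builder` (which may not hold in 1.0.0-M1), and the 120s value is an arbitrary example.

```java
import java.time.Duration;

import org.springframework.boot.web.reactive.function.client.WebClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;

import reactor.netty.http.client.HttpClient;

@Configuration
class WebClientTimeoutConfig {

    // Extend the Reactor Netty response timeout so a 20-30s TTFT
    // does not trip the client. 120s is an arbitrary example value.
    @Bean
    WebClientCustomizer responseTimeoutCustomizer() {
        HttpClient httpClient = HttpClient.create()
                .responseTimeout(Duration.ofSeconds(120));
        return builder -> builder
                .clientConnector(new ReactorClientHttpConnector(httpClient));
    }
}
```

If the model client builds its own `WebClient` internally, this customizer would have no effect, which is part of why a framework-level configuration option would be useful.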
Minimal Complete Reproducible example
A prompt containing a long list of names, for example, sized to reach ~85K input tokens.
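A minimal sketch (plain Java, no Spring dependencies) that builds such a prompt. It assumes a rough heuristic of ~4 characters per token for English text, so ~85K tokens is approximately 340K characters; actual token counts will vary with the tokenizer.

```java
public class LargePromptBuilder {

    // Rough heuristic: ~4 characters per token, so ~85K tokens ≈ 340K characters.
    static final int TARGET_CHARS = 85_000 * 4;

    // Builds a prompt containing a long list of synthetic names
    // until the character budget is reached.
    static String buildPrompt() {
        StringBuilder names = new StringBuilder();
        int i = 0;
        while (names.length() < TARGET_CHARS) {
            names.append("Name-").append(i++).append('\n');
        }
        return "Summarize the following list of names:\n" + names;
    }

    public static void main(String[] args) {
        String prompt = buildPrompt();
        System.out.println("Prompt length in characters: " + prompt.length());
    }
}
```

Sending `buildPrompt()` through a streaming chat call should be enough to observe the long TTFT described above.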