Description
Bug description
Basically, the title.
We have context windows of roughly 80-90K tokens and we've observed a TTFT on the order of 20-30s, which would sometimes cause a timeout on the Spring AI side.
Has anyone experienced this before?
We are working with streaming, and the actual output generation in terms of tokens per second is stable and acceptable; the problem seems to be specifically the TTFT.
Environment
Spring AI 1.0.0-M1
Java 21
Spring Boot 3.3.0
Steps to reproduce
Send a prompt of ~85K input tokens to Claude 3 Sonnet in streaming mode. Observe the time it takes to generate the first token (20-30s, often ending in a timeout).
Expected behavior
Ideally the timeout would be longer (or configurable) on the framework side, and/or the first token would arrive faster. I wonder whether a TTFT this high is expected even when using streaming.
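One possible workaround might be to extend the response timeout of the underlying HTTP client. This is only a sketch: it assumes the Anthropic chat client picks up the auto-configured `WebClient.Builder` (which may not hold in 1.0.0-M1), and the 120s value is an arbitrary example.

```java
import java.time.Duration;

import org.springframework.boot.web.reactive.function.client.WebClientCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;

import reactor.netty.http.client.HttpClient;

@Configuration
class WebClientTimeoutConfig {

    // Extend the Reactor Netty response timeout so a 20-30s TTFT
    // does not trip the client. 120s is an arbitrary example value.
    @Bean
    WebClientCustomizer responseTimeoutCustomizer() {
        HttpClient httpClient = HttpClient.create()
                .responseTimeout(Duration.ofSeconds(120));
        return builder -> builder
                .clientConnector(new ReactorClientHttpConnector(httpClient));
    }
}
```

If the model client builds its own `WebClient` internally, this customizer would have no effect, which is part of why a framework-level configuration option would be useful.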
Minimal Complete Reproducible example
A prompt containing a long list of names, for example, sized to reach ~85K input tokens.
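A minimal sketch (plain Java, no Spring dependencies) that builds such a prompt. It assumes a rough heuristic of ~4 characters per token for English text, so ~85K tokens is approximately 340K characters; actual token counts will vary with the tokenizer.

```java
public class LargePromptBuilder {

    // Rough heuristic: ~4 characters per token, so ~85K tokens ≈ 340K characters.
    static final int TARGET_CHARS = 85_000 * 4;

    // Builds a prompt containing a long list of synthetic names
    // until the character budget is reached.
    static String buildPrompt() {
        StringBuilder names = new StringBuilder();
        int i = 0;
        while (names.length() < TARGET_CHARS) {
            names.append("Name-").append(i++).append('\n');
        }
        return "Summarize the following list of names:\n" + names;
    }

    public static void main(String[] args) {
        String prompt = buildPrompt();
        System.out.println("Prompt length in characters: " + prompt.length());
    }
}
```

Sending `buildPrompt()` through a streaming chat call should be enough to observe the long TTFT described above.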