feat(frontend): propagate upstream and CG headers to clients #63461

valerybugakov · 2024-06-25T02:24:36Z

STATUS: draft for an early review of the approach.
Previously, the Sourcegraph backend did not propagate the CG and inference provider headers to clients, causing clients using the fast-path to receive different response headers. This PR resolves this discrepancy by ensuring that all headers from CG and inference providers are propagated to clients through the Sourcegraph backend.
This PR would allow us to analyze E2E autocomplete latency by model using client analytics events.
Closes https://linear.app/sourcegraph/issue/CODY-2523/[autocomplete-latency]-propagate-the-x-cody-resolved-model-header-from

Test plan

Updated unit tests.
TODO: update the rest of the unit tests and add a new one for header propagation.

Changelog

The Sourcegraph backend now propagates CG and inference provider headers to clients.

rafax · 2024-06-25T14:01:59Z

internal/completions/types/types.go

+
+// ResponseMetadataCapture holds metadata from an HTTP response, including headers,
+// status code, and a function to send completion events.
+type ResponseMetadataCapture struct {


nit: I'm not a native English speaker, but it feels like Capture is not the right word to use here - maybe consider using Captor? Or, Forwarder (since what it does is forward response events and metadata)?

Agree. I like Forwarder. Will rename it if we decide to keep this struct around.

rafax · 2024-06-25T14:03:16Z

internal/completions/types/types.go

+	}
+
+	if rc.statusCode != nil {
+		w.WriteHeader(*rc.statusCode)


Do we really want to rewrite upstream codes 1:1? Is this something we did before, or is this a new change introduced in this PR?

I agree that this looks a little suspect. At the very least, we should map all 5xx errors we receive from the upstream LLM API provider to a 502 error. So it is clear that the server-side problem didn't originate from the Sourcegraph instance. But passing through 401 and 403 errors would cause similar confusion as well.

So if we do need to send the upstream status code to the client, I would urge us to consider something more robust. Like maybe even adding it as part of a header in the HTTP response or JSON response payload. Just so that it is clear what the "Sourcegraph instance" status code was, as distinct from the "upstream LLM provider" status code.

This change is new and optional for the primary purpose of this PR. My rationale for adding this was:

We already handle the upstream inference provider status code on the Cody Gateway side here.

We already use the Cody Gateway API with these status codes directly for fast path.

If the above is true, then forwarding the status code from CG to frontend to clients would mean a more consistent API contract (exact status code handling for fast path and nonfast path requests) with no new issues (because we already expose these status codes for fast path requests).

Am I missing something here? Based on your suggestions, I'm happy to remove or adjust it, but I want to calibrate my understanding first.

Rafal's comment applies here too. Will update ✅

TL;DR: Sounds like we are all on the same page. This is good logic to add, we just need to be a little careful is all.

To reiterate some of the things mentioned in those other threads and this one:

We cannot copy HTTP headers indiscriminately, because it might (1) leak information we shouldn't, e.g. CORS, or (2) communicate unintended behavior, e.g. Connection: Close

We simply cannot copy HTTP status codes indiscriminately for similar reasons. e.g. a 401 Unauthorized response could either mean the Client->Sourcegraph credentials were missing or malformed, or the Sourcegraph->CodyGateway credentials were missing or malformed.

This doesn't mean that we cannot or shouldn't do this "forwarding" of headers, just that we need to be clear about what we forward and why.

So IMO, it might make sense to have an allow list of HTTP headers that we forward on in the response. (Rather than a deny list to pick out the ones we know we shouldn't, since that could send data we don't know ahead of time is problematic.)

Also, having a set of HTTP status codes we remap. e.g. how Cody Gateway maps 429->503. In the Sourcegraph backend, mapping 500-502 might make sense as well.

chrsmith

I'm totally onboard with us fixing this problem. And I think this is a good approach. But I don't think this is quite right in terms of how we solve things.

The ResponseMetadataCapture design here doesn't feel quite right. The HTTP status code and headers are sent available initially, and then as new SSE events come in those are written out. So it seems awkward to have the ResponseMetadataCapture carry those around for the "lifetime of the streaming operation".

Did you consider updating the interface for types.CompletionClient directly? e.g. something like:

type CompletionsClient interface {
	// Stream executions a completions request, streaming results to the callback.
	// Callers should check for ErrStatusNotOK and handle the error appropriately.
+	//
+	// The returned response object is the _LIVE_ HTTP response from the upstream
+	// provider. So do not read from or close the response body, or else you're
+	// going to have a bad time. 
-	Stream(context.Context, log.Logger, CompletionRequest, SendCompletionEvent) error
+ 	Stream(context.Context, log.Logger, CompletionRequest, SendCompletionEvent) (*http.Response, error)
	// Complete executions a completions request until done. Callers should check
	// for ErrStatusNotOK and handle the error appropriately.
	Complete(context.Context, log.Logger, CompletionRequest) (*CompletionResponse, error)
}

The thing I like about that approach -- or something like the function just returning (*types.StatusAndHeaders, error) -- is that the status and header codes are only available when the call to Stream(...) is first made. And so the data is available only when the caller needs to start writing its own http.Response (and forwarding the upstream headers). But not unnecessarily added as part of the ongoing sendEventFn.

What do you think?

eseliger · 2024-06-25T21:19:12Z

internal/completions/types/types.go

+func (rc *ResponseMetadataCapture) ApplyCapturedMetadata(w http.ResponseWriter) {
+	for key, values := range rc.headers {
+		for _, value := range values {
+			w.Header().Add(key, value)


I am a bit scared that the upstream might return CORS headers or other security-related headers. Can we allow-list certain headers that are interesting to us? Also, we need control over certain headers like content-type to trigger SSE, that might be different on some upstream providers.

We should certainly skip those https://sourcegraph.com/-/editor?remote_url=github.com%2Fsourcegraph%2Fsourcegraph&branch=main&file=cmd%2Fcody-gateway%2Finternal%2Fhttpapi%2Fcompletions%2Fupstream.go&start_row=46&start_col=5&end_row=46&end_col=5&editor=JetBrains&version=v6.0.3

The same reasoning applies here. Because we go through Cody Gateway, these filters are already applied. I verified this locally by adding Fireworks-Server-Time-To-First-Token to hopHeaders. Then I observed that it disappeared from the headers forwarded by frontend to the VS Code client.

I consider Cody Gateway to be the owner of filtering the inference provider headers. That's why it should be safe to forward all of them here. If there's an issue with that, should we handle it on the CG side?

The issue is that hopHeaders can be added by any layer between CG and SG (Enterprise networking proxies, public internet, Cloudflare, GCP LB, Cloud Run routing (however it works)), and forwarding ex. Connection: Close will kill all requests sharing a Cody Client -> SG connection.

Got it, makes sense. And the same applies to status codes. Will add filtering on the SG level.

valerybugakov · 2024-06-27T04:15:32Z

internal/completions/types/types.go

-	Stream(context.Context, log.Logger, CompletionRequest, SendCompletionEvent) error
+	Stream(context.Context, log.Logger, CompletionRequest, *ResponseMetadataCapture) error


Thanks for the suggestion, @chrsmith!

the function just returning (*types.StatusAndHeaders, error)

I like this one! Will update 👍

Stream(context.Context, log.Logger, CompletionRequest, SendCompletionEvent) (*http.Response, error)

I didn't want to return (*http.Response, error) because it could allow manipulation of the underlying response struct. I assumed that we intend to have the consumer receive data exclusively through the SendCompletionEvent callback, and it's nice to force it through the function interface.

the function just returning (*types.StatusAndHeaders, error)

I started implementing it locally and found another constraint that forced me to do that by providing an object to capture response metadata. We need to receive *types.StatusAndHeaders before or with the first completion event to inform the client about the response status and headers. With the current implementation, we return after the stream is completed or when we encounter an error.

To achieve the same behavior, we need to reimplement the control flow of the Stream function so that every completion provider can return early. Or I can use my initial approach, which is more compatible with the current implementation where we use a sendEvent callback to yield to the consumer as soon as data is available.

That's my understanding as a non-frequent Go user 🙂. Let me know if you see a better way!

Thanks for looking into this @valerybugakov . Makes sense.

I didn't want to return (*http.Response, error) because it could allow manipulation of the underlying response struct.

Yup, that's a good reason not to do that.

We need to receive *types.StatusAndHeaders before or with the first completion event to inform the client about the response status and headers.

Ah yeah, I hadn't considered that case. So yes, we should have the headers and status code go into whatever that "new data type/parameter" is.

But the design still seems a little clunky IMHO, because it's trying to do two different things.

Cache theupstream HTTP response information.

Wrap the SendEvent callback so the caller will get events as they are sent from upstream.

However, thinking about some alternatives, nothing stands out as a good enough improvement. And what you have here is perfectly reasonable. So 🚀 . I'm happy to sign off on this once make a decision regarding the "which headers to forward" and/or exactly how we map status codes.

Frontend: propagate upstream and CG headers to clients

e61a2a3

cla-bot bot added the cla-signed label Jun 25, 2024

valerybugakov changed the title ~~Frontend: propagate upstream and CG headers to clients~~ faet(frontend): propagate upstream and CG headers to clients Jun 25, 2024

valerybugakov changed the title ~~faet(frontend): propagate upstream and CG headers to clients~~ feat(frontend): propagate upstream and CG headers to clients Jun 25, 2024

valerybugakov requested review from rafax, chrsmith and eseliger June 25, 2024 09:51

valerybugakov self-assigned this Jun 25, 2024

valerybugakov requested a review from a team June 25, 2024 11:19

rafax reviewed Jun 25, 2024

View reviewed changes

chrsmith suggested changes Jun 25, 2024

View reviewed changes

eseliger reviewed Jun 25, 2024

View reviewed changes

valerybugakov commented Jun 27, 2024

View reviewed changes

		Stream(context.Context, log.Logger, CompletionRequest, SendCompletionEvent) error
		Stream(context.Context, log.Logger, CompletionRequest, *ResponseMetadataCapture) error

feat(frontend): propagate upstream and CG headers to clients #63461

Are you sure you want to change the base?

feat(frontend): propagate upstream and CG headers to clients #63461

Uh oh!

Conversation

valerybugakov commented Jun 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Changelog

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

valerybugakov Jun 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chrsmith left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

valerybugakov Jun 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

valerybugakov commented Jun 25, 2024 •

edited

Loading

valerybugakov Jun 27, 2024 •

edited

Loading

chrsmith left a comment •

edited

Loading

valerybugakov Jun 27, 2024 •

edited

Loading