Description
Caching prebuilds locally in all regions could make Gitpod both faster and cheaper when starting new workspaces from prebuilds.
Current situation
- A Prebuild is built and stored in one region (typically the US)
- It is then never cached in other regions (e.g. the EU)
- When starting a workspace on the same commit as a Prebuild, the Prebuild is always downloaded from the original "source" region
Problem
Transferring data between GCP regions is slow and expensive.
A good example to illustrate this problem is:
- Gitpod's sample template repositories typically have one latest Prebuild (stored in one region)
- But they are frequently opened in all regions (causing repeated cross-regional transfers of the same Prebuild)
Feature request
When a Prebuild is built and stored in region A, and then requested once in region B, it would be great if a "reference copy" (or "cached copy") of that Prebuild were also stored in region B for future use.
This way, all subsequent requests in region B can use the locally-cached copy (instead of causing constant cross-regional transfers of the same data).
Proposed solution
- A workspace is started on a specific repository & commit
- Gitpod checks whether a prebuild is available for this repository & commit
- (new) If a prebuild exists but is stored in a different region and is not yet cached locally, Gitpod still uses it to start the workspace and, in parallel, creates a local cached copy
- (new) The next time a workspace is started on this specific repository & commit & region, the locally-cached prebuild is used instead (faster & cheaper)
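The lookup steps above could be sketched roughly as follows. This is only an illustration of the decision logic, not Gitpod's actual implementation: every type and function name here (`PrebuildRecord`, `StartPlan`, `planWorkspaceStart`, `markCached`) is hypothetical, and the prebuild index is modelled as a simple in-memory map.

```typescript
// Hypothetical model: none of these names come from Gitpod's codebase.
type Region = string;

interface PrebuildRecord {
  sourceRegion: Region;       // region where the prebuild was built and stored
  cachedRegions: Set<Region>; // other regions that already hold a cached copy
}

type StartPlan =
  | { kind: "no-prebuild" }
  | { kind: "use-local"; region: Region }
  | { kind: "use-remote-and-cache"; from: Region; cacheIn: Region };

// Prebuilds are looked up by repository and commit.
function prebuildKey(repo: string, commit: string): string {
  return `${repo}@${commit}`;
}

function planWorkspaceStart(
  index: Map<string, PrebuildRecord>,
  repo: string,
  commit: string,
  region: Region,
): StartPlan {
  const record = index.get(prebuildKey(repo, commit));
  if (!record) {
    return { kind: "no-prebuild" };
  }
  if (record.sourceRegion === region || record.cachedRegions.has(region)) {
    // A copy already lives in the requesting region: no cross-regional transfer.
    return { kind: "use-local", region };
  }
  // (new) Start from the remote prebuild now, and copy it locally in parallel.
  return { kind: "use-remote-and-cache", from: record.sourceRegion, cacheIn: region };
}

// To be called once the parallel copy has finished, so later starts hit the cache.
function markCached(record: PrebuildRecord, region: Region): void {
  record.cachedRegions.add(region);
}
```

With this shape, the first start in region B returns `use-remote-and-cache`, and once `markCached` has run, later starts in region B return `use-local`.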
Implications for garbage collection:
- We'll need to check a prebuild's usage in all regions (i.e. also count the uses of cached copies)
- We'll need to garbage-collect unused prebuilds in all regions (i.e. also delete all the cached copies)
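A minimal sketch of what these two garbage-collection points could mean in practice. It assumes a retention-window policy, which this proposal does not specify: a prebuild counts as unused only when no copy in any region has been used recently. `PrebuildCopies`, `isUnusedEverywhere`, `regionsToCollect`, and `retentionMs` are all hypothetical names.

```typescript
// Hypothetical model: the names and the retention-window rule are assumptions.
interface PrebuildCopies {
  sourceRegion: string;
  // Last time a workspace was started from the copy in each region (ms since
  // epoch). Covers the source region and every region holding a cached copy.
  lastUsedByRegion: Map<string, number>;
}

// A prebuild is unused only if no copy, in any region, was used within the window.
function isUnusedEverywhere(copies: PrebuildCopies, now: number, retentionMs: number): boolean {
  let unused = true;
  copies.lastUsedByRegion.forEach((lastUsed) => {
    if (now - lastUsed < retentionMs) {
      unused = false;
    }
  });
  return unused;
}

// Returns the regions whose copies should be deleted: all of them, or none.
function regionsToCollect(copies: PrebuildCopies, now: number, retentionMs: number): string[] {
  if (!isUnusedEverywhere(copies, now, retentionMs)) {
    return [];
  }
  const regions = new Set<string>(Array.from(copies.lastUsedByRegion.keys()));
  regions.add(copies.sourceRegion); // the original is deleted along with the caches
  return Array.from(regions).sort();
}
```

The all-or-nothing return value reflects the second point: once a prebuild is judged unused, the source copy and every cached copy are removed together, so no region is left holding an orphaned copy.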
See also
- More on cross-regional transfers (internal)
- Proposal thread on Slack (internal)
- Trade-offs between GCP region cache and Cloudflare R2 (internal)