This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Discussion: Object storage for things other than lsif-bundle-manager #15149

Open
@tsenart

Description

@tsenart

This RFC by @efritz put in motion the introduction of object storage (i.e. S3 / GCS / Minio) in our infrastructure. What other stateful services could leverage this now that we have it and what would be the benefits? git-server and indexed-search come to mind.

Original Slack thread.

Campaigns

@eseliger: Not a stateful set, but campaigns could use it to reduce disk usage in Postgres for:
Potentially now: patch files for changesets uploaded from src-cli. If it were exposed somehow, using an S3 client might also give us low-effort chunked uploads. (I think codeintel does that as well, but they implemented it themselves.)
In the future: logs from server-side executed campaigns.
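As a rough illustration of the chunked-upload idea, here is a minimal sketch of splitting a patch file into numbered parts before sending them to an object store. The `part` type and `splitParts` function are hypothetical names, not src-cli or codeintel code, and the actual upload call is left out. Real S3 multipart uploads also impose a minimum part size (5 MiB for every part except the last), which this toy example ignores.

```go
package main

import "fmt"

// part is one chunk of an upload (hypothetical type).
// Numbering starts at 1, as in S3 multipart uploads.
type part struct {
	Number int
	Data   []byte
}

// splitParts cuts data into parts of at most partSize bytes.
// The returned slices alias data; nothing is copied.
func splitParts(data []byte, partSize int) ([]part, error) {
	if partSize <= 0 {
		return nil, fmt.Errorf("partSize must be positive, got %d", partSize)
	}
	var parts []part
	for start := 0; start < len(data); start += partSize {
		end := start + partSize
		if end > len(data) {
			end = len(data)
		}
		parts = append(parts, part{Number: len(parts) + 1, Data: data[start:end]})
	}
	return parts, nil
}

func main() {
	// Placeholder patch content, only for demonstration.
	patch := []byte("diff --git a/README.md b/README.md\n+hello\n")
	parts, err := splitParts(patch, 16)
	if err != nil {
		panic(err)
	}
	for _, p := range parts {
		// A real client would upload each part to the object store here.
		fmt.Printf("part %d: %d bytes\n", p.Number, len(p.Data))
	}
}
```

With an off-the-shelf S3 client, the splitting and retrying of parts would typically be handled by the client library rather than by hand, which is the "low-effort" aspect mentioned above.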

Search

@keegancsmith: To explain why I like this for zoekt: we currently have every zoekt-indexserver paired with a zoekt-webserver, and we use the persistent volume on that pair as the persistent store. If our set of zoekt servers changes, we don't move the shards around; instead, the responsible zoekt server recomputes the index.
By introducing a blob store we will have a place to store the computed indexes and can treat the pool of indexservers independently from the pool of servers responding to search requests. When the set of zoekt-webservers changes (scaling, a rollout, a pod dying, etc.), shuffling shards is now cheaper, since we just need to fetch them from the blob store. Note: the zoekt-webservers will still require a local disk, but it won't need to be persistent, i.e. zoekt-webservers can go from being a statefulset to a deployment. They aren't truly cattle, since fetching from the blob store still has a cost (just orders of magnitude less than computing the index). So we may keep them as statefulsets with network-attached storage.
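To make the "fetch instead of recompute" idea concrete, here is a minimal sketch of a webserver making sure its assigned shards are on local disk and downloading only the missing ones. Everything here is an assumption for illustration: the `BlobStore` interface, the in-memory `memStore`, the shard names, and the helper functions are hypothetical and are not zoekt's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// BlobStore is a hypothetical, minimal view of an object store (S3, GCS, Minio).
type BlobStore interface {
	Get(key string) ([]byte, error)
}

// memStore is an in-memory stand-in so the sketch runs without a real store.
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, error) {
	b, ok := m[key]
	if !ok {
		return nil, fmt.Errorf("blob %q not found", key)
	}
	return b, nil
}

// newScratchDir stands in for the webserver's non-persistent local disk.
func newScratchDir() (string, error) {
	return os.MkdirTemp("", "zoekt-shards-")
}

// ensureShard makes sure the named shard exists under dir, downloading it
// from the blob store if it is missing. It reports whether it downloaded.
func ensureShard(store BlobStore, dir, name string) (bool, error) {
	dst := filepath.Join(dir, name)
	if _, err := os.Stat(dst); err == nil {
		return false, nil // already on local disk
	} else if !errors.Is(err, os.ErrNotExist) {
		return false, err
	}
	data, err := store.Get(name)
	if err != nil {
		return false, err
	}
	// Write to a temporary file and rename, so a crash never leaves a partial shard.
	tmp := dst + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return false, err
	}
	return true, os.Rename(tmp, dst)
}

// syncShards fetches every assigned shard that is missing locally and
// returns how many shards were downloaded.
func syncShards(store BlobStore, dir string, assigned []string) (int, error) {
	downloaded := 0
	for _, name := range assigned {
		fetched, err := ensureShard(store, dir, name)
		if err != nil {
			return downloaded, err
		}
		if fetched {
			downloaded++
		}
	}
	return downloaded, nil
}

func main() {
	store := memStore{
		"repo-a.zoekt": []byte("index-a"), // hypothetical shard names and contents
		"repo-b.zoekt": []byte("index-b"),
	}
	dir, err := newScratchDir()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	assigned := []string{"repo-a.zoekt", "repo-b.zoekt"}
	for i := 1; i <= 2; i++ {
		n, err := syncShards(store, dir, assigned)
		if err != nil {
			panic(err)
		}
		fmt.Printf("sync %d: downloaded %d shard(s)\n", i, n)
	}
}
```

The second sync downloads nothing because the shards are already on disk, which is the argument for keeping some local storage around even if it no longer has to be persistent.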

Distribution

@slimsag: Worth thinking about how this would scale, e.g. how to prevent LSIF DB thrashing from harming campaigns perf and vice versa. My initial intuition would be to keep usages of it separate so that different usages can be scaled separately. I am mostly thinking of the Minio case, e.g. one system abuses the Minio DB and makes scaling it hard because multiple different things (LSIF, campaigns, and more) are packed into it (this could even be due to network throughput).
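One way to keep this concern visible is to treat the object storage target as per-consumer configuration and flag consumers that end up on the same deployment. The sketch below is purely illustrative: the `Backend` type, the `sharedEndpoints` function, and the endpoint and bucket names are assumptions, not existing configuration.

```go
package main

import (
	"fmt"
	"sort"
)

// Backend identifies the object storage deployment and bucket a consumer uses
// (hypothetical fields).
type Backend struct {
	Endpoint string
	Bucket   string
}

// sharedEndpoints groups consumers by endpoint and returns only the endpoints
// used by more than one consumer, i.e. the ones that cannot be scaled
// independently.
func sharedEndpoints(consumers map[string]Backend) map[string][]string {
	byEndpoint := make(map[string][]string)
	for name, b := range consumers {
		byEndpoint[b.Endpoint] = append(byEndpoint[b.Endpoint], name)
	}
	for endpoint, names := range byEndpoint {
		if len(names) < 2 {
			delete(byEndpoint, endpoint)
			continue
		}
		sort.Strings(names) // deterministic order for reporting
	}
	return byEndpoint
}

func main() {
	consumers := map[string]Backend{
		"lsif":      {Endpoint: "minio-a:9000", Bucket: "lsif"},
		"campaigns": {Endpoint: "minio-a:9000", Bucket: "campaigns"},
		"search":    {Endpoint: "minio-b:9000", Bucket: "shards"},
	}
	for endpoint, names := range sharedEndpoints(consumers) {
		fmt.Printf("%s is shared by %v\n", endpoint, names)
	}
}
```

Separate buckets on one deployment isolate data but not load, so the check groups by endpoint rather than by bucket.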
