Gitserver: recursive git tree lister #62019
Description
The API to support the file tree would benefit from a bespoke RPC in gitserver.
A few requirements from the perspective of the final consumer (some of this could be implemented in frontend for a simpler gitserver API):
- Pagination. Some directories contain many entries, and we need to be able to request more in a followup. What we use for the cursor (offset? tree OID?) is up for debate.
- Recursive listing (e.g. "get me all the things I would need to show in the UI if I expanded the file tree out to this file"). This can be handled in frontend, which might make sense to simplify the gitserver API. However, that comes at the cost of higher latency for deeply nested directories because we will need to make many followup requests. I'm thinking maybe we solve this with bidi streaming?
- (Nice to have) Single-entry directory collapse (e.g. fetch the info to render src/main/com/sourcegraph/gitserver if that's the only thing each of those dirs contains). If we do this in frontend, we need to make a followup request for every directory to see if it contains only a single child, then do that recursively, which is expensive and slow.
- (Nice to have) Language detection. High-fidelity language detection requires that the file contents can be fetched if the language is ambiguous from just the file name. Right now, this means fetching the file contents from gitserver as a followup request, which is slow. It would be considerably cheaper to do this on gitserver itself.
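To make the recursive-listing requirement concrete, here is a minimal sketch (in Python for brevity) of which directories a caller would need listed in order to expand the tree out to a given file. The helper name ancestor_dirs, the use of the empty string for the repository root, and the example file name are assumptions of this sketch, not part of any existing API.

```python
def ancestor_dirs(file_path: str) -> list[str]:
    """Return every directory to list, root first, to expand the tree to file_path.

    Assumption of this sketch: the repository root is the empty string.
    """
    parts = file_path.strip("/").split("/")[:-1]  # drop the file name itself
    dirs = [""]  # the root always needs to be listed
    for i in range(1, len(parts) + 1):
        dirs.append("/".join(parts[:i]))
    return dirs


if __name__ == "__main__":
    # server.go is a hypothetical file name used only for illustration.
    for d in ancestor_dirs("src/main/com/sourcegraph/gitserver/server.go"):
        print(repr(d))
```

Each returned directory corresponds to one listing request, so a file nested five directories deep needs six listings (the root plus five). That is the latency cost the bullet above is concerned about if those requests are made sequentially.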
Proposed RPC proto (starting point, feel free to tear apart):
service GitserverService {
  rpc ListFiles(stream ListFilesRequest) returns (stream ListFilesResponse) {
    option idempotency_level = NO_SIDE_EFFECTS;
  }
}

message ListFilesRequest {
  string repo_name = 2;
  string commit_sha = 3;
  string path = 4;
  uint32 limit = 5;
  bool detect_language = 6;
  bool recurse_single_child = 7;
}

message ListFilesResponse {
  string repo_name = 2;
  string commit_sha = 3;
  string path = 4;
  repeated ListFileEntry entries = 5;
  // True if the limit was hit, meaning entries may be incomplete.
  bool limit_hit = 6;
}

message ListFileEntry {
  // The name of the entry, which can be appended to its parent's path to get
  // the full path to this entry.
  string name = 1;
  // Should be blob or tree. Can probably be an enum.
  string type = 2;
  // The oid of the entry.
  string oid = 3;
  // The detected languages of the entry. Always empty for tree entries.
  repeated string languages = 4;
}
This is implemented as a stream so the caller can make asynchronous requests for followup information. This would be used for recursive listing (sending a request for each ancestor directory) and for single-child collapsing (sending an asynchronous followup for each tree entry). The response would include information on whether or not a limit was hit for that directory. This does not currently have a cursor of any sort, but we could change limit_hit to a next_page_oid cursor or something.
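As a rough illustration of how limit and limit_hit could interact with a cursor, here is a small Python sketch using an offset cursor, which is one of the options floated above. The Page type, the next_offset field, and the rule that limit must be at least 1 are assumptions of this sketch; the proposed proto has no cursor, and an OID-based cursor would look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    entries: list[str]
    limit_hit: bool             # mirrors ListFilesResponse.limit_hit
    next_offset: Optional[int]  # hypothetical cursor, not in the proposed proto


def list_page(all_entries: list[str], limit: int, offset: int = 0) -> Page:
    """Return at most `limit` entries starting at `offset`."""
    if limit < 1:
        raise ValueError("this sketch requires limit >= 1")
    window = all_entries[offset : offset + limit]
    more = offset + limit < len(all_entries)  # entries remain past this page
    return Page(window, more, offset + limit if more else None)


def list_all(all_entries: list[str], limit: int) -> list[str]:
    """Follow the cursor until the directory has been fully listed."""
    collected: list[str] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = list_page(all_entries, limit, offset)
        collected.extend(page.entries)
        offset = page.next_offset
    return collected
```

In this sketch limit_hit is true only when more entries exist beyond the returned page, so a directory with exactly `limit` entries reports limit_hit as false. Whether the real RPC should use that definition is an open question.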
This includes a recurse_single_child option that would proactively send back ListFilesResponse for each tree entry that only contains a single child. This is not strictly necessary because we could send a followup request for each tree entry with limit = 1, but it would likely significantly improve latency.
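The walk that recurse_single_child would save the caller from doing can be sketched as follows, again in Python. The nested-dict representation (a dict is a tree, None is a blob), the function names, and the example file names are all assumptions made for illustration; gitserver would read real git trees instead.

```python
def lookup(root: dict, path: str) -> dict:
    """Resolve a slash-separated directory path inside the nested-dict tree."""
    node = root
    for part in filter(None, path.split("/")):
        node = node[part]
    return node


def collapse_single_child(root: dict, path: str) -> str:
    """Extend `path` while the directory holds exactly one child and it is a tree."""
    node = lookup(root, path)
    while len(node) == 1:
        ((name, child),) = node.items()
        if child is None:  # the only child is a blob, so stop collapsing
            break
        path = f"{path}/{name}" if path else name
        node = child
    return path


if __name__ == "__main__":
    # Hypothetical repository layout; a.go and b.go are made-up file names.
    repo = {
        "README.md": None,
        "src": {"main": {"com": {"sourcegraph": {"gitserver": {"a.go": None, "b.go": None}}}}},
    }
    print(collapse_single_child(repo, "src"))
```

Done from the client, every iteration of that loop is a separate followup request, which is why handling it server-side is expected to help latency.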