This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Gitserver: recursive git tree lister #62019

Open
@camdencheek

Description


The API to support the file tree would benefit from a bespoke RPC in gitserver.

A few requirements from the perspective of the final consumer (some of this could be implemented in frontend for a simpler gitserver API):

  • Pagination. Some directories contain many entries, and we need to be able to request more in a follow-up request. What we use for the cursor (an offset? a tree OID?) is up for debate.
  • Recursive listing (e.g. "get me all the things I would need to show in the UI if I expanded the file tree out to this file"). This could be handled in frontend, which would keep the gitserver API simpler. However, that comes at the cost of higher latency for deeply nested files, because frontend would need to make many follow-up requests. I'm thinking maybe we solve this with bidi streaming?
  • (Nice to have) Single-entry directory collapse (e.g. fetch the info to render src/main/com/sourcegraph/gitserver if that's the only thing each of those dirs contains). If we do this in frontend, we need to make a follow-up request for every directory to see whether it contains only a single child, then repeat that recursively, which is expensive and slow.
  • (Nice to have) Language detection. High-fidelity language detection requires fetching the file contents when the language is ambiguous from the file name alone. Right now, this means fetching the file contents from gitserver in a follow-up request, which is slow. It would be considerably cheaper to do this on gitserver itself.
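For the pagination requirement, an offset cursor is the simplest of the options listed. Because every request pins a commit SHA, the tree (and git's sorted entry order) cannot change between pages, so an offset stays stable. The sketch below is a minimal in-memory illustration under that assumption; `Entry`, `Page`, and `paginate` are hypothetical names, not existing gitserver APIs, and a tree-OID-based cursor would be an equally valid choice.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified directory entry.
type Entry struct {
	Name string
	Type string // "blob" or "tree"
}

// Page holds one page of entries plus the offset to request next.
// NextOffset is -1 when the listing is complete (the limit was not hit).
type Page struct {
	Entries    []Entry
	NextOffset int
}

// paginate returns up to limit entries starting at offset. A limit of 0
// means "no limit", matching the zero value of an unset proto3 uint32.
func paginate(entries []Entry, offset, limit int) Page {
	if offset < 0 || offset > len(entries) {
		offset = len(entries)
	}
	end := len(entries)
	if limit > 0 && offset+limit < end {
		end = offset + limit
	}
	next := -1
	if end < len(entries) {
		next = end
	}
	return Page{Entries: entries[offset:end], NextOffset: next}
}

func main() {
	dir := []Entry{{"a.go", "blob"}, {"b.go", "blob"}, {"c", "tree"}, {"d.md", "blob"}, {"e", "tree"}}
	// Keep requesting pages until the cursor reports completion.
	for offset := 0; offset >= 0; {
		p := paginate(dir, offset, 2)
		fmt.Println(p.Entries, p.NextOffset)
		offset = p.NextOffset
	}
}
```

In proto terms, `NextOffset >= 0` corresponds to `limit_hit = true`, so replacing `limit_hit` with a cursor field would carry strictly more information.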

Proposed RPC proto (starting point, feel free to tear apart):

```protobuf
syntax = "proto3";

service GitserverService {
  rpc ListFiles(stream ListFilesRequest) returns (stream ListFilesResponse) {
    option idempotency_level = NO_SIDE_EFFECTS;
  }
}

message ListFilesRequest {
  string repo_name = 2;
  string commit_sha = 3;
  string path = 4;

  uint32 limit = 5;
  bool detect_language = 6;
  bool recurse_single_child = 7;
}

message ListFilesResponse {
  string repo_name = 2;
  string commit_sha = 3;
  string path = 4;

  repeated ListFileEntry entries = 5;
  // True if the limit was hit, i.e. entries is incomplete
  bool limit_hit = 6;
}

message ListFileEntry {
  // The name of the entry, which can be appended to its parent's path to get
  // the full path to this entry
  string name = 1;
  // Should be blob or tree. Can probably be an enum.
  string type = 2;
  // The object ID (OID) of the entry.
  string oid = 3;
  // The detected languages of the entry. Will always be empty for a tree entry.
  repeated string languages = 4;
}
```
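The `detect_language` option is where running on gitserver pays off: the file name settles most cases, and the blob only needs to be read when the name is ambiguous, with no extra round trip. The sketch below shows that lazy-read structure only. The extension tables, the `.h` heuristic, and `detectLanguages` itself are illustrative assumptions; a production implementation would more likely use a dedicated detection library such as go-enry.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// unambiguous maps extensions to a single language (tiny illustrative subset).
var unambiguous = map[string]string{".go": "Go", ".py": "Python", ".rs": "Rust"}

// ambiguous lists extensions whose language cannot be decided from the name.
var ambiguous = map[string]bool{".h": true}

// looksLikeCPP is a toy heuristic: C++-only keywords suggest a C++ header.
func looksLikeCPP(content []byte) bool {
	s := string(content)
	for _, marker := range []string{"class ", "namespace ", "template<", "template <"} {
		if strings.Contains(s, marker) {
			return true
		}
	}
	return false
}

// detectLanguages returns the languages for a blob entry. readContent is
// called only when the name alone is ambiguous, so the common case never
// reads the blob. Unknown files yield nil (an empty languages field).
func detectLanguages(name string, readContent func() ([]byte, error)) ([]string, error) {
	ext := strings.ToLower(path.Ext(name))
	if lang, ok := unambiguous[ext]; ok {
		return []string{lang}, nil
	}
	if !ambiguous[ext] {
		return nil, nil
	}
	content, err := readContent()
	if err != nil {
		return nil, err
	}
	if looksLikeCPP(content) {
		return []string{"C++"}, nil
	}
	return []string{"C"}, nil
}

func main() {
	reads := 0
	reader := func() ([]byte, error) {
		reads++
		return []byte("namespace foo { class Bar; }"), nil
	}
	fmt.Println(detectLanguages("main.go", reader))
	fmt.Println(detectLanguages("bar.h", reader))
	fmt.Println("content reads:", reads)
}
```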

The RPC is bidirectionally streaming so that the caller can send asynchronous follow-up requests without waiting for earlier responses. This would be used for recursive listing (send a request for each ancestor directory) and for single-child collapsing (send an asynchronous follow-up for each tree entry). Each response indicates whether a limit was hit for that directory. There is currently no cursor of any sort, but we could replace limit_hit with a next_page_oid cursor or something similar.
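For recursive listing over the stream, the client already knows every directory it needs from the target file's path, so it can send all the requests up front. Since each response echoes `repo_name`, `commit_sha`, and `path`, responses can be matched to requests in any order. The sketch below computes those requests; `ListFilesRequest` here is a hand-written Go mirror of the proposed message (not generated code), and treating the repository root as the empty path `""` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ListFilesRequest is a hypothetical hand-written mirror of the proposed
// proto message, used instead of generated code to stay self-contained.
type ListFilesRequest struct {
	RepoName  string
	CommitSHA string
	Path      string
	Limit     uint32
}

// ancestorDirs returns every directory that must be listed to reveal file
// in the tree UI: the root ("") down to the file's parent directory.
func ancestorDirs(file string) []string {
	dirs := []string{""}
	parts := strings.Split(strings.Trim(file, "/"), "/")
	for i := 1; i < len(parts); i++ {
		dirs = append(dirs, strings.Join(parts[:i], "/"))
	}
	return dirs
}

// requestsForFile builds one request per ancestor directory. On a bidi
// stream these can all be sent before any response arrives.
func requestsForFile(repo, commit, file string, limit uint32) []ListFilesRequest {
	var reqs []ListFilesRequest
	for _, d := range ancestorDirs(file) {
		reqs = append(reqs, ListFilesRequest{RepoName: repo, CommitSHA: commit, Path: d, Limit: limit})
	}
	return reqs
}

func main() {
	// Repo name and commit are placeholders.
	for _, r := range requestsForFile("github.com/example/repo", "<commit>", "src/main/com/app.go", 100) {
		fmt.Printf("%q\n", r.Path)
	}
}
```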

This includes a recurse_single_child option that would proactively send back a ListFilesResponse for each tree entry that contains only a single child. This is not strictly necessary, because we could send a follow-up request with limit = 1 for each tree entry, but it would likely reduce latency significantly.
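With recurse_single_child, gitserver could walk down from a tree entry for as long as each directory has exactly one child that is itself a directory, producing a collapsed path like src/main/com/sourcegraph/gitserver in a single server-side pass. The sketch below walks a hypothetical in-memory `Tree` (blobs are `nil` children); a real implementation would read git tree objects instead. Stopping when the only child is a file is an assumed UI choice.

```go
package main

import "fmt"

// Tree is a hypothetical in-memory directory: child name -> subtree.
// A nil *Tree child represents a blob (file).
type Tree struct {
	Children map[string]*Tree
}

// collapse follows a chain of directories that each contain exactly one
// child directory, returning the joined path and the deepest directory.
// It stops at a directory with zero or several children, or whose only
// child is a file.
func collapse(name string, t *Tree) (string, *Tree) {
	for t != nil && len(t.Children) == 1 {
		var childName string
		var child *Tree
		for n, c := range t.Children { // runs exactly once
			childName, child = n, c
		}
		if child == nil { // the only child is a file
			break
		}
		name += "/" + childName
		t = child
	}
	return name, t
}

// dir is a small helper for building example trees.
func dir(children map[string]*Tree) *Tree { return &Tree{Children: children} }

func main() {
	src := dir(map[string]*Tree{
		"main": dir(map[string]*Tree{
			"com": dir(map[string]*Tree{
				"sourcegraph": dir(map[string]*Tree{
					"gitserver": dir(map[string]*Tree{"a.go": nil, "b.go": nil}),
				}),
			}),
		}),
	})
	p, _ := collapse("src", src)
	fmt.Println(p)
}
```

The walk costs one tree read per collapsed level on gitserver, instead of one network round trip per level from frontend.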

Labels: feature-request, team/source

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions