Minimal implementation of partial clones & promisor objects #1046

Open
@cesfahani

Description

Summary 💡

Tracking a minimal implementation of partial clones and their corresponding promisor objects/remotes, as discussed in #1041.

What is a partial clone?

Partial clone is a recent(ish) feature of git that allows the client to fetch only a subset of objects from a remote repository, based on criteria (i.e. a "filter") of its choosing. The missing objects are referred to as "promisor objects", and "promisor remotes" are expected to provide them on demand after the clone, as needed.

The most common use cases of partial clone are those where the client requests a clone with either no historical blobs (e.g. --filter=blob:none), or only historical blobs under some size threshold (e.g. --filter=blob:limit=512k). Tree objects can also be filtered out by a partial clone (e.g. --filter=tree:0); however, that use case is far less common.
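To make the filter specs above concrete, here is a minimal Rust sketch that parses them into a typed value. The `Filter` enum, `parse_size` and `parse_filter` are hypothetical names, not gitoxide's API. The accepted syntax follows git's documented `blob:none`, `blob:limit=<n>[kmg]` and `tree:<depth>` forms, with `k`/`m`/`g` taken as powers of 1024; every other filter kind git supports is deliberately left out.

```rust
/// Hypothetical, simplified model of a partial clone filter (not gitoxide's API).
#[derive(Debug, PartialEq, Eq)]
enum Filter {
    /// `blob:none`: omit all blobs.
    BlobNone,
    /// `blob:limit=<n>[kmg]`: omit blobs whose size is at least `n` bytes.
    BlobLimit(u64),
    /// `tree:<depth>`: omit trees and blobs whose depth from the root tree is at least `depth`.
    TreeDepth(u64),
}

/// Parse a size with an optional `k`/`m`/`g` suffix (powers of 1024).
fn parse_size(s: &str) -> Option<u64> {
    let (digits, factor) = match s.chars().last()? {
        'k' | 'K' => (&s[..s.len() - 1], 1024),
        'm' | 'M' => (&s[..s.len() - 1], 1024 * 1024),
        'g' | 'G' => (&s[..s.len() - 1], 1024 * 1024 * 1024),
        _ => (s, 1),
    };
    // `checked_mul` turns an overflowing size into `None` instead of wrapping.
    digits.parse::<u64>().ok()?.checked_mul(factor)
}

/// Parse the subset of filter specs discussed in this issue.
fn parse_filter(spec: &str) -> Option<Filter> {
    if spec == "blob:none" {
        Some(Filter::BlobNone)
    } else if let Some(size) = spec.strip_prefix("blob:limit=") {
        parse_size(size).map(Filter::BlobLimit)
    } else if let Some(depth) = spec.strip_prefix("tree:") {
        depth.parse::<u64>().ok().map(Filter::TreeDepth)
    } else {
        None // other filter kinds are out of scope for this sketch
    }
}

fn main() {
    assert_eq!(parse_filter("blob:none"), Some(Filter::BlobNone));
    assert_eq!(parse_filter("blob:limit=512k"), Some(Filter::BlobLimit(512 * 1024)));
    assert_eq!(parse_filter("tree:0"), Some(Filter::TreeDepth(0)));
    assert_eq!(parse_filter("blob:limit=big"), None);
}
```

Keeping the filter as an enum rather than a raw string lets later stages (pack negotiation, checkout, connectivity checks) match on the filter kind instead of re-parsing text.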

Lessons learned from git

Because partial clone was retrofitted into git, several performance gaps remain unresolved. Operations like fetch and checkout behave exactly as one would expect: the missing objects are fetched in a single transaction with the remote. Other operations, such as blame and rebase, do not, and instead end up lazily fetching missing objects one at a time (each in a separate transaction with the remote), which slows them down significantly.

To implement partial clones efficiently, operations that traverse history and require inspecting the contents of blobs and trees need to:

  1. Determine the set of object IDs needed by the operation (typically by walking a commit graph)
  2. Fetch any missing objects in a single transaction to the remote
  3. Continue with their "business logic"
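The three steps above can be sketched in Rust under heavily simplified, hypothetical assumptions: object IDs are plain integers, each commit directly lists the blobs its tree references, and `PromisorRemote::fetch_batch` merely stands in for a real single-transaction fetch. None of these names come from gitoxide. The point is that the operation issues exactly one round trip for all missing objects instead of one per object.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified types: real object IDs are SHA-1/SHA-256 hashes,
// and none of these names are gitoxide APIs.
type ObjectId = u32;

struct Commit {
    parents: Vec<ObjectId>,
    blobs: Vec<ObjectId>, // blobs referenced by this commit's tree (flattened)
}

struct PromisorRemote {
    round_trips: usize,
}

impl PromisorRemote {
    /// Stand-in for fetching every requested object in ONE transaction.
    fn fetch_batch(&mut self, ids: &[ObjectId]) -> Vec<ObjectId> {
        self.round_trips += 1;
        ids.to_vec()
    }
}

/// Step 1: walk the commit graph from `tip` and collect the blob IDs the operation needs.
fn collect_needed(tip: ObjectId, commits: &HashMap<ObjectId, Commit>) -> HashSet<ObjectId> {
    let mut seen = HashSet::new();
    let mut stack = vec![tip];
    let mut needed = HashSet::new();
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // already visited (merges share ancestors)
        }
        if let Some(commit) = commits.get(&id) {
            needed.extend(commit.blobs.iter().copied());
            stack.extend(commit.parents.iter().copied());
        }
    }
    needed
}

/// Step 2: fetch everything that is missing locally in a single round trip.
fn prefetch_missing(
    needed: &HashSet<ObjectId>,
    local: &mut HashSet<ObjectId>,
    remote: &mut PromisorRemote,
) {
    let mut missing: Vec<ObjectId> =
        needed.iter().copied().filter(|id| !local.contains(id)).collect();
    if missing.is_empty() {
        return; // nothing to fetch, no round trip
    }
    missing.sort_unstable(); // deterministic request order
    local.extend(remote.fetch_batch(&missing));
}

fn main() {
    // History: 3 -> 2 -> 1, each commit referencing some blobs.
    let commits = HashMap::from([
        (1, Commit { parents: vec![], blobs: vec![10, 11] }),
        (2, Commit { parents: vec![1], blobs: vec![11, 12] }),
        (3, Commit { parents: vec![2], blobs: vec![12, 13] }),
    ]);
    // After a blob:none-style clone plus checkout, only the tip's blobs are present.
    let mut local: HashSet<ObjectId> = HashSet::from([12, 13]);
    let mut remote = PromisorRemote { round_trips: 0 };

    let needed = collect_needed(3, &commits);
    prefetch_missing(&needed, &mut local, &mut remote);

    // Step 3: the "business logic" can now read every blob without lazy fetches.
    assert!(needed.iter().all(|id| local.contains(id)));
    assert_eq!(remote.round_trips, 1);
}
```

A blame-like operation would run `collect_needed` and `prefetch_missing` before its main loop, so that step 3 never triggers lazy, per-object fetches.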

That said, this feature does not aim to implement the optimized approach to partial clones across the board. However, we would like to see APIs designed to facilitate the optimized approach, and possibly one implementation of it that serves as a reference and as proof that things can be made to work as expected.

Tasks

  • basic implementation of a connectivity check (via gix fsck connectivity)
    • create test fixture(s) with missing blobs in promisor packs
    • create unit test(s) to verify connectivity check properly reports missing blobs
      • useful to confirm that promisor objects are correctly being identified
      • will also help (later on) confirm that the correct objects are fetched when a filter is provided
  • minimal implementation of a fresh partial bare clone
    • only support for blob filters needed here
    • no checkout of working tree supported yet (bare clones only)
    • set partial clone filter in newly created local config as remote.<name>.partialclonefilter
    • set promisor field in newly created local config as remote.<name>.promisor
      • set to true if a partial clone filter was provided
      • leave unset otherwise
    • plumb through support for partial bare clones to the gix CLI
    • create unit test(s) to verify partial bare clones produce a packfile identical to the one produced by git
    • create unit test(s) to confirm partialclonefilter and promisor are set appropriately
  • make packfiles fetched from a promisor remote be "promisor packfiles"
  • checkout from a partial clone
    • must respect the promisor and partialclonefilter config settings on the remotes
    • must respect whether a packfile is a promisor pack
    • should result in being able to do a fresh non-bare partial clone
  • fetch from a partial clone repository
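For the connectivity-check task, each missing object needs to be classified as either promised (a promisor remote is expected to supply it) or genuinely missing. The sketch below assumes a simplified, hypothetical model in which each stored object knows which objects it references and whether it came from a promisor pack (in git, such a pack is marked by a `.promisor` file next to its `.pack` file). It roughly follows git's rule that a missing object counts as a promisor object if an object in a promisor pack references it. `StoredObject`, `Report`, `promised_ids` and `check_connectivity` are illustrative names, not gitoxide APIs.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified model (not gitoxide's API); real IDs are hashes.
type ObjectId = u32;

struct StoredObject {
    refs: Vec<ObjectId>,    // objects this one points to (parents, tree entries, ...)
    in_promisor_pack: bool, // true if it was read from a promisor pack
}

#[derive(Debug, Default, PartialEq)]
struct Report {
    promised: Vec<ObjectId>, // missing, but a promisor remote is expected to supply them
    broken: Vec<ObjectId>,   // missing and not promised: a real connectivity error
}

/// Every object in a promisor pack, plus everything those objects reference.
fn promised_ids(odb: &HashMap<ObjectId, StoredObject>) -> HashSet<ObjectId> {
    odb.iter()
        .filter(|(_, obj)| obj.in_promisor_pack)
        .flat_map(|(&id, obj)| std::iter::once(id).chain(obj.refs.iter().copied()))
        .collect()
}

/// Walk everything reachable from `tips` and classify the objects that are absent.
fn check_connectivity(tips: &[ObjectId], odb: &HashMap<ObjectId, StoredObject>) -> Report {
    let promised = promised_ids(odb);
    let mut report = Report::default();
    let mut seen = HashSet::new();
    let mut stack = tips.to_vec();
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue;
        }
        match odb.get(&id) {
            Some(obj) => stack.extend(obj.refs.iter().copied()),
            None if promised.contains(&id) => report.promised.push(id),
            None => report.broken.push(id),
        }
    }
    report.promised.sort_unstable();
    report.broken.sort_unstable();
    report
}

fn main() {
    let odb = HashMap::from([
        // From a promisor pack: commit 1 -> tree 2 -> blobs 3 (present) and 4 (filtered out).
        (1, StoredObject { refs: vec![2], in_promisor_pack: true }),
        (2, StoredObject { refs: vec![3, 4], in_promisor_pack: true }),
        (3, StoredObject { refs: vec![], in_promisor_pack: true }),
        // Created locally: commit 7 -> parent 1 and tree 5; tree 5 -> blob 6, which is gone.
        (5, StoredObject { refs: vec![6], in_promisor_pack: false }),
        (7, StoredObject { refs: vec![1, 5], in_promisor_pack: false }),
    ]);
    let report = check_connectivity(&[7], &odb);
    assert_eq!(report, Report { promised: vec![4], broken: vec![6] });
}
```

Computing the promised set up front, rather than tracking the referrer during the walk, keeps the result independent of traversal order when an object is referenced from both promisor and non-promisor objects.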
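The config-related tasks for a fresh partial bare clone can be illustrated by rendering the remote section that would end up in the new repository's local config. `remote_section` is a hypothetical helper that formats text by hand purely for illustration; a real implementation would go through a proper config writer. As the task list describes, it sets `remote.<name>.promisor` and `remote.<name>.partialclonefilter` only when a filter was given, and leaves both unset otherwise. No fetch refspec is written, on the assumption that it mirrors a plain bare clone.

```rust
/// Hypothetical helper (not gitoxide's API): render the `[remote "<name>"]` section
/// a fresh partial bare clone would write to the new repository's local config.
fn remote_section(name: &str, url: &str, filter: Option<&str>) -> String {
    let mut out = format!("[remote \"{name}\"]\n\turl = {url}\n");
    if let Some(filter) = filter {
        // Only a partial clone marks its remote as a promisor remote and records the filter.
        out.push_str("\tpromisor = true\n");
        out.push_str(&format!("\tpartialclonefilter = {filter}\n"));
    }
    // Without a filter, both keys stay unset.
    out
}

fn main() {
    let url = "https://example.com/repo.git"; // placeholder URL
    let partial = remote_section("origin", url, Some("blob:none"));
    assert!(partial.ends_with("\tpromisor = true\n\tpartialclonefilter = blob:none\n"));

    let full = remote_section("origin", url, None);
    assert_eq!(full, "[remote \"origin\"]\n\turl = https://example.com/repo.git\n");
    print!("{partial}");
}
```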

Metadata

Assignees

No one assigned

    Labels

    C-tracking-issue: An issue to track the progress of multiple PRs or issues

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests