Summary 💡
Tracking a minimal implementation of partial clones and their corresponding promisor objects/remotes, as discussed in #1041.
What is a partial clone?
Partial clone is a recent(ish) feature of `git` that allows the client to fetch a subset of objects from a remote repository based on criteria (i.e. a "filter") of its choosing. The missing objects are referred to as "promisor objects", and "promisor remotes" are expected to provide them on demand after the clone.
The most common use case for partial clone is a client requesting a clone with either no historical blobs (e.g. `--filter=blob:none`) or only historical blobs under some size threshold (e.g. `--filter=blob:limit=512k`). Tree objects can also be filtered by a partial clone, but that use case is far less common.
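The Rust sketch below makes the semantics of the two blob filters concrete. `BlobFilter`, `parse_filter` and `includes` are hypothetical names and are not gitoxide APIs. The size rule follows git's documented behavior for `blob:limit=<n>[kmg]`: blobs of size at least `n` are omitted, and `k`/`m`/`g` are powers of 1024. Tree filters and combined filters are deliberately not handled.

```rust
/// Hypothetical model of the two blob filters above (not a gitoxide type).
#[derive(Debug, PartialEq)]
enum BlobFilter {
    /// `blob:none`: omit every blob.
    NoBlobs,
    /// `blob:limit=<n>[kmg]`: omit blobs of size at least `n` bytes.
    Limit(u64),
}

/// Parses a filter spec. Returns `None` for anything other than the two
/// blob filters (tree filters and combinations are out of scope here).
fn parse_filter(spec: &str) -> Option<BlobFilter> {
    if spec == "blob:none" {
        return Some(BlobFilter::NoBlobs);
    }
    let limit = spec.strip_prefix("blob:limit=")?;
    // Optional unit suffix: k, m or g, each a power of 1024.
    let (digits, factor): (&str, u64) = match limit.chars().last()?.to_ascii_lowercase() {
        'k' => (&limit[..limit.len() - 1], 1 << 10),
        'm' => (&limit[..limit.len() - 1], 1 << 20),
        'g' => (&limit[..limit.len() - 1], 1 << 30),
        _ => (limit, 1),
    };
    let n: u64 = digits.parse().ok()?;
    n.checked_mul(factor).map(BlobFilter::Limit)
}

impl BlobFilter {
    /// Whether a blob of `size` bytes is sent to the client (i.e. not omitted).
    fn includes(&self, size: u64) -> bool {
        match self {
            BlobFilter::NoBlobs => false,
            // git omits blobs whose size is at least the limit.
            BlobFilter::Limit(n) => size < *n,
        }
    }
}
```

Under this model `blob:limit=0` omits every blob, which makes it equivalent to `blob:none`.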
Lessons learned from git
Because partial clone was retrofitted into `git`, there are several performance gaps that have not yet been resolved. Operations like `fetch` and `checkout` behave exactly as one would expect: the missing objects are fetched in a single transaction with the remote. Other operations, such as `blame` and `rebase`, do not do this. Instead they lazily fetch missing objects one at a time, each with a separate transaction to the remote, which slows them down significantly.
To implement partial clones efficiently, operations that traverse history and require inspecting the contents of blobs and trees need to:
- Determine the set of object IDs needed by the operation (typically by walking a commit graph)
- Fetch any missing objects in a single transaction to the remote
- Continue with their "business logic"
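The Rust sketch below walks through those three steps. Every part of it is hypothetical and simplified. `ObjectId` is a plain integer and `Odb` is an in-memory map. The `PromisorRemote` trait stands in for whatever transport API gitoxide ends up exposing, and each `fetch_many` call models one transaction with the remote. The sketch assumes the commit-graph walk from the first step has already produced the `needed` list.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical object id; a real implementation would use a hash type.
type ObjectId = u32;

/// Hypothetical stand-in for a promisor remote: each `fetch_many` call
/// models ONE transaction with the remote, however many ids it asks for.
trait PromisorRemote {
    fn fetch_many(&mut self, ids: &BTreeSet<ObjectId>) -> HashMap<ObjectId, Vec<u8>>;
}

/// Hypothetical in-memory object database of a partial clone.
struct Odb {
    objects: HashMap<ObjectId, Vec<u8>>,
}

/// Given the ids found by the history walk, fetch every missing object
/// in a single transaction. Returns how many objects were fetched.
fn ensure_present(odb: &mut Odb, remote: &mut impl PromisorRemote, needed: &[ObjectId]) -> usize {
    let missing: BTreeSet<ObjectId> = needed
        .iter()
        .copied()
        .filter(|id| !odb.objects.contains_key(id))
        .collect();
    if missing.is_empty() {
        return 0; // nothing missing: no round-trip at all
    }
    let fetched = remote.fetch_many(&missing);
    let count = fetched.len();
    odb.objects.extend(fetched);
    count
}

/// "Business logic" that may now assume every needed object is local.
/// Here it just sums object sizes.
fn total_size(odb: &Odb, needed: &[ObjectId]) -> usize {
    needed.iter().filter_map(|id| odb.objects.get(id)).map(Vec::len).sum()
}

/// Toy remote for illustration: serves objects from a map and counts transactions.
struct CountingRemote {
    store: HashMap<ObjectId, Vec<u8>>,
    transactions: usize,
}

impl PromisorRemote for CountingRemote {
    fn fetch_many(&mut self, ids: &BTreeSet<ObjectId>) -> HashMap<ObjectId, Vec<u8>> {
        self.transactions += 1;
        ids.iter()
            .filter_map(|id| self.store.get(id).map(|data| (*id, data.clone())))
            .collect()
    }
}
```

`ensure_present` computes the complete missing set before it contacts the remote. With this design, an operation like `blame` makes one round-trip in total rather than one per missing blob.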
That said, this feature does not aim to implement the optimized approach to partial clones across the board. However, we would like to see APIs designed to facilitate the optimized approach, and possibly one implementation of it to serve as a reference and as proof that things can be made to work as expected.
Tasks
- basic implementation of a connectivity check (via `gix fsck connectivity`)
  - create test fixture(s) with missing blobs in promisor packs
  - create unit test(s) to verify that the connectivity check properly reports missing blobs
    - useful to confirm that promisor objects are correctly identified
    - will also help (later on) confirm that the correct objects are fetched when a filter is provided
- minimal implementation of a fresh partial bare clone
  - only support for blob filters needed here
  - no checkout of the working tree supported yet (bare clones only)
  - set the partial clone filter in the newly created local config as `remote.<name>.partialclonefilter`
  - set the promisor field in the newly created local config as `remote.<name>.promisor`
    - set to `true` if a partial clone filter was provided
    - leave unset otherwise
  - plumb through support for partial bare clones to the `gix` CLI
  - create unit test(s) to verify that partial bare clones produce a packfile identical to the one produced by `git`
  - create unit test(s) to confirm that `partialclonefilter` and `promisor` are set appropriately
  - make packfiles fetched from a promisor remote be "promisor packfiles"
- checkout from a partial clone
  - must respect the `promisor` and `partialclonefilter` config settings on the remotes
  - must respect whether a packfile is a promisor pack
  - should result in being able to do a fresh non-bare partial clone
- fetch from a partial clone repository
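The Rust sketch below covers two pieces of bookkeeping from the task list. The first is the set of config entries a fresh partial clone would write for a remote. The second is how a promisor packfile can be recognized. The function names are hypothetical. The config key names come from the tasks above. The `.promisor` marker file placed next to a `.pack` file mirrors how `git` flags packs received from a promisor remote. A real implementation would go through gitoxide's config and pack machinery rather than returning strings and paths.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the `(key, value)` pairs to add to the newly created
/// local config for `remote`, given the optional partial clone filter.
fn partial_clone_config(remote: &str, filter: Option<&str>) -> Vec<(String, String)> {
    match filter {
        // A filter was provided: mark the remote as a promisor and record the filter.
        Some(spec) => vec![
            (format!("remote.{remote}.promisor"), "true".to_string()),
            (format!("remote.{remote}.partialclonefilter"), spec.to_string()),
        ],
        // No filter: a regular clone, so both keys stay unset.
        None => Vec::new(),
    }
}

/// Path of the marker file that flags `pack` as a promisor packfile,
/// e.g. `pack-<hash>.pack` -> `pack-<hash>.promisor`.
fn promisor_marker(pack: &Path) -> PathBuf {
    pack.with_extension("promisor")
}

/// Hypothetical check: a pack counts as a promisor pack if its marker file exists.
fn is_promisor_pack(pack: &Path) -> bool {
    promisor_marker(pack).exists()
}
```

For a clone without a filter, both keys are left unset rather than written as `false`, which matches the task description.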