git pack generation #3

Closed
@joshtriplett

As promised on Reddit, here's an outline of the parallel pack generation use case:

  • I have a set of objects in the repository, identified via a list of object roots.
  • I have an optional set of potential thin-pack bases, which the other end already has. The objects that go in the pack are all objects reachable from the roots and not reachable from the bases; the objects usable for deltas are all objects reachable from either the roots or the bases.
  • I'd like to generate a pack (or thin-pack if any bases are specified), and stream that pack either to disk or over a network connection.
  • Sometimes that connection or disk will be slow, other times it'll be absurdly fast. I'd like to have some reasonable control over the tradeoffs between pack generation speed and pack size.
  • It would be nice to handle generating a pack that includes some objects that aren't in a repository, without first having to add the objects to the repository. (For instance, blobs and trees generated from a directory or extracted from a tarball.) Not a hard requirement, but helpful.
  • Massive bonus points if gitoxide could start streaming the pack almost immediately, and adaptively compress as well as it can in the time until the next bits are needed on the wire, to come as close as possible to saturating the available network speed. (The pack can later be repacked for space, taking more time to do so more effectively.)
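
To make the roots/bases rule above concrete, here is a minimal sketch of the two object sets, using only the Rust standard library. Everything in it is an assumption for illustration: `ObjectId`, `Graph`, `reachable`, `PackPlan`, and `plan_pack` are hypothetical names, object ids are small integers rather than hashes, and the repository is modeled as an in-memory map. It is not gitoxide's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical object id; a real repository would use a hash.
type ObjectId = u32;

/// Hypothetical object graph: each object maps to the objects it references
/// (commit -> parents and tree, tree -> entries, blob -> nothing).
type Graph = HashMap<ObjectId, Vec<ObjectId>>;

/// Collect every object reachable from `starts`, including the starts themselves.
fn reachable(graph: &Graph, starts: &[ObjectId]) -> HashSet<ObjectId> {
    let mut seen: HashSet<ObjectId> = HashSet::new();
    let mut queue: VecDeque<ObjectId> = starts.iter().copied().collect();
    while let Some(id) = queue.pop_front() {
        // `insert` returns false if the object was already visited.
        if !seen.insert(id) {
            continue;
        }
        if let Some(children) = graph.get(&id) {
            queue.extend(children.iter().copied());
        }
    }
    seen
}

/// The two sets described in the list above.
struct PackPlan {
    /// Objects written into the pack: reachable from roots, not from bases.
    to_pack: HashSet<ObjectId>,
    /// Objects usable for deltas: reachable from roots or bases.
    delta_candidates: HashSet<ObjectId>,
}

fn plan_pack(graph: &Graph, roots: &[ObjectId], bases: &[ObjectId]) -> PackPlan {
    let from_roots = reachable(graph, roots);
    let from_bases = reachable(graph, bases);
    PackPlan {
        to_pack: from_roots.difference(&from_bases).copied().collect(),
        delta_candidates: from_roots.union(&from_bases).copied().collect(),
    }
}

fn main() {
    // 1 = new commit (root), 2 = older commit (base),
    // 10 and 11 = trees, 20, 21 and 22 = blobs.
    let graph: Graph = HashMap::from([
        (1, vec![2, 10]),
        (2, vec![11]),
        (10, vec![20, 21]),
        (11, vec![20, 22]),
        (20, vec![]),
        (21, vec![]),
        (22, vec![]),
    ]);

    let bases = [2];
    let plan = plan_pack(&graph, &[1], &bases);

    let mut ids: Vec<ObjectId> = plan.to_pack.iter().copied().collect();
    ids.sort();
    println!("objects to pack: {:?}", ids);
    println!("delta candidates: {}", plan.delta_candidates.len());
    // A thin pack is only needed when bases were supplied.
    println!("thin pack: {}", !bases.is_empty());
}
```

With an empty `bases` slice, `to_pack` and `delta_candidates` are the same set, which corresponds to the ordinary (non-thin) pack case. A real implementation would presumably avoid materializing both full sets for large repositories; the sketch only pins down the set arithmetic.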

I'd love to test this out, and I'd be happy to do so on the biggest machines I can throw at it, though I'd also like it to work well in the 2-8 CPU case.
