-
-
Notifications
You must be signed in to change notification settings - Fork 353
Add topological walk #1336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add topological walk #1336
Conversation
This way, it's much clearer what possible values are. For performance, it also adds `Entry::stage_raw()` to avoid having to match on a number.
…ened. This will help with issues like these: o2sh/onefetch#1277
Even inactive submodules are shown in the status by `git status`, so `gix` should do the same. First observed in helix-editor/helix#5645 (comment)
… Vec. This makes the API more consistent, and one can pass `None` as well.
It's interesting to note that one of these warnings constitute reading the fields, but these come about by displaying a debug version of these fields, which now doesn't qualify anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a million for contributing true topological traversal!
I only had a quick look this time, less than 15 minutes, and really like what I see! There is a few minor comments, only, and the lint CI failure is something that would certainly also be going away with my refactor. However, please feel free to fix it earlier as I won't get to work on this immediately.
In preparation of a new module for topological commit walk. The idea is that the current walk will be the "default" hence the 'pub use'.
Makes it easier to use these types for different kinds of commit walking. No matter how the history graph is traversed some concepts remain identical.
This commit introduces a new module `topo` in gix-traverse that contains an iterator for walking commits topologically. It is very much based on the existing Git implementation introduced in b454241[1] as well as improvements done since then. Function names are kept as much as possible. The mailing list[2] has a lot of interesting discussion about how the algorithm works. In short it is a three stage process that aims to reduce the amount of traversal needed before the first result can be output. Hopefully this makes it easier to use gitoxide to implement other tools that need topological sorting, e.g. graphical log and blame. For tests there is a separate type named TraversalAssertion. There is probably grounds for making code more common in this area. [1]: git/git@b454241 [2]: https://public-inbox.org/git/[email protected]/ Add tests for gix_traverse::topo This unfortunately duplicates types found in the existing tests.
Review Tasks
|
pub enum Sorting { | ||
/// Show no parents before all of its children are shown, but otherwise show | ||
/// commits in the commit timestamp order. | ||
#[default] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick question: Since this is all about topo
, should the default not be TopoOrder
?
I also have trouble understanding how this DateOrder
maps to Git. Is there a git rev-list
equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have Git's code (with my debug modifications) in front of me. I can fill in details later, but from memory:
In Git there is an older sorting algorithm that that's used if gen numbers aren't available. If neither --date-order
nor --topo-order
is specified then that algorithm is used, for some reason. So in Git the default is "neither of them". (this is the part I must double-check). I think the idea is that with time the old algorithm can be removed from Git entirely, but it hasn't happened yet.
DateOrder
maps to git rev-list --date-order
. I have no strong feelings about which is the default for us; I imagine most callers would want to set the order explicitly, anyway. Git's default is equivalent to the new algorithm with --date-order
(again, I need to double check this) which is why I picked DateOrder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful, thanks so much for the explanation! The level of detail is perfectly fine, and I just added b8486f7 to add another test and incorporate the information.
It's amazing that Git still has this older algorithm, but then added more just by means of flags which don't seem to be mutually exclusive even - git rev-list --topo-order --date-order @
does something.
We remove ability to reuse `State` as ultimately,this capability wasn't all that useful.
…nt with other plumbing crates. This makes sure that each time is reached by only a single path.
…tors` to `Simple`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this massive PR, I think it's great!
After some failed attempts to unify the package structure, I ended up with something that feels quite 'plumbing', but which at least is consistent with the current structure which sees verything as traversal (and not as Walk). I also considered to put both traversals/walks into gix-revwalk
, and into gix-revision
as well, but felt like gix-traverse
is most 'meant' for these implementations.
I have also looks at the gix::Repository::rev_walk()
implementation and think that it should be possible to use either one or the other implementation, based on a new Sorting
enum that merges both topo::Sorting
and simple::Sorting
(previously ancestors::Sorting
.
But doing so is left for another PR, and if you want, you can give the implementation a try.
Did you call it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hehe, indeed I may be been influenced by this, and the Ancestors
iterator looked a lot simpler after seeing the state handling in the new topological one 😁.
Of course, both have their place and I'd use Simple
when memory consumption must be kept in check and there is no particular need for more advanced ordering.
In a way, Ancestors
was planned to be the 'one' iterator which handles a bunch of algorithms internally. I'd still think its possible to merge both in this crate, but for now I am going with 'merging happens in gix
. I am writing this in case you only put everything in topo
to have a clear slate and it's easier, and not because there is anything inherent to its operation that makes it unmergable.
With that said, if you want to integrate it in gix
, I think that's absolutely possible, but it would definitely be easier if the algorithm(s) would be integrated in what formerly was called Ancestors
. Both will work though, and I kind of like the separation as it makes it easier to implement new algorithms in future as well.
So I think this was a long-winded way of saying that gix
should abstract over it to have a single Sorting
parameter there that allows to implicitly choose any of the available implementations.
One issue there might be with that is that ends
are not universally implemented yet, but it's something that can probably be done, one way or another, particularly once specs
are implemented faithfully.
find: Find, | ||
predicate: Predicate, | ||
indegrees: IdMap<i32>, | ||
states: IdMap<topo::WalkFlags>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that states
and indegrees
could be merged? I am asking as these seem to be insert-only, and will use considerable amounts of memory on huge graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance I don't see why not. But how much memory are we talking about? I'd guess the memory used by IdMap<T1>
+ IdMap<T2>
wouldn't be that far off the amount used by IdMap<(T1, T2)>
.
They are indeed insert-only. And that's just because in Git they are 😛 There's a point where the indegree of a commit is set to zero. At that point I think it can be removed from all the maps, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the discussion a couple of months ago (damn, time flies) I did try to add my stuff to Ancestors
but couldn't get it to work well. I'm all for it, still, but I just don't see it. Doing it in gix
seems fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great to know (both :))!
This means if ever there is an issue with memory, there might be avenues to improve that. I assume Git also has multiple hashmaps for that? Or is it all in one there? * (if you recall, please ignore otherwise)*
This pull request adds a new module
commit::topo
togix-traverse
to enable topological commit walk. See this discussion for some context.Since that discussion was last active I have added a bunch of tests in the same spirit as existing tests. I have also taken the liberty of rearranging some of the code in
gix-traverse
to make it easier to "slot in" my code.One thing I have not implemented is exposing any of this to
gix
itself. I have started sketching it but am not happy with it currently.Any feedback is welcome!