Skip to content

Add topological walk #1336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 7, 2024
Merged

Add topological walk #1336

merged 18 commits into from
Apr 7, 2024

Conversation

Osse
Copy link
Contributor

@Osse Osse commented Apr 3, 2024

This pull request adds a new module commit::topo to gix-traverse to enable topological commit walk. See this discussion for some context.

Since that discussion was last active I have added a bunch of tests in the same spirit as existing tests. I have also taken the liberty of rearranging some of the code in gix-traverse to make it easier to "slot in" my code.

One thing I have not implemented is exposing any of this to gix itself. I have started sketching it but am not happy with it currently.

Any feedback is welcome!

Byron added 7 commits April 3, 2024 22:08
This way, it's much clearer what possible values are.
For performance, it also adds `Entry::stage_raw()` to avoid having
to match on a number.
Even inactive submodules are shown in the status by `git status`,
so `gix` should do the same.

First observed in helix-editor/helix#5645 (comment)
… Vec.

This makes the API more consistent, and one can pass `None`
as well.
It's interesting to note that one of these warnings constitute
reading the fields, but these come about by displaying a
debug version of these fields, which now doesn't qualify anymore.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a million for contributing true topological traversal!

I only had a quick look this time, less than 15 minutes, and really like what I see! There is a few minor comments, only, and the lint CI failure is something that would certainly also be going away with my refactor. However, please feel free to fix it earlier as I won't get to work on this immediately.

Byron and others added 4 commits April 7, 2024 09:23
In preparation of a new module for topological commit walk. The idea is
that the current walk will be the "default" hence the 'pub use'.
Makes it easier to use these types for different kinds of commit
walking. No matter how the history graph is traversed some concepts
remain identical.
This commit introduces a new module `topo` in gix-traverse that contains
an iterator for walking commits topologically. It is very much based on
the existing Git implementation introduced in b454241[1] as well as
improvements done since then. Function names are kept as much as
possible. The mailing list[2] has a lot of interesting discussion about
how the algorithm works. In short it is a three stage process that aims
to reduce the amount of traversal needed before the first result can be
output.

Hopefully this makes it easier to use gitoxide to implement other tools
that need topological sorting, e.g. graphical log and blame.

For tests there is a separate type named TraversalAssertion. There is
probably grounds for making code more common in this area.

[1]: git/git@b454241
[2]: https://public-inbox.org/git/[email protected]/

Add tests for gix_traverse::topo

This unfortunately duplicates types found in the existing tests.
@Byron
Copy link
Member

Byron commented Apr 7, 2024

Review Tasks

  • non-recursive next()
  • add baseline comparison to assure it's the same as Git topo order
  • unify module structure even if it would be breaking
  • clear up topo module structure and avoid double-duty Error - let's keep it simple, otherwise callers have to deal with to partially overlapping errors.

pub enum Sorting {
/// Show no parents before all of its children are shown, but otherwise show
/// commits in the commit timestamp order.
#[default]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick question: Since this is all about topo, should the default not be TopoOrder?

I also have trouble understanding how this DateOrder maps to Git. Is there a git rev-list equivalent?

Copy link
Contributor Author

@Osse Osse Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have Git's code (with my debug modifications) in front of me. I can fill in details later, but from memory:

In Git there is an older sorting algorithm that that's used if gen numbers aren't available. If neither --date-order nor --topo-order is specified then that algorithm is used, for some reason. So in Git the default is "neither of them". (this is the part I must double-check). I think the idea is that with time the old algorithm can be removed from Git entirely, but it hasn't happened yet.

DateOrder maps to git rev-list --date-order. I have no strong feelings about which is the default for us; I imagine most callers would want to set the order explicitly, anyway. Git's default is equivalent to the new algorithm with --date-order (again, I need to double check this) which is why I picked DateOrder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful, thanks so much for the explanation! The level of detail is perfectly fine, and I just added b8486f7 to add another test and incorporate the information.

It's amazing that Git still has this older algorithm, but then added more just by means of flags which don't seem to be mutually exclusive even - git rev-list --topo-order --date-order @ does something.

Byron added 5 commits April 7, 2024 20:14
We remove ability to reuse `State` as ultimately,this capability
wasn't all that useful.
- adjust module structure
- remove unused method with `todo!()`
- no recursion in `next()`
- baseline comparisons
…nt with other plumbing crates.

This makes sure that each time is reached by only a single path.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this massive PR, I think it's great!

After some failed attempts to unify the package structure, I ended up with something that feels quite 'plumbing', but which at least is consistent with the current structure which sees verything as traversal (and not as Walk). I also considered to put both traversals/walks into gix-revwalk, and into gix-revision as well, but felt like gix-traverse is most 'meant' for these implementations.

I have also looks at the gix::Repository::rev_walk() implementation and think that it should be possible to use either one or the other implementation, based on a new Sorting enum that merges both topo::Sorting and simple::Sorting (previously ancestors::Sorting.

But doing so is left for another PR, and if you want, you can give the implementation a try.

@Byron Byron merged commit b590a9d into GitoxideLabs:main Apr 7, 2024
@Osse
Copy link
Contributor Author

Osse commented Apr 8, 2024

Did you call it simple based on my doc comment "Simple ancestors traversal"? I didn't mean to be rude 😛

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, indeed I may be been influenced by this, and the Ancestors iterator looked a lot simpler after seeing the state handling in the new topological one 😁.

Of course, both have their place and I'd use Simple when memory consumption must be kept in check and there is no particular need for more advanced ordering.

In a way, Ancestors was planned to be the 'one' iterator which handles a bunch of algorithms internally. I'd still think its possible to merge both in this crate, but for now I am going with 'merging happens in gix. I am writing this in case you only put everything in topo to have a clear slate and it's easier, and not because there is anything inherent to its operation that makes it unmergable.

With that said, if you want to integrate it in gix, I think that's absolutely possible, but it would definitely be easier if the algorithm(s) would be integrated in what formerly was called Ancestors. Both will work though, and I kind of like the separation as it makes it easier to implement new algorithms in future as well.

So I think this was a long-winded way of saying that gix should abstract over it to have a single Sorting parameter there that allows to implicitly choose any of the available implementations.

One issue there might be with that is that ends are not universally implemented yet, but it's something that can probably be done, one way or another, particularly once specs are implemented faithfully.

find: Find,
predicate: Predicate,
indegrees: IdMap<i32>,
states: IdMap<topo::WalkFlags>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that states and indegrees could be merged? I am asking as these seem to be insert-only, and will use considerable amounts of memory on huge graphs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance I don't see why not. But how much memory are we talking about? I'd guess the memory used by IdMap<T1> + IdMap<T2> wouldn't be that far off the amount used by IdMap<(T1, T2)>.

They are indeed insert-only. And that's just because in Git they are 😛 There's a point where the indegree of a commit is set to zero. At that point I think it can be removed from all the maps, but I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the discussion a couple of months ago (damn, time flies) I did try to add my stuff to Ancestors but couldn't get it to work well. I'm all for it, still, but I just don't see it. Doing it in gix seems fine to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great to know (both :))!

This means if ever there is an issue with memory, there might be avenues to improve that. I assume Git also has multiple hashmaps for that? Or is it all in one there? * (if you recall, please ignore otherwise)*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants