Description
Current behavior 😯
#1465 fixed #1464 by checking if a git repository encountered during traversal is a worktree of the current repository and declining to delete it even if -r is passed, and also even if -x and -d are passed to delete ignored directories. This protects the repository's own worktrees, even if nested, provided that they are encountered during traversal rather than being present below a directory that is found during traversal and determined to be ignored.
As described in #1464 (comment), gix clean continues to require --skip-hidden-repositories to avoid deleting nested repositories that require full traversal to discover, and this is important for performance. This behavior holds even when those nested repositories are worktrees of the current repository.
But worktrees of the current repository, regardless of their location, should never require much work to discover. This is because the repository knows where its worktrees are. For example, a non-bare repository with git worktree managed worktrees, if it is not a submodule, has a .git directory with a worktrees subdirectory that contains information about its worktrees.
Whether a directory contains any descendants that are worktrees of the current repository should likewise be efficient to determine and likewise does not require full traversal.
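To illustrate why discovery should be cheap, here is a minimal std-only sketch. It is not gitoxide's code, and the function name linked_worktree_dirs is hypothetical. It assumes the usual layout for linked worktrees, in which each .git/worktrees/<id>/gitdir file holds the path of that worktree's .git file, and it assumes those paths are absolute.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list the root directories of linked worktrees recorded
/// under `<git_dir>/worktrees`, without walking the working tree at all.
fn linked_worktree_dirs(git_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let admin = git_dir.join("worktrees");
    let entries = match fs::read_dir(&admin) {
        Ok(entries) => entries,
        // No `worktrees` directory means there are no linked worktrees.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(err) => return Err(err),
    };
    let mut dirs = Vec::new();
    for entry in entries {
        let gitdir_file = entry?.path().join("gitdir");
        // Skip entries that have no readable `gitdir` file.
        let Ok(content) = fs::read_to_string(&gitdir_file) else { continue };
        // The file names the worktree's `.git` file; its parent is the worktree root.
        if let Some(root) = Path::new(content.trim_end()).parent() {
            dirs.push(root.to_path_buf());
        }
    }
    dirs.sort();
    Ok(dirs)
}
```

The cost is proportional to the number of registered worktrees, not to the size of the working tree, which is the point being made above.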
Broad implementation idea
I suspect that either there is a simpler and more elegant approach, or maybe I am just expressing this idea poorly and it can be implemented simply? I am not sure. I present this more to clarify why I believe this can be achieved efficiently than to argue for a particular algorithm.
At the beginning of traversal, before actually beginning the walk:
- Examine the repository's .git/worktrees directory or equivalent and make a list of its worktrees.
- Remove any from the list that physically do not exist on disk. This requires checking if they exist.
- Normalize their paths relative to the root of the incipient traversal, i.e., so they are the same as they will be seen to be in traversal.
- Remove any from the list that are not inside the repository. Preferably, remove any that are not nested inside the working tree that gix clean is operating on. If feasible, remove any that are outside the part of it that gix clean is operating on, if that is not the whole thing.
- Optionally, as a performance optimization (if such an optimization is justified), fall back to the old implementation if the list is empty. At this point, the list will usually be empty, because usually there are no nested worktrees.
- Store those paths, or maybe explicitly store each intermediate subdirectory path, in such a way that allows recursive deletion of a complete subdirectory that contains them to be avoided.
For elaboration on that last pre-step:
- This is conceptually a set, but I don't know what data structure should be used. Usually a repository has a small number of worktrees; making weird cases with numerous worktrees pretty fast would be good, but avoiding any more than a tiny slowdown with only a few worktrees is a must. With no worktrees, the slowdown of the whole algorithm compared to the current behavior should be negligible or zero.
- The situation with intermediate directories is conceptually similar, and should perhaps be handled similarly, to that of intermediate ignored directories under which there is a tracked file (such as by git add -f).
Then perform the walk and use the knowledge of where the worktrees are and which paths are ancestors of them to refrain from deleting an entire directory tree if any of the repository's own worktrees are anywhere inside it, while still deleting all other files that are eligible for deletion.
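The storage and lookup parts of this idea can be sketched as follows. This is only an illustration under my own assumptions, not a proposal for the actual data structure: the names ProtectedWorktrees and may_delete_whole_dir are hypothetical, worktree paths are assumed to be already absolute and normalized, and the on-disk existence check from the pre-steps is omitted.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical structure: worktree roots relative to the traversal root,
/// plus every directory that lies above one of them.
#[derive(Default)]
struct ProtectedWorktrees {
    roots: BTreeSet<PathBuf>,
    ancestors: BTreeSet<PathBuf>,
}

impl ProtectedWorktrees {
    /// Keep only worktrees strictly inside `walk_root`, stored relative to it.
    fn new<'a>(walk_root: &Path, worktrees: impl IntoIterator<Item = &'a Path>) -> Self {
        let mut out = Self::default();
        for worktree in worktrees {
            // Worktrees outside the traversal root are irrelevant to this walk.
            let Ok(rel) = worktree.strip_prefix(walk_root) else { continue };
            if rel.as_os_str().is_empty() {
                continue; // the worktree being cleaned is not a nested one
            }
            // Record each proper ancestor, e.g. `subdir` for `subdir/mybranch`.
            for ancestor in rel.ancestors().skip(1) {
                if !ancestor.as_os_str().is_empty() {
                    out.ancestors.insert(ancestor.to_path_buf());
                }
            }
            out.roots.insert(rel.to_path_buf());
        }
        out
    }

    /// True if `rel_dir` can be removed recursively without touching a worktree:
    /// it is neither above a worktree root nor at or below one.
    fn may_delete_whole_dir(&self, rel_dir: &Path) -> bool {
        !self.ancestors.contains(rel_dir)
            && !self.roots.iter().any(|root| rel_dir.starts_with(root))
    }
}
```

With no nested worktrees both sets are empty, so the check during the walk reduces to one lookup in an empty set and an iteration over nothing. When the check fails for an ignored directory, the walk would have to descend into it and consider its children individually instead of deleting it wholesale.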
Complexity and connection to reusable library features
If this cannot be done in a reasonably simple way, then it might be judged not to be justified on the grounds that it is rarely needed.
However, whether or not that is the case, I wonder if this would become simple if expressed in terms of new directory walk features that would themselves have other applications.
Or maybe it is already simple and my unfamiliarity with the code involved--this is very much a part of the code that I am less familiar with--is what makes it seem complicated to me.
Expected behavior 🤔
Deleting nested worktrees of the same repository, no matter how deep under ignored directories, is something I do not expect to happen. I think the expected behavior is to preserve them. I have two reasons to think so.
1. On hiddenness - analogy to tracked files under ignored directories
As briefly alluded to above, even in the absence of multiple worktrees, one can have tracked files nested arbitrarily deep inside ignored directories by adding their .gitignore entries later or by using git add -f. These files are preserved by gix clean, which has efficient access to knowledge of their existence because their locations, and thus knowledge of their ancestor directories, are present in the repository without requiring a full traversal.
Of course, this analogy breaks down in some ways, since there are Git tree objects representing those intermediate directories, while the repository's knowledge of its own git worktree managed worktrees is, as far as I know, only ever through its knowledge of their paths. But tree objects don't exist for intermediate directories under which all tracked files are staged but not committed--and gix clean seems to always do fine with this--so perhaps this analogy is not so bad even at a technical level?
Either way, I think the analogy holds in terms of what is intuitive to users, as well as what is being described informationally by the characterization of a directory as hidden.
Since gix clean handles that, and gix clean also seeks to avoid deleting any worktrees of the current repository, I think the intuitively expected behavior is to treat them the same way.
2. On performance - inferring expectation from design intent
Because the repository's worktrees are never hidden in the sense of requiring a dirwalk or other inherently slow operations to find them, users who are aware of the performance considerations related to --skip-hidden-repositories may intuitively assume that the repository's own worktrees are protected even without it.
This assumption would hold even if the implementation details required to protect them efficiently turn out to be unworthwhile. Of course, that does not imply that implementing it must be worthwhile. Further documentation changes could instead be made to avoid giving the impression that they would be protected.
Maybe there are simpler alternatives?
All the foregoing text considers simplicity in terms of whether the behavior I am advocating for could be done in an acceptably simple way. But maybe there is a different improvement that could be made that is inherently simpler.
One idea: The message shown when -xdn is passed without --skip-hidden-repositories is itself clear with respect to separate repositories in ignored directories. Suppose the repository has other worktrees of its own (besides the one being operated in) whose parent directories appear to match any .gitignore entry. Then that message could be augmented to also cover that. This could be by the addition of a separate trailing sentence or paragraph, or even just by adjusting its wording to say "repositories or worktrees" or something.
Another idea is to always have it contain that augmentation, but I think that might be more confusing, and less useful as a reminder that the situation might really apply.
Git behavior
I recently learned that git clean accepts -f twice, in which case it does remove nested repositories. When passed with -d and -x, this removes ignored nested worktrees as well, both those that are directly visible inside a non-ignored parent directory, and those that are deeper down.
But when using only documented features, git clean avoids deleting any nested repositories, even those that are unrelated to the current repository. In contrast, gix clean offers this as a feature explicitly, and provides options to control it.
I think both of the above points are of low relevance to determining how gix should behave in the specific case this issue covers, and that the git behavior, to the extent that it is related, argues neither for nor against the proposal here.
Steps to reproduce 🕹
Instructions
I did the following on an Ubuntu 22.04 LTS system.
- Build and install gitoxide from the tip of the main branch, making sure the build is from code that includes the changes from #1465.
- Create a repository specifying an ignored path that will be used for a subdirectory:
      git init has-deeply-nested-worktree
      cd has-deeply-nested-worktree
      echo subdir >.gitignore
      git add .
      git commit -m 'Initial commit'
- Create a worktree in a directory inside that ignored directory, so that the worktree's root is a subdirectory of it:
      mkdir subdir
      git worktree add subdir/mybranch
- See what gix clean would do with -x and -d, then observe that it does it:
      gix clean -xdn
      gix clean -xde
      git worktree list
      ls subdir
This shows that the worktree is gone, even though it is a worktree of the current repository rather than being an unrelated nested repository.
With output
Here's a transcript of the above, showing the output:
ek@Glub:~/src$ git init has-deeply-nested-worktree
Initialized empty Git repository in /home/ek/src/has-deeply-nested-worktree/.git/
ek@Glub:~/src$ cd has-deeply-nested-worktree
ek@Glub:~/src/has-deeply-nested-worktree (main #)$ echo subdir >.gitignore
ek@Glub:~/src/has-deeply-nested-worktree (main #%)$ git add .
ek@Glub:~/src/has-deeply-nested-worktree (main +)$ git commit -m 'Initial commit'
[main (root-commit) e53b380] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 .gitignore
ek@Glub:~/src/has-deeply-nested-worktree (main)$ mkdir subdir
ek@Glub:~/src/has-deeply-nested-worktree (main)$ git worktree add subdir/mybranch
Preparing worktree (new branch 'mybranch')
HEAD is now at e53b380 Initial commit
ek@Glub:~/src/has-deeply-nested-worktree (main)$ gix clean -xdn
WOULD remove subdir/ (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/has-deeply-nested-worktree (main)$ gix clean -xde
removing subdir/ (🗑️)
ek@Glub:~/src/has-deeply-nested-worktree (main)$ git worktree list
/home/ek/src/has-deeply-nested-worktree e53b380 [main]
/home/ek/src/has-deeply-nested-worktree/subdir/mybranch e53b380 [mybranch] prunable
ek@Glub:~/src/has-deeply-nested-worktree (main)$ ls subdir
ls: cannot access 'subdir': No such file or directory