gix clean -xde deletes the repo's own hidden nested worktrees, but they are not really hidden #1469

Closed
@EliahKagan

Current behavior 😯

#1465 fixed #1464 by checking whether a git repository encountered during traversal is a worktree of the current repository and, if so, declining to delete it even if -p is passed, even if -r is passed, and even if -x and -d are passed to delete ignored directories. This protects the repository's own worktrees, even if nested, provided that they are encountered during traversal rather than being present below a directory that is found during traversal and determined to be ignored.

As described in #1464 (comment), gix clean continues to require --skip-hidden-repositories to avoid deleting nested repositories that require full traversal to discover, and this is important for performance. This behavior holds even when those nested repositories are worktrees of the current repository.

But worktrees of the current repository, regardless of their location, should never require much work to discover. This is because the repository knows where its worktrees are. For example, a non-bare repository with git worktree managed worktrees, if it is not a submodule, has a .git directory with a worktrees subdirectory that contains information about its worktrees.
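As a rough illustration of why this lookup is cheap, here is a minimal Rust sketch that lists linked worktrees by reading the `gitdir` file in each `worktrees/<name>` entry, which stores the path of that worktree's `.git` file. It uses only the standard library rather than gitoxide's own API, the function names are hypothetical, and it assumes absolute paths in a plain non-bare, non-submodule repository.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Turn the contents of a `worktrees/<name>/gitdir` file into the worktree's root.
/// The file holds the path of the worktree's `.git` file, so the root is its parent.
/// (Hypothetical helper for illustration; relative paths are not resolved.)
fn worktree_root_from_gitdir_file(contents: &str) -> Option<PathBuf> {
    let dot_git = Path::new(contents.trim_end_matches(|c| c == '\n' || c == '\r'));
    dot_git
        .parent()
        .filter(|parent| !parent.as_os_str().is_empty())
        .map(Path::to_path_buf)
}

/// List the roots of linked worktrees recorded under `<git_dir>/worktrees`.
/// The cost is one directory listing plus one small file read per worktree.
fn linked_worktree_roots(git_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let entries = match fs::read_dir(git_dir.join("worktrees")) {
        Ok(entries) => entries,
        // No `worktrees` directory simply means there are no linked worktrees.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
        Err(err) => return Err(err),
    };
    let mut roots = Vec::new();
    for entry in entries {
        let gitdir_file = entry?.path().join("gitdir");
        // Entries without a readable `gitdir` file are skipped.
        if let Ok(contents) = fs::read_to_string(&gitdir_file) {
            if let Some(root) = worktree_root_from_gitdir_file(&contents) {
                roots.push(root);
            }
        }
    }
    roots.sort();
    Ok(roots)
}
```

Nothing here walks the working tree, which is the point: the work scales with the number of worktrees, not with the size of any ignored directory.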

Whether a directory contains any descendants that are worktrees of the current repository should likewise be efficient to determine and likewise does not require full traversal.

Broad implementation idea

I suspect that either there is a simpler and more elegant approach, or maybe I am just expressing this idea poorly and it can be implemented simply? I am not sure. I present this more to clarify why I believe this can be achieved efficiently than to argue for a particular algorithm.

At the beginning of traversal, before actually beginning the walk:

  1. Examine the repository .git/worktrees directory or equivalent and make a list of its worktrees.
  2. Remove any from the list that physically do not exist on disk. This requires checking if they exist.
  3. Normalize their paths relative to the root of the incipient traversal, i.e., so they are the same as they will be seen to be in traversal.
  4. Remove any from the list that are not inside the repository. Preferably, remove any that are not nested inside the working tree that gix clean is operating on. If feasible, remove any that are outside the part of it that gix clean is operating on, if that is not the whole thing.
  5. Optionally, as a performance optimization (if such an optimization is justified), fall back to the old implementation if the list is empty. At this point, the list will usually be empty, because usually there are no nested worktrees.
  6. Store those paths, or maybe explicitly store each intermediate subdirectory path, in such a way that allows recursive deletion of a complete subdirectory that contains them to be avoided.

For elaboration on that last pre-step:

  • This is conceptually a set, but I don't know what data structure should be used. Usually a repository has a small number of worktrees; making weird cases with numerous worktrees pretty fast would be good, but avoiding any more than a tiny slowdown with only a few worktrees is a must. With no worktrees, the slowdown of the whole algorithm compared to the current behavior should be negligible or zero.
  • The situation with intermediate directories is conceptually similar, and should perhaps be handled similarly, to that of intermediate ignored directories under which there is a tracked file (such as by git add -f).
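Pre-steps 3, 4, and 6 above could be sketched roughly as follows, assuming the worktree roots are already known as absolute paths. This is only one possible shape, not how gix clean is implemented: `protected_dirs` is a hypothetical name, a `BTreeSet` is just one candidate for the set, and the existence check from step 2 is left out to keep the sketch free of filesystem access.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Directories, relative to `walk_root`, that must not be removed wholesale
/// because one of the repository's own worktrees is at or below them.
/// (Hypothetical helper for illustration only.)
fn protected_dirs(walk_root: &Path, worktree_roots: &[PathBuf]) -> BTreeSet<PathBuf> {
    let mut protected = BTreeSet::new();
    for root in worktree_roots {
        // Steps 3 and 4: express the path relative to the walk root and drop
        // worktrees that are outside it, or that are the walk root itself.
        let relative = match root.strip_prefix(walk_root) {
            Ok(relative) if !relative.as_os_str().is_empty() => relative,
            _ => continue,
        };
        // Step 6: record the worktree and every intermediate directory.
        for ancestor in relative.ancestors() {
            if !ancestor.as_os_str().is_empty() {
                protected.insert(ancestor.to_path_buf());
            }
        }
    }
    protected
}
```

An empty result corresponds to the fast path in step 5: with no nested worktrees, the walk can proceed exactly as it does today.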

Then perform the walk and use the knowledge of where the worktrees are and which paths are ancestors of them to refrain from deleting an entire directory tree if any of the repository's own worktrees are anywhere inside it, while still deleting all other files that are eligible for deletion.
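The per-directory decision during the walk might look something like the sketch below. It assumes the worktree paths are already relative to the traversal root, and it uses a plain linear scan on the assumption that a repository usually has only a few worktrees. The `Action` enum and `action_for_dir` are hypothetical names, not part of gitoxide.

```rust
use std::path::{Path, PathBuf};

/// What to do with a directory that is otherwise eligible for deletion.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// The directory is one of the repository's own worktrees, or inside one: leave it alone.
    Keep,
    /// A worktree lives somewhere below: descend and delete eligible entries individually.
    Descend,
    /// No worktree at or below this directory: it can be removed recursively.
    RemoveWhole,
}

/// Decide how to treat `dir`, given worktree roots relative to the traversal root.
/// `Path::starts_with` compares whole components, so `subdir2` is not under `subdir`.
fn action_for_dir(dir: &Path, worktrees: &[PathBuf]) -> Action {
    if worktrees.iter().any(|wt| dir.starts_with(wt)) {
        Action::Keep
    } else if worktrees.iter().any(|wt| wt.starts_with(dir)) {
        Action::Descend
    } else {
        Action::RemoveWhole
    }
}
```

With the reproduction case from this issue, `subdir` would be descended into rather than removed, and `subdir/mybranch` would be kept.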

Complexity and connection to reusable library features

If this cannot be done in a reasonably simple way, then it might be judged not to be justified on the grounds that it is rarely needed.

However, whether or not that is the case, I wonder if this would become simple if expressed in terms of new directory walk features that would themselves have other applications.

Or maybe it is already simple and my unfamiliarity with the code involved--this is very much a part of the code that I am less familiar with--is what makes it seem complicated to me.

Expected behavior 🤔

Deleting nested worktrees of the same repository, no matter how deep under ignored directories, is something I do not expect to happen. I think the expected behavior is to preserve them. I have two reasons to think so.

1. On hiddenness - analogy to tracked files under ignored directories

As briefly alluded to above, even in the absence of multiple worktrees, one can have tracked files nested arbitrarily deep inside ignored directories, either by adding the matching .gitignore entries after the files were tracked or by using git add -f. These files are preserved by gix clean, which has efficient access to knowledge of their existence because their locations, and thus their ancestor directories, are recorded in the repository and can be found without a full traversal.

Of course, this analogy breaks down in some ways, since there are Git tree objects representing those intermediate directories, while the repository's knowledge of its own git worktree managed worktrees is, as far as I know, only ever through its knowledge of their paths. But tree objects don't exist for intermediate directories under which all tracked files are staged but not committed--and gix clean seems to always do fine with this--so perhaps this analogy is not so bad even at a technical level?

Either way, I think the analogy holds in terms of what is intuitive to users, as well as what is being described informationally by the characterization of a directory as hidden.

Since gix clean handles that, and gix clean also seeks to avoid deleting any worktrees of the current repository, I think the intuitively expected behavior is to treat them the same way.

2. On performance - inferring expectation from design intent

Because the repository's worktrees are never hidden in the sense of requiring a dirwalk or other inherently slow operations to find them, users who are aware of the performance considerations related to --skip-hidden-repositories may intuitively assume that the repository's own worktrees are protected even without it.

This assumption would hold even if the implementation details required to protect them efficiently turn out to be unworthwhile. Of course, that does not imply that implementing it must be worthwhile. Further documentation changes could instead be made to avoid giving the impression that they would be protected.

Maybe there are simpler alternatives?

All the foregoing text considers simplicity in terms of whether the behavior I am advocating for could be done in an acceptably simple way. But maybe there is a different improvement that could be made that is inherently simpler.

One idea: The message shown when -xdn is passed without --skip-hidden-repositories is itself clear with respect to separate repositories in ignored directories. Suppose the repository has other worktrees of its own (besides the one being operated in) whose parent directories appear to match any .gitignore entry. Then that message could be augmented to also cover that. This could be by the addition of a separate trailing sentence or paragraph, or even just by adjusting its wording to say "repositories or worktrees" or something.

Another idea is to always have it contain that augmentation, but I think that might be more confusing, and less useful as a reminder that the situation might really apply.

Git behavior

I recently learned that git clean accepts -f twice, in which case it does remove nested repositories. When passed with -d and -x, this removes ignored nested worktrees as well, both those that are directly visible inside a non-ignored parent directory and those that are deeper down.

But without a second -f, git clean avoids deleting any nested repositories, even those that are unrelated to the current repository. In contrast, gix clean offers deletion of nested repositories as an explicit feature, and provides options to control it.

I think both of the above points are of low relevance to determining how gix should behave in the specific case this issue covers, and that the git behavior, to the extent that it is related, argues neither for nor against the proposal here.

Steps to reproduce 🕹

Instructions

I did the following on an Ubuntu 22.04 LTS system.

  1. Build and install gitoxide from the tip of the main branch, making sure the build is from code that includes the changes from #1465.

  2. Create a repository specifying an ignored path that will be used for a subdirectory:

    git init has-deeply-nested-worktree
    cd has-deeply-nested-worktree
    echo subdir >.gitignore
    git add .
    git commit -m 'Initial commit'

  3. Create a worktree in a directory inside that ignored directory, so that the worktree's root is a subdirectory of it:

    mkdir subdir
    git worktree add subdir/mybranch

  4. See what gix clean would do with -x and -d, then observe that it does it:

    gix clean -xdn
    gix clean -xde
    git worktree list
    ls subdir

This shows that the worktree is gone, even though it is a worktree of the current repository rather than being an unrelated nested repository.

With output

Here's a transcript of the above, showing the output:

ek@Glub:~/src$ git init has-deeply-nested-worktree
Initialized empty Git repository in /home/ek/src/has-deeply-nested-worktree/.git/
ek@Glub:~/src$ cd has-deeply-nested-worktree
ek@Glub:~/src/has-deeply-nested-worktree (main #)$ echo subdir >.gitignore
ek@Glub:~/src/has-deeply-nested-worktree (main #%)$ git add .
ek@Glub:~/src/has-deeply-nested-worktree (main +)$ git commit -m 'Initial commit'
[main (root-commit) e53b380] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore
ek@Glub:~/src/has-deeply-nested-worktree (main)$ mkdir subdir
ek@Glub:~/src/has-deeply-nested-worktree (main)$ git worktree add subdir/mybranch
Preparing worktree (new branch 'mybranch')
HEAD is now at e53b380 Initial commit
ek@Glub:~/src/has-deeply-nested-worktree (main)$ gix clean -xdn
WOULD remove subdir/ (🗑️)

WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/has-deeply-nested-worktree (main)$ gix clean -xde
removing subdir/ (🗑️)
ek@Glub:~/src/has-deeply-nested-worktree (main)$ git worktree list
/home/ek/src/has-deeply-nested-worktree                  e53b380 [main]
/home/ek/src/has-deeply-nested-worktree/subdir/mybranch  e53b380 [mybranch] prunable
ek@Glub:~/src/has-deeply-nested-worktree (main)$ ls subdir
ls: cannot access 'subdir': No such file or directory

Labels: acknowledged (an issue is accepted as shortcoming to be fixed)
