Description
Current behavior 😯
gix clean -xde will delete the entire top-level repository it is operating on, including tracked files and the .git directory itself (and thus the whole local history), if a .gitignore file contains * or /. This seems to happen because the repository itself is identified as an untracked nested repository.
Illustration with a real-world example
One approach to writing .gitignore is to list * followed by ! exclusions. In such a repository, running gix clean -xde deletes the entire contents of the repository directory, causing inconvenience and the loss of any unpushed data. For example, on the cargo-update repository:
ek@Glub:~/src$ git clone https://github.com/nabijaczleweli/cargo-update.git
Cloning into 'cargo-update'...
remote: Enumerating objects: 328251, done.
remote: Counting objects: 100% (83271/83271), done.
remote: Compressing objects: 100% (1630/1630), done.
remote: Total 328251 (delta 81523), reused 83220 (delta 81477), pack-reused 244980
Receiving objects: 100% (328251/328251), 110.53 MiB | 30.62 MiB/s, done.
Resolving deltas: 100% (320324/320324), done.
ek@Glub:~/src$ cd cargo-update
ek@Glub:~/src/cargo-update (master=)$ cat .gitignore
*
!.gitignore
!.travis.yml
!gh_rsa.enc
!appveyor.yml
!LICENSE
!Cargo.toml
!rustfmt.toml
!build.rs
!cargo-install-update-manifest.rc
!cargo-install-update.exe.manifest
!*.sublime-project
!*.md
!.github
!.github/**
!src
!src/**
!man
!man/**
!tests
!tests/**
!test-data
!test-data/**
ek@Glub:~/src/cargo-update (master=)$ gix clean -xdn
WOULD remove / (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/cargo-update (master=)$ gix clean -xde
removing / (🗑️)
Error: Invalid argument (os error 22)
ek@Glub:~/src/cargo-update[1]$ ls -al
total 8
drwxr-xr-x 2 ek ek 4096 Jul 21 15:11 .
drwxr-xr-x 20 ek ek 4096 Jul 21 15:10 ..
That is on Ubuntu 22.04 LTS.
Windows is affected when outside the directory
Although Windows superficially appears unaffected because open files and directories cannot usually be deleted, the entire repository directory will still be deleted if one runs gix -r cargo-update clean -xde from the parent directory. (This -r is a gix option and should not be confused with gix clean -r.) So really all systems are affected, though to different degrees.
C:\Users\ek\src\cargo-update [master ≡]> gix clean -xde
removing / (🗑️)
Error: The process cannot access the file because it is being used by another process. (os error 32)
C:\Users\ek\src\cargo-update [master ≡]> cd ..
C:\Users\ek\src> gix -r cargo-update clean -xdn
WOULD remove / (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
C:\Users\ek\src> gix -r cargo-update clean -xde
removing / (🗑️)
C:\Users\ek\src> cd cargo-update
Set-Location: Cannot find path 'C:\Users\ek\src\cargo-update' because it does not exist.
The / means the repository
The / in each case refers to the top-level directory of the repository. It fortunately does not refer to the actual root of the filesystem. Likewise:
- As far as I have been able to find, this never deletes upward. For example, paths with .. components in .gitignore do not cause files outside of the repository to be deleted.
- It also does not seem possible to have .gitignore patterns that cause only some of the contents of .git to be deleted. Although this may seem like cold comfort, it is actually a major mitigating factor: partial deletion of .git would not necessarily be noticed and could lead to situations that are much harder to recover from, since local refs could be silently removed, recreated with different referents, and then force-pushed to a remote without awareness that they were replacing preexisting, conceptually unrelated tags or branches.
However, all local branches and their history (except the tip of any branches checked out in separate worktrees and unmodified), the index, any stashes, and other objects available through the reflog, can all be deleted.
Some relevant code
It looks like, in traversals, paths that are not conceptually part of the repository are pruned, except that when a .git directory would be pruned for this reason but also matches a .gitignore entry, then it is instead retained and given the Ignored status:
But this is not necessarily to say that this aspect of the classification has to be changed. It may be suitable for most repositories found during traversal, just not for the top-level working tree or .git directory, nor for any submodules or their .git files.
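To make that rule concrete, here is a minimal, hypothetical Rust model of it. The names (Status, Entry, classify_current, classify_suggested) are mine, not gix's, and the model assumes the walk already knows whether an entry is a .git directory or file and whether some .gitignore pattern matches it. classify_suggested shows the kind of narrow exemption I have in mind: only the top-level .git is forced to stay pruned, while a nested repository's .git keeps the current behavior.

```rust
/// Simplified outcome of classifying one entry during a directory walk
/// (hypothetical; not gix's actual types).
#[derive(Debug, PartialEq, Eq)]
enum Status {
    /// Not considered part of the worktree; never reported or deleted.
    Pruned,
    /// Reported as ignored, so cleaning with -x makes it a deletion candidate.
    Ignored,
    /// Reported normally (tracked or untracked content).
    Normal,
}

/// Facts the walk is assumed to know about an entry.
struct Entry<'a> {
    /// Path relative to the worktree root, using '/' separators.
    rel_path: &'a str,
    /// True if the entry is a .git directory (or .git file) of some repository.
    is_dot_git: bool,
    /// True if some .gitignore pattern matches the entry.
    matches_ignore: bool,
}

/// Behavior as described above: a .git that would otherwise be pruned
/// is retained as Ignored whenever an ignore pattern matches it.
fn classify_current(e: &Entry<'_>) -> Status {
    if e.is_dot_git {
        if e.matches_ignore { Status::Ignored } else { Status::Pruned }
    } else if e.matches_ignore {
        Status::Ignored
    } else {
        Status::Normal
    }
}

/// Suggested variant: the current repository's own top-level .git is
/// always pruned, whatever .gitignore says; nested ones are unchanged.
fn classify_suggested(e: &Entry<'_>) -> Status {
    if e.is_dot_git && e.rel_path == ".git" {
        return Status::Pruned;
    }
    classify_current(e)
}
```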
Expected behavior 🤔
One never expects cleaning to remove tracked files or cause data loss in the repository's local history, much less loss of the whole history.
Although the behavior described here was not intended, and exposition may not be required to demonstrate that this is a bug, I've nonetheless detailed what I think is the expected behavior below. This is because I think it may be useful for identifying current or future cases of the bug, some of which are less obvious than others.
A * entry in a .gitignore has real-world uses (as shown in the example of the cargo-update repository) and should be taken as a pattern that matches all files, which subsequent ! exclusions can then make exceptions to. A / might likewise be used deliberately, though I'm not sure I have seen it outside of testing. Furthermore, the ability to delete nested untracked repositories is a deliberate feature of gix clean when passed some combinations of options, and a very useful feature at that. However:
1. The top-level directory of a repository, i.e., the entire repository, should not be deleted in a clean. This applies to the case of running gix -r ... clean -xde. This would apply even if the directory were actually empty due to a nonstandard worktree. (This -r is a gix option and should not be confused with -r as a gix clean option as presented below in case 4.)
2. Tracked files should not be deleted or modified in a clean, regardless of whether they have any (staged or unstaged) changes. This applies to the unexpected deletions other than those of .git.
3. Neither the current repository's .git directory nor anything inside it should ever be deleted in a clean, even if intentionally cleaning ignored files and even if intentionally including ignored subdirectories that are themselves repositories. Although the current repository's .git directory is "ignored" in the sense that commands like git add . do not stage anything from it, this behavior is separate from, and supersedes, the effect of .gitignore. The .git directory is effectively a void in the working tree.
4. As a special case of 3, even if .git is specified explicitly in .gitignore and -r is included in the options passed to gix clean, the .git directory should not be deleted. Currently adding -r will make this happen, as covered in "Steps to reproduce" below. (This -r is a gix clean option and should not be confused with -r as a gix option as presented above in case 1.)
5. When the current repository is a submodule, it is expected to have a .git file instead of a .git directory. This likewise should not be deleted when running gix clean in the submodule, irrespective of the options to gix clean or the contents of any .gitignore file. I have not tested this case yet.
6. When the current repository has submodules (that is, directories corresponding to entries in .gitmodules), those submodules should not be deleted by gix clean, since submodule directories are likewise voids in the superproject's working tree, rather than being ignored due to .gitignore. I have not tested this case yet either.
Cases 2 and 3 are the most important, because they are the most common and they are the most likely to cause data loss, especially case 3.
Although I have not tested cases 5 and 6 (yet), I mention them because it seems like incorrect behavior for them might be easy to bring about by accident when fixing this bug for the other cases.
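The six cases can be summarized as a guard that runs before any deletion. This is a sketch under stated assumptions, not a proposal for gix's actual API: RepoFacts, Refusal and check_deletion are hypothetical names, candidates are assumed to be given relative to the worktree root (as gix clean prints them, with the root shown as /), and the tracked and submodule paths are assumed to have been read from the index and .gitmodules beforehand.

```rust
use std::collections::HashSet;
use std::path::{Component, Path};

/// Facts about the repository being cleaned (hypothetical; a real
/// implementation would read them from the index and .gitmodules).
struct RepoFacts {
    /// Tracked paths, relative to the worktree root, with '/' separators.
    tracked: HashSet<String>,
    /// Submodule paths from .gitmodules, relative to the worktree root.
    submodules: HashSet<String>,
}

/// Why a deletion candidate was refused, keyed to the cases above.
#[derive(Debug, PartialEq, Eq)]
enum Refusal {
    /// Case 1: the worktree root itself.
    WorktreeRoot,
    /// Case 2: a tracked file, or a directory containing tracked files.
    Tracked,
    /// Cases 3 to 5: the repository's own .git (directory or file) or anything in it.
    OwnDotGit,
    /// Case 6: a submodule of the current repository.
    Submodule,
    /// Not one of the numbered cases: a path escaping the worktree.
    OutsideWorktree,
}

/// Returns Ok(()) only if `candidate` (relative to the worktree root) may be deleted.
fn check_deletion(candidate: &Path, facts: &RepoFacts) -> Result<(), Refusal> {
    let mut parts: Vec<String> = Vec::new();
    for component in candidate.components() {
        match component {
            Component::Normal(name) => parts.push(name.to_string_lossy().into_owned()),
            Component::ParentDir => return Err(Refusal::OutsideWorktree),
            // RootDir, CurDir and Prefix are skipped, so "", "." and "/"
            // (how gix clean prints the root) all denote the worktree root.
            _ => {}
        }
    }
    if parts.is_empty() {
        return Err(Refusal::WorktreeRoot);
    }
    // Only the top-level .git is protected; nested repositories' .git are not.
    if parts[0] == ".git" {
        return Err(Refusal::OwnDotGit);
    }
    let rel = parts.join("/");
    // True if `rel` is `outer` itself or lies somewhere beneath it.
    let inside = |outer: &str| rel == outer || rel.starts_with(&format!("{outer}/"));
    if facts.submodules.iter().any(|s| inside(s.as_str())) {
        return Err(Refusal::Submodule);
    }
    // Refuse tracked files and any directory that has tracked files beneath it.
    if facts.tracked.iter().any(|t| t == &rel || t.starts_with(&format!("{rel}/"))) {
        return Err(Refusal::Tracked);
    }
    Ok(())
}
```

Refusing a directory whenever any tracked path lies beneath it is deliberately conservative; a real fix would probably want to descend into such a directory and clean only its untracked or ignored contents instead.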
Notes on -p and -r
One possible implementation approach for a fix comes to mind that I want to recommend against. While gix clean recognizes precious files, I recommend against allowing any of the above to occur even when -p is passed. Regarding .git as typically ineligible for deletion by automatically treating it as a precious directory would, I think, still be far from strong enough protection. It also doesn't really fit conceptually: I believe precious files are conceptually those that should usually not be deleted because they are important for reasons independent of source control.
In addition to the above, the effect of -x and -d on actual nested repositories, especially if they are to continue to delete them under some conditions even in the absence of -r, should be documented explicitly, including in the output of gix help clean. But that could be done separately from the fix for this bug.
Git behavior
No one-to-one comparison...
There is no exact comparison to git clean behavior, because gix clean deliberately deletes ignored nested repositories when -x and -d are passed (provided that -e is passed to allow it to do anything at all). It furthermore seems to do so intentionally even without -r, though as examined below in Steps to reproduce, perhaps this is unintentional in the absence of -r.
More broadly, gix clean is not intended to behave exactly the same as git clean, as detailed in #1308.
...but git behavior is relevant
However, it's true that git clean does set strong expectations for what kinds of deletions are within the ambit of cleaning, and no way of using git clean produces this effect.
For example, running git clean -xdf in the cargo-update repository used as an example above does not delete any tracked files or the .git directory.
Steps to reproduce 🕹
To reproduce this, one can carry out the example shown above with the non-toy cargo-update repository, which confirms the practical significance.
Simplified reproducer
One can alternatively run the following commands to reproduce it with a simple repository. These and all commands shown in this section were tested in Ubuntu 22.04 LTS.
git init ignore-star
cd ignore-star
echo '*' >.gitignore
echo '!.gitignore' >>.gitignore
cat .gitignore
git add .
git status
gix clean -xdn
gix clean -xde
ls -al
With full output:
ek@Glub:~/src$ git init ignore-star
Initialized empty Git repository in /home/ek/src/ignore-star/.git/
ek@Glub:~/src$ cd ignore-star
ek@Glub:~/src/ignore-star (main #)$ echo '*' >.gitignore
ek@Glub:~/src/ignore-star (main #)$ echo '!.gitignore' >>.gitignore
ek@Glub:~/src/ignore-star (main #%)$ cat .gitignore
*
!.gitignore
ek@Glub:~/src/ignore-star (main #%)$ git add .
ek@Glub:~/src/ignore-star (main +)$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitignore
ek@Glub:~/src/ignore-star (main +)$ gix clean -xdn
WOULD remove / (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/ignore-star (main +)$ gix clean -xde
removing / (🗑️)
Error: Invalid argument (os error 22)
ek@Glub:~/src/ignore-star[1]$ ls -al
total 8
drwxr-xr-x 2 ek ek 4096 Jul 21 19:14 .
drwxr-xr-x 20 ek ek 4096 Jul 21 19:14 ..
Demonstration that this relates to nested repository handling
Some of the above commands are not necessary to confirm the bug but illustrate relevant aspects of it. For example, consider this part of the output of the dry-run gix clean -xdn command run before doing the real clean:
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
This strongly suggests that the problem relates to the way code in gix::dirwalk identifies nested repositories.
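I haven't reproduced gix::dirwalk's actual detection logic here. As a naive model, assume a directory counts as a repository whenever it contains a .git entry (a directory, or a file as in submodules). Under that assumption the top-level directory always qualifies, so any check of this kind needs to exclude the worktree root explicitly. looks_like_repo and is_nested_repo are hypothetical names.

```rust
use std::path::Path;

/// Naive model (an assumption, not gix's logic): a directory "is a repository"
/// if it has a .git entry, either a directory or a file (as in submodules).
fn looks_like_repo(dir: &Path) -> bool {
    dir.join(".git").exists()
}

/// The same check, but never reporting the worktree being cleaned as a
/// nested repository of itself. A real implementation should compare
/// canonicalized paths, since one directory can be reached by different
/// spellings or through symlinks.
fn is_nested_repo(worktree_root: &Path, dir: &Path) -> bool {
    dir != worktree_root && looks_like_repo(dir)
}
```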
Variation with /
The above commands can be repeated with a / entry instead of * to confirm that it also happens with that. This works both with and without the second line, !.gitignore, since git itself treats a / entry in .gitignore as not covering .gitignore. Actually I am not sure what a / in .gitignore is supposed to do.
Variation with tracked files that are not special
Although the real-world example with cargo-update, as well as the minimal example where .gitignore lists * and !.gitignore, illustrate that files excluded from being ignored by matching a ! pattern that comes after * are not spared, here's a minimal example focused on that:
git init ignore-tracked
cd ignore-tracked
echo $'*\n!.gitignore\n!a' >.gitignore
cat .gitignore
touch a
git add .
git status
gix clean -xdn
gix clean -xde
ls -al
That produces:
ek@Glub:~/src$ git init ignore-tracked
Initialized empty Git repository in /home/ek/src/ignore-tracked/.git/
ek@Glub:~/src$ cd ignore-tracked
ek@Glub:~/src/ignore-tracked (main #)$ echo $'*\n!.gitignore\n!a' >.gitignore
ek@Glub:~/src/ignore-tracked (main #%)$ cat .gitignore
*
!.gitignore
!a
ek@Glub:~/src/ignore-tracked (main #%)$ touch a
ek@Glub:~/src/ignore-tracked (main #%)$ git add .
ek@Glub:~/src/ignore-tracked (main +)$ git status
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: .gitignore
new file: a
ek@Glub:~/src/ignore-tracked (main +)$ gix clean -xdn
WOULD remove / (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
ek@Glub:~/src/ignore-tracked (main +)$ gix clean -xde
removing / (🗑️)
Error: Invalid argument (os error 22)
ek@Glub:~/src/ignore-tracked[1]$ ls -al
total 8
drwxr-xr-x 2 ek ek 4096 Jul 21 22:05 .
drwxr-xr-x 20 ek ek 4096 Jul 21 22:04 ..
The key seems to be that the top-level directory is taken to match the entry *, causing all of its contents to be deleted even if some of them match ! exclusions.
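Here is a toy model of that hypothesis. It assumes heavily simplified .gitignore semantics (only * and literal basenames, optionally negated with !), lets the last matching line win, and applies git's documented rule that a file cannot be re-included if a parent directory of it is excluded. The worktree root is modeled as a path component with an empty name, which * matches. All function names are hypothetical.

```rust
/// Splits a simplified .gitignore line into (negated, pattern).
fn parse_line(line: &str) -> (bool, &str) {
    match line.strip_prefix('!') {
        Some(rest) => (true, rest),
        None => (false, line),
    }
}

/// Simplified matching of one path component: "*" matches any name
/// (including the empty name used here for the worktree root); any other
/// pattern must equal the name literally.
fn component_matches(pattern: &str, name: &str) -> bool {
    pattern == "*" || pattern == name
}

/// Evaluates all lines against one component; the last matching line wins.
fn ignored_by_last_match(lines: &[&str], name: &str) -> bool {
    let mut ignored = false;
    for line in lines {
        let (negated, pattern) = parse_line(line);
        if component_matches(pattern, name) {
            ignored = !negated;
        }
    }
    ignored
}

/// A path is ignored if any of its components is ignored, because git cannot
/// re-include a file whose parent directory is excluded. `check_root` models
/// the suspected bug: the worktree root (empty name) is tested as well.
fn is_ignored(lines: &[&str], rel_path: &str, check_root: bool) -> bool {
    if check_root && ignored_by_last_match(lines, "") {
        return true;
    }
    rel_path
        .split('/')
        .any(|component| ignored_by_last_match(lines, component))
}
```

With check_root set to false, a and .gitignore from the ignore-tracked example are re-included as git would have it; with it set to true, every path comes out ignored, which matches what gix clean -xde did there.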
Variation listing .git and passing -r
Making the .gitignore file contain .git does not cause gix clean -xde to delete .git, but it does cause gix clean -xdre to delete .git. (Note that this -r is an option to gix clean and should not be confused with the -r option to gix before a subcommand, which specifies a repository to operate on.) This can be seen by running the commands:
git init ignore-dotgit
cd ignore-dotgit
echo '.git' >.gitignore
cat .gitignore
git add .
gix clean -xdn
gix clean -xde
gix clean -xdrn
gix clean -xdre
ls -al
Here's what that looks like:
ek@Glub:~/src$ git init ignore-dotgit
Initialized empty Git repository in /home/ek/src/ignore-dotgit/.git/
ek@Glub:~/src$ cd ignore-dotgit
ek@Glub:~/src/ignore-dotgit (main #)$ echo '.git' >.gitignore
ek@Glub:~/src/ignore-dotgit (main #%)$ cat .gitignore
.git
ek@Glub:~/src/ignore-dotgit (main #%)$ git add .
ek@Glub:~/src/ignore-dotgit (main +)$ gix clean -xdn
Nothing to clean (Skipped 1 repository - show with -r)
ek@Glub:~/src/ignore-dotgit (main +)$ gix clean -xde
ek@Glub:~/src/ignore-dotgit (main +)$ gix clean -xdrn
WOULD remove repository .git/ (🗑️)
ek@Glub:~/src/ignore-dotgit (main +)$ gix clean -xdre
removing repository .git/ (🗑️)
ek@Glub:~/src/ignore-dotgit$ ls -al
total 12
drwxr-xr-x 2 ek ek 4096 Jul 21 19:51 .
drwxr-xr-x 20 ek ek 4096 Jul 21 19:50 ..
-rw-r--r-- 1 ek ek 5 Jul 21 19:50 .gitignore
When should -r affect what happens?
I believe this should not happen even with -r.
In addition, combined with the above results, this raises the question of whether gix clean -xde without -r is actually intended to delete any genuinely nested repositories:
- If so, then, as noted above, I think this should be documented.
- If not, then the fact that this happens (that, for example, gix clean -xde without -r is currently a good command to delete both build output and generated archives when run in gitoxide's own repository) might be considered part of this bug, or its own separate related bug.
Other variations with .git and -r
Listing /.git has the same effect as .git, at least when the current directory is the top-level directory of the repository.
Listing paths under .git in .gitignore, such as with the line .git/config or /.git/config, fortunately has no effect. It does not appear that a .gitignore entry can cause gix clean with any combination of options to attempt to delete only some files inside .git.
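For reference, this is how git's documented anchoring rule plays out for literal patterns: a pattern with a slash at the start or in the middle is relative to the directory containing the .gitignore, while a pattern without one can match at any depth. The hypothetical helper below supports literal patterns only. It shows why /.git and .git name the same path at the top level, and that .git/config would match textually, so its lack of effect is presumably due to the walk never descending into .git, though I have not confirmed that in the gix code.

```rust
/// Simplified gitignore matching for literal patterns (no globs), following
/// git's rule: a '/' at the start or in the middle anchors the pattern to the
/// .gitignore's directory; otherwise it may match a basename at any depth.
fn literal_pattern_matches(pattern: &str, rel_path: &str) -> bool {
    // A trailing slash only restricts matching to directories; ignored here.
    let pattern = pattern.strip_suffix('/').unwrap_or(pattern);
    if let Some(anchored) = pattern.strip_prefix('/') {
        return rel_path == anchored;
    }
    if pattern.contains('/') {
        // A separator in the middle also anchors the pattern.
        return rel_path == pattern;
    }
    // No separator: compare against the last path component only.
    rel_path.rsplit('/').next() == Some(pattern)
}
```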