Description
Current behavior 😯
Overview
The generated archive gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
is always regenerated on non-Windows platforms--or at least on Ubuntu and macOS--when any of the following tests in gix-diff/tests/tree/mod.rs
is run, as confirmed by running them individually and then running git status
:
many_different_states
maximal_difference
interesting_rename
interesting_rename_2
Those tests use the changes::to_obtain_tree::db
function, which uses that fixture. But they are not the only tests in the file that call that function. These tests do not produce a change to any tracked file, and therefore either do not regenerate gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
, or regenerate it with the exact same contents:
many_different_states_nested
maximal_difference_nested
I have verified that this happens, and that the above description holds, both on Ubuntu 22.04 LTS and on macOS.
History
More broadly, this is a long-standing situation, though most past information is less specific (I did not previously investigate which tests produced the problem). As described in a4bb7b9 (#1435), it precedes 55c635a (#1425) and dcab79a (#1415). My attempt to fix it in a4bb7b9 (#1435) was also unsuccessful both for Ubuntu and macOS (that commit updated other generated archives, but not this one, even though it tried to).
After that, you tried to update it in e6c8cd6, as described in #1435 (review). When I found this was also not successful, I decided to investigate further, and this issue is the outcome of that research.
The cause
I think I know the cause, but it seems odd that this seems not to have happened anywhere else in the project, and also I am not sure what the best approach to fixing it is. The four non-"nested" tests cause the fixture script to be called with different command-line arguments from the two "nested" tests. I believe that, whichever happens to run the fixture last--which is in general nondeterministic since tests are usually run in parallel--saves its archive.
To show the distinction in the arguments that are used, first consider the implementation of the db
function:
That passes command-line arguments represented by its args
argument to the fixture script. The four non-"nested" test cases call db
with an args
argument of None
:
In contrast, the two "nested" test cases call db
with an args
argument of ["a"].iter().copied()
:
I suspect that this is a race condition and that it may cause test failures on occasion. This suspicion is based on the general situation, taken together with my recollection of having observed occasional failures of tests in this module. But this seems rare, and unfortunately I did not record the details of those failures. It is possible that they are not related to this issue; it is also possible that I am misremembering and that the failures I observed were from tests in a different module.
Expected behavior 🤔
The committed archives should be reusable, such that running any combination of tests without setting GIX_TEST_IGNORE_ARCHIVES
will use the existing archives unless a change has been made to one of the scripts that generates them since the last time regenerated archives were committed.
I think this could be fixed by modifying the script so that the test repositories whose object databases are being used are separate, or splitting it into multiple scripts, or adjusting how script and archive identities are computed and compared so that multiple archives are automatically kept in this situation, or adding the generated archive to .gitignore
if saving a reusable archive is infeasible. However, the last option, in addition to possibly forgoing the performance benefit of using pre-generated archives, may not fully fix this if a race condition is part of its effect.
But I also think I am probably missing some insights into the problem and how it should be fixed, because this doesn't seem like the only area in the test suite where test fixture scripts are called in different tests with different combinations of arguments.
Git behavior
Not applicable. I don't think this corresponds closely enough to the way Git's test suite works to make a comparison.
Steps to reproduce 🕹
I don't think this depends on how the individual tests are run. To run the individual tests, I ran them in VS Code (on both the Ubuntu and macOS test systems--I don't have desktop access to macOS, but I was able to use vscode-server and SSH in). This uses cargo test
behind the scenes.
I can report the specific commands that the rust-analyzer
extension uses to run them of needed, and/or find an alternative approach to confirm the results. Please let me know if more information would be helpful.
What I think may be more useful to report are the specific commands I ran manually to clear things out between running the individual test cases:
git restore .
gix clean -xd -m '*generated*' -e
Although I often run gix clean -xde
to get a fully clean environment, using the above command is the same as you used in #1435 (review) and also is a far better choice for investigating this kind of problem, because it avoids unnecessarily forcing recompliation. Although building again is sometimes useful in testing, such as in #1361, it should not be needed here.