Commit b5062f3

feat(fs): use git commit hash as cache key for clean repositories (#8278)
Signed-off-by: knqyf263 <[email protected]>
1 parent aec8885 commit b5062f3

18 files changed, +345 -98 lines changed

.gitignore

+1
@@ -26,6 +26,7 @@ thumbs.db
 coverage.txt
 integration/testdata/fixtures/images
 integration/testdata/fixtures/vm-images
+internal/gittest/testdata/test-repo

 # SBOMs generated during CI
 /bom.json

docs/docs/configuration/cache.md

+2 -4
@@ -51,9 +51,7 @@ It supports three types of backends for this cache:
 - TTL can be configured via `--cache-ttl`

 ### Local File System
-The local file system backend is the default choice for container and VM image scans.
-When scanning container images, it stores analysis results on a per-layer basis, using layer IDs as keys.
-This approach enables faster scans of the same container image or different images that share layers.
+The local file system backend is the default choice for container image, VM image and repository scans.

 !!! note
 Internally, this backend uses [BoltDB][boltdb], which has an important limitation: only one process can access the cache at a time.
@@ -63,7 +61,7 @@ This approach enables faster scans of the same container image or different imag
 ### Memory
 The memory backend stores analysis results in memory, which means the cache is discarded when the process ends.
 This makes it useful in scenarios where caching is not required or desired.
-It serves as the default for repository, filesystem and SBOM scans and can also be employed for container image scans when caching is unnecessary.
+It serves as the default for filesystem and SBOM scans and can also be employed for container image scans when caching is unnecessary.

 To use the memory backend for a container image scan, you can use the following command:

docs/docs/references/configuration/cli/trivy_repository.md

+1 -1
@@ -19,7 +19,7 @@ trivy repository [flags] (REPO_PATH | REPO_URL)

 ```
       --branch string pass the branch name to be scanned
-      --cache-backend string [EXPERIMENTAL] cache backend (e.g. redis://localhost:6379) (default "memory")
+      --cache-backend string [EXPERIMENTAL] cache backend (e.g. redis://localhost:6379) (default "fs")
       --cache-ttl duration cache TTL when using redis as cache backend
       --cf-params strings specify paths to override the CloudFormation parameters files
       --check-namespaces strings Rego namespaces

docs/docs/target/container_image.md

+6
@@ -463,6 +463,12 @@ trivy image --compliance docker-cis-1.6.0 [YOUR_IMAGE_NAME]
 ## Authentication
 Please reference [this page](../advanced/private-registries/index.md).

+## Scan Cache
+When scanning container images, it stores analysis results in the cache, using the image ID and the layer IDs as the key.
+This approach enables faster scans of the same container image or different images that share layers.
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).
+
 ## Options
 ### Scan Image on a specific Architecture and OS
 By default, Trivy loads an image on a "linux/amd64" machine.
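
The layer-sharing behaviour described in the added "Scan Cache" section can be illustrated with a small sketch. This is not Trivy's implementation: `layerCache`, `missingLayers` and `scan` are hypothetical names, the cache is a plain in-memory map, and only layer IDs are modelled (the image ID key mentioned in the docs is left out).

```go
package main

import "fmt"

// layerCache is a hypothetical stand-in for the scan cache.
// It maps a layer ID to the analysis result stored for that layer.
type layerCache map[string]string

// missingLayers returns the layer IDs that have no cached result yet.
func (c layerCache) missingLayers(layerIDs []string) []string {
	var missing []string
	for _, id := range layerIDs {
		if _, ok := c[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

// scan "analyzes" only the uncached layers, stores a placeholder result
// for each of them, and returns how many layers had to be analyzed.
func (c layerCache) scan(layerIDs []string) int {
	missing := c.missingLayers(layerIDs)
	for _, id := range missing {
		c[id] = "analysis of " + id // placeholder for a real analysis result
	}
	return len(missing)
}

func main() {
	cache := layerCache{}

	// First image: both layers are new, so both are analyzed.
	fmt.Println(cache.scan([]string{"sha256:base", "sha256:app-a"}))

	// Second image shares the base layer, so only one layer is analyzed.
	fmt.Println(cache.scan([]string{"sha256:base", "sha256:app-b"}))
}
```

Under these assumptions the second scan analyzes a single layer, which is the speed-up the documentation refers to for images that share layers.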

docs/docs/target/filesystem.md

+10
@@ -91,3 +91,13 @@ $ trivy fs --scanners license /path/to/project
 ## SBOM generation
 Trivy can generate SBOM for local projects.
 See [here](../supply-chain/sbom.md) for the detail.
+
+## Scan Cache
+When scanning local projects, it doesn't use the cache by default.
+However, when the local project is a git repository with clean status and the cache backend other than the memory one is enabled, it stores analysis results, using the latest commit hash as the key.
+
+```shell
+$ trivy fs --cache-backend fs /path/to/git/repo
+```
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).
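
The rule stated in the filesystem docs above (cache only when the project is a clean git repository and the backend is not the memory one) can be written as a small decision function. This is a sketch under assumptions, not the code from this commit: `worktree` and `cacheKey` are hypothetical names, and the real key derivation may include more inputs than the commit hash.

```go
package main

import "fmt"

// worktree is a hypothetical summary of the scanned directory.
type worktree struct {
	IsGitRepo  bool   // the directory is a git repository
	Clean      bool   // there are no uncommitted changes
	HeadCommit string // hash of the latest commit
}

// cacheKey returns the cache key and true when the scan result may be
// cached, following the rule described in the docs: a clean git
// repository and a cache backend other than "memory".
func cacheKey(w worktree, backend string) (string, bool) {
	if backend == "memory" {
		return "", false
	}
	if !w.IsGitRepo || !w.Clean {
		return "", false
	}
	return w.HeadCommit, true
}

func main() {
	clean := worktree{IsGitRepo: true, Clean: true, HeadCommit: "b5062f3"}
	dirty := worktree{IsGitRepo: true, Clean: false, HeadCommit: "b5062f3"}

	key, ok := cacheKey(clean, "fs")
	fmt.Println(key, ok)

	key, ok = cacheKey(dirty, "fs")
	fmt.Println(key, ok)

	key, ok = cacheKey(clean, "memory")
	fmt.Println(key, ok)
}
```

Only the first call yields a usable key; the dirty worktree and the memory backend both fall back to an uncached scan.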

docs/docs/target/repository.md

+6
@@ -109,6 +109,12 @@ $ trivy repo --scanners license (REPO_PATH | REPO_URL)
 Trivy can generate SBOM for code repositories.
 See [here](../supply-chain/sbom.md) for the detail.

+## Scan Cache
+When scanning git repositories, it stores analysis results in the cache, using the latest commit hash as the key.
+Note that the cache is not used when the repository is dirty, otherwise Trivy will miss the files that are not committed.
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).
+
 ## References
 The following flags and environmental variables are available for remote git repositories.
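
The "dirty repository" condition above can be checked in several ways. How this commit detects it is not shown in this excerpt, so the following is only one possible sketch: it assumes the output of `git status --porcelain` has already been captured as a string (that command prints nothing for a clean working tree and one line per modified or untracked path otherwise). `isClean` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// isClean reports whether the given `git status --porcelain` output
// describes a clean working tree, i.e. it contains no entries at all.
func isClean(porcelain string) bool {
	return strings.TrimSpace(porcelain) == ""
}

func main() {
	fmt.Println(isClean(""))                 // no entries
	fmt.Println(isClean(" M go.mod\n"))      // a modified tracked file
	fmt.Println(isClean("?? new-file.txt\n")) // an untracked file
}
```

Untracked files count as dirty here, which matches the stated concern: a result cached under the commit hash would not reflect files that are not committed.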

docs/docs/target/vm.md

+8
@@ -182,6 +182,14 @@ $ trivy vm --scanners license [YOUR_VM_IMAGE]
 Trivy can generate SBOM for VM images.
 See [here](../supply-chain/sbom.md) for the detail.

+## Scan Cache
+When scanning AMI or EBS snapshots, it stores analysis results in the cache, using the snapshot ID.
+Scanning the same snapshot several times skips analysis if the cache is already available.
+
+When scanning local files, it doesn't use the cache by default.
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).
+
 ## Supported Architectures

 ### Virtual machine images

internal/gittest/server.go

+47
@@ -5,7 +5,9 @@ package gittest
 import (
 	"errors"
 	"net/http/httptest"
+	"os"
 	"path/filepath"
+	"runtime"
 	"testing"
 	"time"

@@ -59,6 +61,51 @@ func NewServer(t *testing.T, repo, dir string) *httptest.Server {
 	return httptest.NewServer(service)
 }

+// NewServerWithRepository creates a git server with an existing repository
+func NewServerWithRepository(t *testing.T, repo, dir string) *httptest.Server {
+	// Create a bare repository
+	bareDir := t.TempDir()
+	gitDir := filepath.Join(bareDir, repo+".git")
+
+	// Clone the existing repository as a bare repository
+	r, err := git.PlainClone(gitDir, true, &git.CloneOptions{
+		URL:  dir,
+		Tags: git.AllTags,
+	})
+	require.NoError(t, err)
+
+	// Fetch all remote branches and create local branches
+	err = r.Fetch(&git.FetchOptions{
+		RefSpecs: []config.RefSpec{
+			"+refs/remotes/origin/*:refs/heads/*",
+		},
+		Tags: git.AllTags,
+	})
+	if err != nil && !errors.Is(err, git.NoErrAlreadyUpToDate) {
+		require.NoError(t, err)
+	}
+
+	// Set up a git server
+	service := gitkit.New(gitkit.Config{Dir: bareDir})
+	err = service.Setup()
+	require.NoError(t, err)
+
+	return httptest.NewServer(service)
+}
+
+// NewTestServer creates a git server with the local copy of "github.com/aquasecurity/trivy-test-repo".
+// If the test repository doesn't exist, it suggests running 'mage test:unit'.
+func NewTestServer(t *testing.T) *httptest.Server {
+	_, filePath, _, _ := runtime.Caller(0)
+	dir := filepath.Join(filepath.Dir(filePath), "testdata", "test-repo")
+
+	if _, err := os.Stat(dir); os.IsNotExist(err) {
+		require.Fail(t, "test-repo not found. Please run 'mage test:unit' to set up the test fixtures")
+	}
+
+	return NewServerWithRepository(t, "test-repo", dir)
+}
+
 func Clone(t *testing.T, ts *httptest.Server, repo, worktree string) *git.Repository {
 	cloneOptions := git.CloneOptions{
 		URL: ts.URL + "/" + repo + ".git",
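
The fetch in `NewServerWithRepository` relies on the refspec `+refs/remotes/origin/*:refs/heads/*`, which maps every ref under `refs/remotes/origin/` on the source side to a branch of the same name under `refs/heads/` (the leading `+` allows non-fast-forward updates). The standalone sketch below shows that name mapping with the standard library only; `mapRef` is a hypothetical helper, not go-git's implementation, and it handles at most one `*` wildcard per side.

```go
package main

import (
	"fmt"
	"strings"
)

// mapRef applies a refspec of the form "[+]src:dst" to a reference name.
// It returns the destination name and true when the name matches src.
func mapRef(refspec, name string) (string, bool) {
	spec := strings.TrimPrefix(refspec, "+") // "+" only means "force update"
	src, dst, ok := strings.Cut(spec, ":")
	if !ok {
		return "", false
	}

	prefix, suffix, wildcard := strings.Cut(src, "*")
	if !wildcard {
		// Without a wildcard the name has to match exactly.
		if name == src {
			return dst, true
		}
		return "", false
	}

	if len(name) < len(prefix)+len(suffix) ||
		!strings.HasPrefix(name, prefix) ||
		!strings.HasSuffix(name, suffix) {
		return "", false
	}

	// Substitute the part matched by "*" into the destination pattern.
	matched := name[len(prefix) : len(name)-len(suffix)]
	return strings.Replace(dst, "*", matched, 1), true
}

func main() {
	const spec = "+refs/remotes/origin/*:refs/heads/*"

	fmt.Println(mapRef(spec, "refs/remotes/origin/main"))
	fmt.Println(mapRef(spec, "refs/remotes/origin/feature/x"))
	fmt.Println(mapRef(spec, "refs/tags/v0.0.1"))
}
```

Tags do not match this refspec, which is consistent with the helper passing `Tags: git.AllTags` separately.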

internal/gittest/testdata/fixture.go

+43
@@ -0,0 +1,43 @@
+package gittest
+
+import (
+	"log/slog"
+	"path/filepath"
+	"runtime"
+
+	"github.com/go-git/go-git/v5"
+	"github.com/magefile/mage/target"
+	"golang.org/x/xerrors"
+)
+
+const (
+	repoURL = "https://github.com/aquasecurity/trivy-test-repo/"
+	repoDir = "test-repo" // subdirectory for the cloned repository
+)
+
+// Fixtures clones a Git repository for unit tests
+func Fixtures() error {
+	_, filePath, _, _ := runtime.Caller(0)
+	dir := filepath.Dir(filePath)
+	cloneDir := filepath.Join(dir, repoDir)
+
+	// Check if the directory already exists and is up to date
+	if updated, err := target.Path(cloneDir, filePath); err != nil {
+		return err
+	} else if !updated {
+		return nil
+	}
+
+	slog.Info("Cloning...", slog.String("url", repoURL))
+
+	// Clone the repository with all branches and tags
+	_, err := git.PlainClone(cloneDir, false, &git.CloneOptions{
+		URL:  repoURL,
+		Tags: git.AllTags,
+	})
+	if err != nil {
+		return xerrors.Errorf("error cloning repository: %w", err)
+	}
+
+	return nil
+}
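
Both `Fixtures` and `NewTestServer` locate the fixture directory relative to their own source file through `runtime.Caller(0)`, so the path does not depend on the working directory of the test run. A minimal standalone sketch of that pattern follows; `fixtureDir` is a hypothetical helper, and unlike the code in the diff it also checks the `ok` result of `runtime.Caller`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// fixtureDir returns "<directory of this source file>/testdata/<name>".
// The boolean is false when the caller information is unavailable.
func fixtureDir(name string) (string, bool) {
	// Caller(0) reports the file that contains this very call.
	_, file, _, ok := runtime.Caller(0)
	if !ok {
		return "", false
	}
	return filepath.Join(filepath.Dir(file), "testdata", name), true
}

func main() {
	dir, ok := fixtureDir("test-repo")
	fmt.Println(dir, ok)
}
```

The printed path depends on where the file was compiled, so no fixed output is given here.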

magefiles/magefile.go

+5 -3
@@ -16,10 +16,12 @@ import (
 	"github.com/magefile/mage/sh"
 	"github.com/magefile/mage/target"

-	//mage:import rpm
-	rpm "github.com/aquasecurity/trivy/pkg/fanal/analyzer/pkg/rpm/testdata"
 	// Trivy packages should not be imported in Mage (see https://github.com/aquasecurity/trivy/pull/4242),
 	// but this package doesn't have so many dependencies, and Mage is still fast.
+	//mage:import gittest
+	gittest "github.com/aquasecurity/trivy/internal/gittest/testdata"
+	//mage:import rpm
+	rpm "github.com/aquasecurity/trivy/pkg/fanal/analyzer/pkg/rpm/testdata"
 	"github.com/aquasecurity/trivy/pkg/log"
 )

@@ -286,7 +288,7 @@ func compileWasmModules(pattern string) error {

 // Unit runs unit tests
 func (t Test) Unit() error {
-	mg.Deps(t.GenerateModules, rpm.Fixtures)
+	mg.Deps(t.GenerateModules, rpm.Fixtures, gittest.Fixtures)
 	return sh.RunWithV(ENV, "go", "test", "-v", "-short", "-coverprofile=coverage.txt", "-covermode=atomic", "./...")
 }

pkg/commands/app.go

-2
@@ -478,8 +478,6 @@ func NewRepositoryCommand(globalFlags *flag.GlobalFlagGroup) *cobra.Command {

 	repoFlags.ScanFlagGroup.DistroFlag = nil // `repo` subcommand doesn't support scanning OS packages, so we can disable `--distro`

-	repoFlags.CacheFlagGroup.CacheBackend.Default = string(cache.TypeMemory) // Use memory cache by default
-
 	cmd := &cobra.Command{
 		Use: "repository [flags] (REPO_PATH | REPO_URL)",
 		Aliases: []string{"repo"},

pkg/fanal/artifact/artifact.go

+5
@@ -14,6 +14,7 @@ import (
 )

 type Option struct {
+	Type Type
 	AnalyzerGroup analyzer.Group // It is empty in OSS
 	DisabledAnalyzers []analyzer.Type
 	DisabledHandlers []types.HandlerType
@@ -30,6 +31,10 @@ type Option struct {
 	FileChecksum bool // For SPDX
 	DetectionPriority types.DetectionPriority

+	// Original is the original target location, e.g. "github.com/aquasecurity/trivy"
+	// Currently, it is used only for remote git repositories
+	Original string
+
 	// Git repositories
 	RepoBranch string
 	RepoCommit string
