Purpose
Why do we need an abstract layer for managed repositories? I think it brings several benefits:
- Switch to a different storage directory structure. Currently, renaming a user or repository requires renaming directories on disk, which makes it hard to keep the database and the disk consistent when an operation fails. The best approach is to use immutable repository information as directory names, e.g. the user/repository ID, so that renaming a user or repository requires no disk operation.
- Reduce the disk usage of forked repositories. Git itself supports shared repositories, but Gitea doesn't use this feature to reduce the disk usage of forks yet. Some design questions need to be answered: which repository should be the root of the base and forked repositories? Should a hidden repository serve as the root? This is also related to the abstract layer.
- For large Gitea sites or high-availability systems, distributed git storage is a MUST. Currently, users can store the managed git repositories on NFS, but that still has a single point of failure.
Concepts
I previously sent some PRs that tried to introduce such a layer in `modules/git`, but I found that was not the right direction. The `modules/git` package should be a base package that always focuses on disk operations, whether the repository is a managed one, a wiki, a temporary one, or a hidden one. So I think some concepts need to be introduced to make this clear.
- Managed Git Repositories: all repositories recorded in Gitea's database, including wiki repositories and possibly other repository types in the future. Only these git repositories should be managed by the distributed system.
- Temporary Git Repositories: repositories created and deleted internally by Gitea while performing certain operations. They are stored on the system's temporary file system and are cleaned up once the related operation finishes.
- `modules/git`: This package should be a low-level package that can handle any git repository on disk. For managed git repositories, a new package should be introduced.
- `modules/gitrepo`: This is the new package, introduced as an abstract layer for managed git repositories. It may support different storage strategies, but its interface to other packages stays almost the same as before, hiding the implementation details. This package will depend on `modules/git` and must not depend on any `models` packages. It can be used by other `modules` and `services` layer packages.
Refactoring
To achieve these goals, we need some refactoring.
- Hide `setting.RepoRootPath` inside the `modules/gitrepo` package so that no other non-test package uses it directly. That package could provide a method such as `GitStorageInfo` that returns the storage method and storage path, but it should only be used for information displayed in the UI.
- All operations on managed git repositories should call functions in `modules/gitrepo` instead of `modules/git`, and every function in `modules/gitrepo` should hide the absolute `RepoPath`, and even the relative storage path, using the `ID` as the directory name instead. Callers only need to pass a value that satisfies an interface like:
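```go
// Repository is the minimal information modules/gitrepo needs from a
// managed repository; callers never pass a filesystem path.
type Repository interface {
	GetID() int64
	GetOwnerName() string
	GetRepoName() string
}
```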
- `RunGitCmd` should move into the new package, where it can act as a proxy that dispatches to different implementations.
Mocking
To make the abstraction work, we need a mock git storage server that reuses the current repository root path but receives all requests over HTTP. So there will be two implementations of the basic operations:
- For local disk operations
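```go
import "code.gitea.io/gitea/modules/git"

// runGitCmdLocal runs c directly against the repository directory on local disk.
// RunOpts is assumed to be this package's options type, embedding git.RunOpts
// and adding an IsWiki flag.
func runGitCmdLocal(repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using runGitCmdLocal")
	}
	opts.Dir = getPath(repo, opts.IsWiki) // resolved internally; callers never see the path
	return c.Run(&opts.RunOpts)
}
```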
- For the mock HTTP storage service:
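```go
import (
	"context"

	"code.gitea.io/gitea/modules/git"
)

// runGitCmdForMockServer forwards the command to the mock storage server over HTTP;
// the server, not the caller, decides which directory the command runs in.
func runGitCmdForMockServer(ctx context.Context, repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using runGitCmdForMockServer")
	}
	return mockHTTPClient.RunGitCmd(ctx, repo.GetOwnerName(), repo.GetRepoName(), c, opts.RunOpts)
}
```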