Proposal: An abstract layer for managed git repositories #29033

Open
@lunny

Purpose

Why do we need an abstraction layer for managed repositories? I think it would bring several benefits.

  • Convert to a different storage directory structure. Currently, renaming a user or repository requires renaming directories on disk, which makes it difficult to stay consistent when an operation fails. A better approach is to name directories after fixed repository information, such as the user or repository ID, so that renaming a user or repository requires no disk operation.
  • Reduce the size of forked repositories. Git itself supports sharing objects between repositories, but Gitea does not yet use this feature to reduce the disk usage of forks. Some design questions need to be settled: which repository should be the root of the base and forked repositories? Should there be a hidden repository as the root? These questions are also related to this layer.
  • For large Gitea sites or high-availability systems, distributed git storage is a MUST. Currently, users can store the managed git repositories on NFS, but that still leaves a single point of failure.
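The first point can be illustrated with a minimal sketch. It assumes a hypothetical `repoRecord` type and a hypothetical ID-based layout (`<root>/<id>.git`); neither is part of the proposal, and the name-based helper only roughly mirrors an owner/name layout. The point is that an ID-based path does not change when the owner or repository is renamed, while a name-based path does.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// repoRecord is a hypothetical stand-in for a repository row in the database.
type repoRecord struct {
	ID        int64
	OwnerName string
	Name      string
}

// nameBasedPath builds a path from the owner and repository names,
// so it changes whenever either of them is renamed.
func nameBasedPath(root string, r repoRecord) string {
	return path.Join(root, r.OwnerName, r.Name+".git")
}

// idBasedPath builds a path from the immutable repository ID only.
// The "<id>.git" naming is an assumption made for this sketch.
func idBasedPath(root string, r repoRecord) string {
	return path.Join(root, strconv.FormatInt(r.ID, 10)+".git")
}

func main() {
	r := repoRecord{ID: 42, OwnerName: "alice", Name: "demo"}
	oldName, oldID := nameBasedPath("/data/repos", r), idBasedPath("/data/repos", r)

	// Rename both the owner and the repository.
	r.OwnerName, r.Name = "bob", "renamed"

	fmt.Println("name-based path changed:", nameBasedPath("/data/repos", r) != oldName)
	fmt.Println("id-based path changed:  ", idBasedPath("/data/repos", r) != oldID)
}
```

With the ID-based layout, a rename becomes a database-only update, so there is no directory move that can fail halfway.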

Concepts

I previously sent some PRs that tried to introduce such a layer in modules/git, but I found that was not the right direction. The modules/git package should be a base package that always focuses on handling disk operations, whether the repository is a managed one, a wiki one, a temporary one or a hidden one. So I think some concepts need to be introduced to clarify this.

  • Managed Git Repositories: All repositories recorded in Gitea's database, including wiki repositories and any future types of repositories, can be considered managed git repositories. Only these git repositories should be managed by the distributed system.
  • Temporary Git Repositories: Repositories that are created and deleted internally while Gitea performs certain operations. They are stored on the system's temporary file system and cleaned up after the related operation finishes.
  • modules/git: This package should be a low-level package that can handle any git repository on disk. For managed git repositories, a new package should be introduced.
  • modules/gitrepo: This is the new package, introduced as an abstraction layer for handling managed git repositories. It may include different storage strategies, but the interface it exposes to other packages stays almost the same as before, so the implementation details are hidden. This package will depend on modules/git and should not depend on any models packages. Other modules and services-layer packages can depend on it.

Refactoring

To achieve these goals, we need to do some refactoring.

  • Hide setting.RepoRootPath inside the modules/gitrepo package. No other non-test package should use it directly. That package could provide a method such as GitStorageInfo that returns the storage method and storage path, but it should only be used for information displayed in the UI.
  • All operations on managed git repositories call the functions in modules/gitrepo rather than modules/git. The functions in modules/gitrepo should hide the absolute RepoPath, and even the relative storage path, and use the ID as the directory name. They just take an interface like
type Repository interface {
	GetID() int64
	GetOwnerName() string
	GetRepoName() string
}
  • RunGitCmd should live in the new package, where it can act as a proxy method that invokes different implementations.
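The proxy idea in the last point could look roughly like the sketch below. Only the `Repository` interface and the name `RunGitCmd` come from the proposal; `Command` (a simplified stand-in for git.Command), `Backend`, `SetBackend`, `recordingBackend` and `simpleRepo` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Repository is the interface suggested in the proposal.
type Repository interface {
	GetID() int64
	GetOwnerName() string
	GetRepoName() string
}

// Command is a hypothetical, simplified stand-in for git.Command.
type Command struct{ Args []string }

// Backend is a hypothetical storage strategy (local disk, HTTP service, ...).
type Backend interface {
	RunGitCmd(repo Repository, c *Command) error
}

// current holds the configured backend; a real package would guard it properly.
var current Backend

// SetBackend selects the storage strategy, for example at startup.
func SetBackend(b Backend) { current = b }

// RunGitCmd is the proxy: callers never see paths, only a Repository.
func RunGitCmd(repo Repository, c *Command) error {
	if current == nil {
		return errors.New("gitrepo: no storage backend configured")
	}
	return current.RunGitCmd(repo, c)
}

// recordingBackend is a toy backend that records calls instead of running git.
type recordingBackend struct{ calls []string }

func (b *recordingBackend) RunGitCmd(repo Repository, c *Command) error {
	b.calls = append(b.calls, fmt.Sprintf("repo %d: git %v", repo.GetID(), c.Args))
	return nil
}

// simpleRepo is a minimal Repository implementation for the demo.
type simpleRepo struct {
	id          int64
	owner, name string
}

func (r simpleRepo) GetID() int64         { return r.id }
func (r simpleRepo) GetOwnerName() string { return r.owner }
func (r simpleRepo) GetRepoName() string  { return r.name }

func main() {
	rb := &recordingBackend{}
	SetBackend(rb)
	if err := RunGitCmd(simpleRepo{id: 7, owner: "alice", name: "demo"}, &Command{Args: []string{"status"}}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rb.calls)
}
```

Callers only pass a Repository, so switching from local disk to another storage strategy would not require changes in the calling packages.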

Mocking

To make the abstraction work, we need a mock git storage server that reuses the current repository root path but receives all requests over HTTP. So there will be two implementations of the basic operations, i.e.

  • For local disk operations
func runGitCmdLocal(repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using runGitCmdLocal")
	}
	opts.Dir = getPath(repo, opts.IsWiki)
	return c.Run(&opts.RunOpts)
}
  • For the mock HTTP storage service
func runGitCmdForMockServer(repo Repository, c *git.Command, opts *RunOpts) error {
	if opts.Dir != "" {
		// we must panic here, otherwise there would be bugs if developers set Dir by mistake, and it would be very difficult to debug
		panic("dir field must be empty when using runGitCmdForMockServer")
	}
	
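	// NOTE: ctx is not defined in this sketch; it would have to come from the caller or from the command.
	return mockHTTPClient.RunGitCmd(ctx, repo.GetOwnerName(), repo.GetRepoName(), c, opts.RunOpts)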
}
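As a rough illustration of the mock storage service, the sketch below uses Go's standard `net/http/httptest` package. The JSON wire format (`gitCmdRequest`, `gitCmdResponse`), the `/run` endpoint and the helper names are all assumptions made for this example. The handler only resolves the repository directory on the server side and echoes the arguments back; a real mock would run git in that directory.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"path"
)

// gitCmdRequest is a hypothetical wire format: the client sends names, never paths.
type gitCmdRequest struct {
	Owner string   `json:"owner"`
	Repo  string   `json:"repo"`
	Args  []string `json:"args"`
}

// gitCmdResponse echoes what the server would have executed.
type gitCmdResponse struct {
	Dir  string   `json:"dir"`
	Args []string `json:"args"`
}

// newMockStorageHandler resolves the repository directory under root.
// It does not run git and does not validate the names; it is only a sketch.
func newMockStorageHandler(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req gitCmdRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		resp := gitCmdResponse{Dir: path.Join(root, req.Owner, req.Repo+".git"), Args: req.Args}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(resp)
	})
}

// runRemote posts a command to the storage service and decodes the reply.
func runRemote(baseURL string, req gitCmdRequest) (gitCmdResponse, error) {
	var out gitCmdResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := http.Post(baseURL+"/run", "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("mock storage server returned %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// runDemo starts a throwaway server and sends one command to it.
func runDemo(root string) (gitCmdResponse, error) {
	srv := httptest.NewServer(newMockStorageHandler(root))
	defer srv.Close()
	return runRemote(srv.URL, gitCmdRequest{Owner: "alice", Repo: "demo", Args: []string{"rev-parse", "HEAD"}})
}

func main() {
	out, err := runDemo("/data/repos")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Dir, out.Args)
}
```

Because the client never builds a filesystem path, the same calling code could later talk to a real distributed storage service instead of the mock.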

Related PRs

#28937
#28940
#28966

Labels

type/proposal: The new feature has not been accepted yet but needs to be discussed first.
