analysis on performance bottlenecks in repo migration #14772

Closed
@noerw

Repo migration from GitHub can take a long time (e.g. migrating this gitea repo takes more than 24h on a small VPS, a Hetzner CPX11).
It's unclear (to me) whether this is caused by rate limits of external services (the GitHub API) or by inefficiencies in Gitea's migration module.

The aim of this issue is to identify the bottlenecks involved. For now, the primary method is to collect pprof profiles and investigate in which routines most time is spent.
This also gives some insight into which network, disk, and DB operations take the most time, but only indirectly. For that, analyzing DB query times specifically might be more helpful. (If somebody can outline a good process for that, a comment here would be appreciated ;)

I sampled pprof profiles during a migration of https://github.com/go-gitea/gitea, including all entities except releases, on Gitea 1.14.0+dev-713-gec06eb112. The migration ran over several hours. Attached are several pprof profiles, each sampled over a window of between 30 seconds and 30 minutes:
pprof.gitea.samples.cpu.00.zip

  • To inspect them, run go tool pprof -http :8080 <path to profile>
  • To collect your own profiles, set ENABLE_PPROF = true under [server] in app.ini, then call go tool pprof -seconds 1800 0.0.0.0:6060

Actual analysis of these samples will follow in the upcoming days.

Server utilization graphs for the middle 12 hours of migration:
[image: grafik]
To me, it looks like the higher-utilization phase in each hour coincides with a reset of the GitHub rate-limit window, so we're down to ~33% of potential performance through GitHub rate limits alone.


    Labels

    issue/needs-feedback: For bugs, we need more details. For features, the feature must be described in more detail
    performance/bigrepo: Performance Issues affecting Big Repositories
    performance/speed: performance issues with slow downs
    topic/repo-migration: Migrate repos from other platforms to Gitea, or from Gitea to them
