ci: mirror ubuntu:22.04 to ghcr #135530

Closed
wants to merge 1 commit into from
38 changes: 38 additions & 0 deletions .github/workflows/ghcr.yml
@@ -0,0 +1,38 @@
# Mirror DockerHub images used by the Rust project to ghcr.io.
#
# In some CI jobs, we pull images from ghcr.io instead of Docker Hub because
# Docker Hub has a rate limit, while ghcr.io doesn't.
# Those images are pushed to ghcr.io by this job.
#
# Note that authenticating to DockerHub or other registries isn't possible
# for PR jobs, because forks can't access secrets.
# That's why we use ghcr.io: it has no rate limit and doesn't require authentication.

name: GHCR

on:
  schedule:
    # Run daily at midnight UTC
    - cron: '0 0 * * *'

jobs:
  mirror:
    name: DockerHub mirror
    runs-on: ubuntu-24.04
    permissions:
      # Needed to write to the ghcr.io registry
      packages: write
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false

      - uses: docker/login-action@9780b0c442fbb1117ed29e0efdff1e18412f7567 # v3.3.0
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ github.token }}

      - name: Mirror DockerHub
        run: python3 src/ci/github-actions/ghcr.py
        shell: bash
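
CI jobs that want to avoid Docker Hub's rate limit have to refer to the mirrored copy instead of the original image. As a minimal sketch, assuming the ghcr.io/rust-lang/<image> naming that the mirroring script in this PR uses as its crane copy destination, a hypothetical helper (ghcr_mirror_ref and MIRRORED_IMAGES are illustrative names, not part of this PR) could map an image reference to its mirror:

```python
# Hypothetical helper, not part of this PR.
# Assumes mirrored images live at ghcr.io/<owner>/<image>, matching the
# destination used by mirror_dockerhub() in ghcr.py.
MIRRORED_IMAGES = {"ubuntu:22.04"}


def ghcr_mirror_ref(image: str, owner: str = "rust-lang") -> str:
    """Return the ghcr.io mirror of a mirrored Docker Hub image, else the image unchanged."""
    # Treat "ubuntu:22.04" and "docker.io/ubuntu:22.04" as the same image.
    name = image.removeprefix("docker.io/")
    if name in MIRRORED_IMAGES:
        return f"ghcr.io/{owner}/{name}"
    # Images that aren't mirrored are still pulled from Docker Hub.
    return image


if __name__ == "__main__":
    print(ghcr_mirror_ref("ubuntu:22.04"))            # → ghcr.io/rust-lang/ubuntu:22.04
    print(ghcr_mirror_ref("docker.io/ubuntu:22.04"))  # → ghcr.io/rust-lang/ubuntu:22.04
    print(ghcr_mirror_ref("alpine:3.20"))             # → alpine:3.20
```

Keeping the set of mirrored images in one place means the mirroring job and the consuming jobs can't silently disagree about which images are available on ghcr.io.
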
75 changes: 75 additions & 0 deletions src/ci/github-actions/ghcr.py
Contributor

Normally I'm a fan of rewriting complex shell logic into Python or Rust, but in this case, it's literally curl github > crane && crane ..., right? I would just include this in the YAML file directly; this file seems unnecessarily complex for what it does. Since we just have a single image now, we don't even need to write a Bash for loop :)

Member Author

I'll close this PR and open a new one so that we don't lose the Python code in case we need it another time.

Anyway, my opinion is that every Bash script starts with the thought "oh, it's simple, I'll just write it in Bash", and then the complexity slowly increases. So I thought I'd start in Python directly and aim for better error reporting and comments, and use a temporary directory for the download so that the current directory isn't cluttered with useless files.

Plus, I thought the code could become complex if:

  • we need to mirror more than one image.
  • we need to mirror images for ARM. I'm not sure whether crane can do this from an x86 machine.

This looks complex, but we could, for example, extract the functions into common modules shared among other CI scripts, so that in the end the code reads like plain English (similar to the main function).

Contributor

Right, I definitely agree that once it becomes complex, it might be a good idea to rewrite. But if it's very straightforward in Bash, I would keep it there until the complexity balloons. Tbh, I would probably just do

docker pull <image>
docker push ghcr.io/<image>

if it works :) But maybe crane manages some extra things on top.

Member Author

Yeah, I haven't checked the crane source code 👍

Member Author

I tried docker pull and push. The difference is that crane syncs all available architectures.

For example, compare

Contributor

I see, thanks for trying! Well, for now we only need x64, so I'd be fine with staying with the simplest option and just doing pull/push, which is easy to understand (vs. using an external tool). We can switch in the future if we also need to mirror ARM images.

Member Author

I think it's still useful to mirror all architectures: https://github.com/rust-lang/rust/pull/135574/files#r1918185315

@@ -0,0 +1,75 @@
"""
Use crane to mirror images from DockerHub to GHCR.
Learn more about crane at
https://github.com/google/go-containerregistry/blob/main/cmd/crane/README.md
"""

import os
import requests
import tarfile
import shutil
import subprocess
from io import BytesIO
from tempfile import TemporaryDirectory


def crane_gh_release_url() -> str:
    version = "v0.20.2"
    os_name = "Linux"
    arch = "x86_64"
    base_url = "https://github.com/google/go-containerregistry/releases/download"
    return f"{base_url}/{version}/go-containerregistry_{os_name}_{arch}.tar.gz"


def download_crane():
    """Download the crane executable from the GitHub releases into the current directory."""

    try:
        # Download the GitHub release tar.gz file
        response = requests.get(crane_gh_release_url(), stream=True)
        response.raise_for_status()

        with TemporaryDirectory() as tmp_dir:
            # Extract the tar.gz file to temp dir
            with tarfile.open(fileobj=BytesIO(response.content), mode="r:gz") as tar:
                tar.extractall(path=tmp_dir)

            # The tar.gz file contains multiple files.
            # Copy crane executable to current directory.
            # We don't need the other files.
            crane_path = os.path.join(tmp_dir, "crane")
            shutil.copy2(crane_path, "./crane")

        print("Successfully downloaded and extracted crane")

    except requests.RequestException as e:
        raise RuntimeError(f"Failed to download crane: {e}") from e
    except (tarfile.TarError, OSError) as e:
        raise RuntimeError(f"Failed to extract crane: {e}") from e


def mirror_dockerhub():
    # Images from DockerHub that we want to mirror
    images = ["ubuntu:22.04"]
    for img in images:
        repo_owner = "rust-lang"
        # Command to mirror images from DockerHub to GHCR
        command = ["./crane", "copy", f"docker.io/{img}", f"ghcr.io/{repo_owner}/{img}"]
        try:
            subprocess.run(
                command,
                # if the process exits with a non-zero exit code,
                # raise the CalledProcessError exception
                check=True,
                # open stdout and stderr in text mode
                text=True,
            )
            print(f"Successfully mirrored {img}")
        except subprocess.CalledProcessError as e:
            raise RuntimeError(f"Failed to mirror {img}: {e}") from e
    print("Successfully mirrored all images")


if __name__ == "__main__":
    download_crane()
    mirror_dockerhub()
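
For comparison, the pull/push alternative discussed in the review can be sketched in the same style. One detail the two-line version in the thread skips: docker push expects a local image already named after the destination, so a docker tag step sits between pull and push. As noted in the review, this path only copies the platform that docker pull resolves for the runner (x86_64 on ubuntu-24.04), whereas crane copy copies every architecture in the image. The function names docker_mirror_commands and mirror_with_docker are hypothetical, and the sketch assumes the runner is already logged in to ghcr.io, as the workflow's docker/login-action step arranges.

```python
import subprocess


# Hypothetical alternative to crane, not part of this PR.
def docker_mirror_commands(img: str, repo_owner: str = "rust-lang") -> list[list[str]]:
    """Build the docker commands that copy one image from Docker Hub to ghcr.io."""
    src = f"docker.io/{img}"
    dst = f"ghcr.io/{repo_owner}/{img}"
    return [
        # Pulls only the platform matching the current machine.
        ["docker", "pull", src],
        # docker push needs a local name that points at the destination registry.
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]


def mirror_with_docker(images: list[str]) -> None:
    for img in images:
        for command in docker_mirror_commands(img):
            # check=True raises CalledProcessError on the first failing command.
            subprocess.run(command, check=True)
```

Calling mirror_with_docker(["ubuntu:22.04"]) would then stand in for download_crane() and mirror_dockerhub() in the __main__ block, trading multi-architecture support for one less external tool.
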