
Exploring PGO for the Rust compiler #79442

@michaelwoerister

This issue is a landing place for discussion of whether and how to apply profile-guided optimization to rustc. There is some preliminary investigation of the topic in the Exploring PGO for the Rust compiler post on the Inside Rust blog. The gist of it is that the performance gains offered by PGO look very promising, but we need to

  • confirm the results on different machines and platforms,
  • make sure that there are no reasons to not do PGO on the compiler, and
  • find a feasible way to implement this on CI (or find a less ambitious alternative).

Let's start with the first point.

Confirming the results

The blog post contains a step-by-step description of how to obtain a PGOed compiler -- but actually doing that is rather time consuming. To make things easier, I could provide a branch of the compiler that has all the changes already applied and, more importantly, a pre-recorded, checked-in .profdata file for both LLVM and rustc. Alternatively, I could just put up the final toolchain for download somewhere. Even better would be to make it available via rustup somehow. Please comment below on how best to approach this.
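For context, the overall shape of the workflow is sketched below on an ordinary crate rather than on rustc itself (for rustc, the equivalent steps go through rustbuild, as described in the blog post). The paths and the workload binary are placeholders; llvm-profdata has to match the LLVM version used by rustc (rustup's llvm-tools-preview component ships a matching copy).

```bash
# Minimal sketch of the three PGO phases on an ordinary crate.
# Paths and the workload binary are placeholders.
mkdir -p /tmp/pgo-data

# Phase 1: build with instrumentation.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# Phase 2: run a representative workload to produce .profraw files,
# then merge them into a single .profdata file.
./target/release/my-workload        # placeholder workload
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data/*.profraw

# Phase 3: rebuild using the collected profile.
cargo clean
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```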

Reasons not to do PGO?

Concerns raised so far are:

  • This makes rustc builds non-reproducible -- something which I don't think is true. With a fixed .profdata file, both rustc and Clang should always generate the same output. That is, -Cprofile-use and -fprofile-use do not introduce any source of randomness, as far as I can tell. So if the .profdata file being used is tracked by version control, we should be fine. It would be good to get some further confirmation of that, though (a sketch of a simple check follows this list).

  • If we apply PGO just to stable and beta releases, we don't get enough testing for PGO-specific toolchain bugs.

  • It is too much effort to continuously monitor the effect of PGO (e.g. via perf.rlo) because we would need PGOed nightlies in addition to non-PGOed nightlies (the latter of which serve as a baseline).

  • Doing PGO might be risky in that it adds another opportunity for LLVM bugs to introduce miscompilations.

  • It makes CI more complicated.

  • It increases cycle times for the compiler.

The last two points can definitely be true. Finding out whether they have to be is the point of the next section:
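To make the reproducibility question concrete, here is a minimal sketch of the kind of check that could provide further confirmation, assuming a fixed .profdata file and some test crate (both placeholders):

```bash
# Build the same crate twice with the same fixed .profdata file and compare
# the resulting binaries bit for bit. Paths and the binary name are placeholders.
PROFDATA=/tmp/pgo-data/merged.profdata

RUSTFLAGS="-Cprofile-use=$PROFDATA" cargo build --release
cp target/release/my-binary /tmp/build-1

cargo clean
RUSTFLAGS="-Cprofile-use=$PROFDATA" cargo build --release
cp target/release/my-binary /tmp/build-2

# Identical hashes are evidence (though not proof) that -Cprofile-use does
# not introduce any nondeterminism of its own.
sha256sum /tmp/build-1 /tmp/build-2
```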

Find a feasible way of using PGO for rustc

There are several ways we can bring PGO to rustc:

  1. Provide rustbuild support for easily building your own fully PGOed compiler.
  2. Provide PGOed builds only for stable and beta releases, where the additional cycle time is offset by the lower build frequency.
  3. Provide a kind of "best-effort" PGO that uses outdated (but regularly updated) profiling data, in the hope that it is accurate enough to still give most of the gains.

Let's go through the points in more detail:

  1. Easy DIY PGO via rustbuild - I think we should definitely do this. There is quite a bit of design space on how to structure the concrete build options (@luser has posted some relevant thoughts in a related topic). But overall it should not be too much work, and since it is completely opt-in, there is little risk involved. In addition, it is a necessary intermediate step for the other two options.

  2. PGO for beta and stable releases only - The feasibility of option (2) depends on a few things:

  • Is it acceptable from a testing point of view to build stable and beta artifacts with different settings than regular CI builds? Arguably, beta releases get quite a bit of testing because they are used for building the compiler itself. On the other hand, building the compiler is quite a sensitive task.

  • Is it technically actually possible to do the long, three-phase compilation process on CI, or would we run into time limits set by the infrastructure? We might be more flexible in this respect now than we have been in the past.

  • How do we handle cross-compiled toolchains where profile data collection and compilation cannot run on the same system? A simple answer there is: don't do PGO for these targets. A possible better answer is to use profiling data collected on another system. This is even more relevant for the "best-effort" approach as described below.

Personally, I'm on the fence about whether I find this approach acceptable -- especially given that there is a third option that is potentially quite a bit better.

  3. Do PGO on a best-effort basis - After @pnkfelix asked a few questions in this direction, I've been looking into the LLVM profile data format a bit, and it looks like it's actually quite robust:
  • Every function entry contains a hash value of the function's control flow graph. This gives LLVM the ability to check if a given entry is safe to use for a given function and, if not, it can just ignore the data and compile the function normally. That would be great news because it would mean that we can use profile data collected from a different version of the compiler and still get PGO for most functions. As a consequence, we could have a .profdata file in version control and always use it. An asynchronous automated task could then regularly do data collection and check it into the repository.

  • PGO works at the LLVM IR level, so everything is still rather platform independent. My guess is that the majority of functions have the same CFG on different platforms, meaning that profile data can be collected on one platform and then be used on all the others. That might massively decrease the amount of complexity involved in bringing PGO to CI. It would also be great news for targets like macOS, where the build hardware is too weak to do the whole three-phase build.

  • Function entries are keyed by symbol name, so if the symbol name is the same across platforms (which should be the case with the new symbol mangling scheme), LLVM should have no trouble finding the entry for a given function in a .profdata file collected on a different platform.

Overall, I have come to like this approach quite a bit. Once the .profdata file is just another file in the git repository, things become quite simple. If it is enough for that file to be "eventually consistent", we can just always use PGO without thinking twice about it. Profile data collection becomes nicely decoupled from the rest of the build process.
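For anyone who wants to verify these assumptions about the format, here is a small sketch of how one might inspect a .profdata file and the symbol names it is keyed by. The file names are placeholders, and the exact spelling of the symbol-mangling flag depends on the toolchain (it started out as an unstable -Z flag).

```bash
# Inspect what is stored in a .profdata file: one entry per function,
# with its counters and the hash of its control flow graph.
llvm-profdata show --all-functions /tmp/pgo-data/merged.profdata | head -n 40

# Entries are keyed by symbol name, so cross-platform reuse depends on stable
# mangling. The new (v0) scheme can be requested explicitly; depending on the
# toolchain this is -Zsymbol-mangling-version=v0 (nightly) or
# -Csymbol-mangling-version=v0 (once stabilized).
rustc +nightly -Zsymbol-mangling-version=v0 --emit=obj -O example.rs
nm example.o | head
```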

I think the next step is to check whether the various assumptions made above actually hold, leading to the following concrete tasks:

  • Confirm that PGO is actually worth the trouble, i.e. independently replicate the results from the Exploring PGO for the Rust compiler blog post on different systems. (Done. See Exploring PGO for the Rust compiler #79442 (comment))
  • Verify that the LLVM profdata format is as robust as described above:
    • Try to find documentation or ask LLVM folks if support for partially out-of-date profdata is well supported and an actual design goal (see Exploring PGO for the Rust compiler #79442 (comment))
    • Try to find documentation or ask LLVM folks if platform independence is well supported and an actual design goal.
    • Ask people who have experience using this in production.
    • Try it out: Compile various test programs with out-of-date data and data collected on another platform. See if that leads to any hard errors. (A rough sketch of such an experiment follows this list.)
  • Investigate how out of date the profdata for rustc would typically be if it were collected only once a day (for example).
  • Investigate how big the mismatch between different platforms is. Concretely:
    • How many hash mismatches do we get on x86-64 Windows and macOS when compiling with profdata collected on x86-64 Linux?
    • How many hash mismatches do we get on Aarch64 macOS when compiling with profdata collected on x86-64 Linux?
    • What about x86 vs x86-64?
  • Investigate how much slower it is to build an instrumented compiler.
  • Investigate if using profdata leads to a significant compile time increase, that is, make sure that it is feasible to always compile with -Cprofile-use.
  • Double-check that PGO does not introduce a significant additional risk of running into LLVM miscompilation bugs. Ask production users for their experience.
  • Check if Rust symbol names with the current (legacy) symbol mangling scheme are platform-dependent, or if we would need to switch the compiler to the new scheme if we want to use profdata across platforms.
  • Confirm that -fprofile-use and -Cprofile-use do not affect binary reproducibility (if used with a fixed .profdata file).
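As a starting point for the "try it out" and compile-time tasks above, an experiment could look roughly like the sketch below, assuming a scratch crate plus a stale and a foreign .profdata file are available. All paths are placeholders, and time is only a stand-in for proper benchmarking.

```bash
# Compare a plain build against builds that use stale or foreign profile data.
# Paths and the test crate are placeholders.
STALE=/tmp/profiles/last-week.profdata        # collected with an older rustc
FOREIGN=/tmp/profiles/linux-x86_64.profdata   # collected on another platform

# Baseline: no PGO.
cargo clean && time cargo build --release

# Out-of-date profile data: should still build; the interesting question is
# whether mismatched entries cause errors, warnings, or are silently ignored.
cargo clean && time RUSTFLAGS="-Cprofile-use=$STALE" cargo build --release

# Profile data collected on a different platform: same question. The timings
# also give a first impression of whether -Cprofile-use itself slows
# compilation down noticeably (one of the tasks above).
cargo clean && time RUSTFLAGS="-Cprofile-use=$FOREIGN" cargo build --release
```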

Once we know about all of the above we should be in a good position to decide whether to make an MCP to officially implement this.

Please post any feedback that you might have below!

Labels

A-reproducibility, C-discussion, I-compiletime, T-bootstrap, T-compiler, T-infra, T-release, WG-compiler-performance
