BOLT JITLink failure when BOLTing Clang on aarch64 #71822

Closed
@nathanchance

When attempting to apply BOLT to clang on an aarch64 host, using a Linux kernel build as the instrumentation benchmark, I get an error during the BOLT stage. I can reproduce it consistently with the following workflow.

$ curl -LSs https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.1.tar.xz | tar -C src -xJf -

$ fd -d 2 .
src/
src/linux-6.6.1/
src/llvm-project/

$ /usr/bin/clang --version
clang version 17.0.4 (Fedora 17.0.4-1.fc40)
Target: aarch64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ git -C src/llvm-project show -s --format='%h ("%s")'
fd389f46deb0 ("[flang] Change `uniqueCGIdent` separator from `.` to `X` (#71338)")

$ cmake \
-B build/llvm/bootstrap \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCMAKE_AR=/usr/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCOMPILER_RT_BUILD_CRT=OFF \
-DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
-DCOMPILER_RT_BUILD_SANITIZERS=OFF \
-DCOMPILER_RT_BUILD_XRAY=OFF \
-DLLVM_BUILD_UTILS=OFF \
-DLLVM_ENABLE_ASSERTIONS=OFF \
-DLLVM_ENABLE_BACKTRACES=OFF \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld;bolt;compiler-rt' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=/usr/bin/ld.lld

$ ninja -C build/llvm/bootstrap

$ cmake \
-B build/llvm/instrumented \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCLANG_TABLEGEN=$PWD/build/llvm/bootstrap/bin/clang-tblgen \
-DCMAKE_AR=$PWD/build/llvm/bootstrap/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=$PWD/build/llvm/bootstrap/bin/clang++ \
-DCMAKE_CXX_FLAGS= \
-DCMAKE_C_COMPILER=$PWD/build/llvm/bootstrap/bin/clang \
-DCMAKE_C_FLAGS= \
-DCMAKE_RANLIB=$PWD/build/llvm/bootstrap/bin/llvm-ranlib \
-DLLVM_BUILD_INSTRUMENTED=IR \
-DLLVM_BUILD_RUNTIME=OFF \
-DLLVM_DISTRIBUTION_COMPONENTS='llvm-ar;llvm-nm;llvm-objcopy;llvm-objdump;llvm-ranlib;llvm-readelf;llvm-strip;clang;clang-resource-headers;lld' \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_LINK_LLVM_DYLIB=ON \
-DLLVM_TABLEGEN=$PWD/build/llvm/bootstrap/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD='AArch64;ARM;X86' \
-DLLVM_USE_LINKER=$PWD/build/llvm/bootstrap/bin/ld.lld \
-DLLVM_VP_COUNTERS_PER_SITE=6

$ ninja -C build/llvm/instrumented distribution

$ make \
-C src/linux-6.6.1 \
-skj"$(nproc)" \
ARCH=arm64 \
KCFLAGS=-Wno-error \
LLVM=$PWD/build/llvm/instrumented/bin/ \
O=$PWD/build/linux defconfig all

$ build/llvm/bootstrap/bin/llvm-profdata merge \
-output=$PWD/build/llvm/instrumented/profdata.prof \
build/llvm/instrumented/profiles/*.profraw

$ cmake \
-B build/llvm/final \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCLANG_TABLEGEN=$PWD/build/llvm/bootstrap/bin/clang-tblgen \
-DCMAKE_AR=$PWD/build/llvm/bootstrap/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=$PWD/build/llvm/bootstrap/bin/clang++ \
-DCMAKE_CXX_FLAGS= \
-DCMAKE_C_COMPILER=$PWD/build/llvm/bootstrap/bin/clang \
-DCMAKE_C_FLAGS= \
-DCMAKE_EXE_LINKER_FLAGS=-Wl,--emit-relocs \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DCMAKE_RANLIB=$PWD/build/llvm/bootstrap/bin/llvm-ranlib \
-DLLVM_DISTRIBUTION_COMPONENTS='llvm-ar;llvm-nm;llvm-objcopy;llvm-objdump;llvm-ranlib;llvm-readelf;llvm-strip;clang;clang-resource-headers;lld' \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_PROFDATA_FILE=$PWD/build/llvm/instrumented/profdata.prof \
-DLLVM_TABLEGEN=$PWD/build/llvm/bootstrap/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD='AArch64;ARM;X86' \
-DLLVM_USE_LINKER=$PWD/build/llvm/bootstrap/bin/ld.lld

$ ninja -C build/llvm/final install-distribution

$ build/llvm/bootstrap/bin/llvm-bolt \
--instrument \
--instrumentation-file=$PWD/build/llvm/final/clang.fdata \
--instrumentation-file-append-pid \
-o install/bin/clang.inst \
"$(realpath install/bin/clang)"
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: fd389f46deb0252a7f7412ef4b0809d7dc2d7072
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x6800000, offset 0x6800000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 0 out of 135164 functions in the binary (0.0%) have non-empty execution profile
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 30216
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 130849
BOLT-INSTRUMENTER: Number of function descriptors: 130849
BOLT-INSTRUMENTER: Number of branch counters: 1064749
BOLT-INSTRUMENTER: Number of ST leaf node counters: 572082
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 1636831
BOLT-INSTRUMENTER: Total size of counters: 13094648 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 15927198 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 125545540 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file .../build/llvm/final/clang.fdata
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 10463 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-INFO: padding code to 0xe600000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x10123bf4
BOLT-INFO: clear procedure is 0x10120164
BOLT-INFO: setting __bolt_runtime_start to 0x10123b64
BOLT-INFO: setting __bolt_runtime_fini to 0x10123bf4
BOLT-INFO: setting __hot_start to 0x6a00000
BOLT-INFO: setting __hot_end to 0xe4e308c

$ make \
-C src/linux-6.6.1 \
-skj"$(nproc)" \
ARCH=arm64 \
CC=$PWD/install/bin/clang.inst \
HOSTCC=$PWD/install/bin/clang.inst \
KCFLAGS=-Wno-error \
LLVM=$PWD/install/bin/ \
O=$PWD/build/linux mrproper virtconfig all

$ build/llvm/bootstrap/bin/merge-fdata build/llvm/final/clang.fdata.*.fdata >build/llvm/final/clang.fdata

$ build/llvm/bootstrap/bin/llvm-bolt \
--data=$PWD/build/llvm/final/clang.fdata \
--dyno-stats \
--icf=1 \
-o $PWD/install/bin/clang.bolt \
--reorder-blocks=ext-tsp \
--reorder-functions=hfsort+ \
--split-all-cold \
--split-functions \
--use-gnu-stack \
"$(realpath install/bin/clang)"
Using legacy profile format.
Profile from 6079 files merged.
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: fd389f46deb0252a7f7412ef4b0809d7dc2d7072
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: profile collection done on a binary already processed by BOLT
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 25654 out of 135164 functions in the binary (19.0%) have non-empty execution profile
BOLT-INFO: 867 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: ICF folded 26054 out of 135421 functions in 7 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 3213.16 KB of code space. Folded functions were called 10879947418 times based on profile.
BOLT-INFO: basic block reordering modified layout of 10442 functions (40.70% of profiled, 9.55% of total)
BOLT-INFO: 56 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 24111 to 10662
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

       1396569943937 : executed forward branches
        130283647792 : taken forward branches
        203770667059 : executed backward branches
        125227683663 : taken backward branches
         33970162534 : executed unconditional branches
        171577527582 : all function calls
         22678304162 : indirect calls
         11439979116 : PLT calls
       8936906951881 : executed instructions
       1408294259742 : executed load instructions
                   0 : executed store instructions
                   0 : taken jump table branches
                   0 : taken unknown indirect branches
       1634310773530 : total branches
        289481493989 : taken branches
       1344829279541 : non-taken conditional branches
        255511331455 : taken conditional branches
       1600340610996 : all conditional branches
                   0 : linker-inserted veneer calls

       1361735980655 : executed forward branches (-2.5%)
         64376483788 : taken forward branches (-50.6%)
        238604630341 : executed backward branches (+17.1%)
        120877684864 : taken backward branches (-3.5%)
         29160773340 : executed unconditional branches (-14.2%)
        171577527582 : all function calls (=)
         22678304162 : indirect calls (=)
         11439979116 : PLT calls (=)
       8911764180706 : executed instructions (-0.3%)
       1408294259742 : executed load instructions (=)
                   0 : executed store instructions (=)
                   0 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
       1629501384336 : total branches (-0.3%)
        214414941992 : taken branches (-25.9%)
       1415086442344 : non-taken conditional branches (+5.2%)
        185254168652 : taken conditional branches (-27.5%)
       1600340610996 : all conditional branches (=)
                   0 : linker-inserted veneer calls (=)

BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 115911 stubs in the hot area and 40165 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-ERROR: JITLink failed: In graph in-memory object file, section .text.cold: relocation target .text + 0x1c718 at address 0x6800000 is out of range of ADRLiteral21 fixup at 0x748d3d8 (_ZNK5clang11DeclRefExpr11getBeginLocEv.cold.0/, 0x7470cc0 + 0x1c718)
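
To make the size of the miss concrete, here is a minimal sketch of the range check, taking the two addresses in the error message at face value. It assumes that the `ADRLiteral21` fixup corresponds to the AArch64 `ADR` instruction, whose immediate is a signed 21-bit byte offset (roughly ±1 MiB). The variable names are illustrative only and do not come from BOLT or JITLink.

```shell
#!/bin/sh
# Addresses copied from the BOLT-ERROR line above.
fixup=$((0x748d3d8))    # fixup address: 0x7470cc0 + 0x1c718, in .text.cold
target=$((0x6800000))   # address reported for the relocation target

# Assumption: ADRLiteral21 is a signed 21-bit byte offset,
# so the representable range is [-2^20, 2^20 - 1].
min=$(( -(1 << 20) ))
max=$(( (1 << 20) - 1 ))

# Displacement the fixup would have to encode.
delta=$(( target - fixup ))

if [ "$delta" -lt "$min" ] || [ "$delta" -gt "$max" ]; then
    verdict="out of range"
else
    verdict="in range"
fi

echo "delta = $delta bytes, allowed [$min, $max]: $verdict"
```

Under that assumption, the required displacement is about 12.5 MiB backwards, more than ten times what the fixup can encode, which is consistent with the error JITLink reports.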

I uploaded clang.fdata (zstd compressed) in case it is useful for avoiding the BOLT instrumentation phase.
