Description
When attempting to apply BOLT to clang on an aarch64 host, using a Linux kernel build as the instrumentation benchmark, I get an error during the final BOLT optimization stage. I can reproduce it consistently with the following workflow.
$ curl -LSs https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.1.tar.xz | tar -C src -xJf -
$ fd -d 2 .
src/
src/linux-6.6.1/
src/llvm-project/
$ /usr/bin/clang --version
clang version 17.0.4 (Fedora 17.0.4-1.fc40)
Target: aarch64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ git -C src/llvm-project show -s --format='%h ("%s")'
fd389f46deb0 ("[flang] Change `uniqueCGIdent` separator from `.` to `X` (#71338)")
$ cmake \
-B build/llvm/bootstrap \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCMAKE_AR=/usr/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCOMPILER_RT_BUILD_CRT=OFF \
-DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
-DCOMPILER_RT_BUILD_SANITIZERS=OFF \
-DCOMPILER_RT_BUILD_XRAY=OFF \
-DLLVM_BUILD_UTILS=OFF \
-DLLVM_ENABLE_ASSERTIONS=OFF \
-DLLVM_ENABLE_BACKTRACES=OFF \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld;bolt;compiler-rt' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_INCLUDE_TESTS=OFF \
-DLLVM_TARGETS_TO_BUILD=host \
-DLLVM_USE_LINKER=/usr/bin/ld.lld
$ ninja -C build/llvm/bootstrap
$ cmake \
-B build/llvm/instrumented \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCLANG_TABLEGEN=$PWD/build/llvm/bootstrap/bin/clang-tblgen \
-DCMAKE_AR=$PWD/build/llvm/bootstrap/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=$PWD/build/llvm/bootstrap/bin/clang++ \
-DCMAKE_CXX_FLAGS= \
-DCMAKE_C_COMPILER=$PWD/build/llvm/bootstrap/bin/clang \
-DCMAKE_C_FLAGS= \
-DCMAKE_RANLIB=$PWD/build/llvm/bootstrap/bin/llvm-ranlib \
-DLLVM_BUILD_INSTRUMENTED=IR \
-DLLVM_BUILD_RUNTIME=OFF \
-DLLVM_DISTRIBUTION_COMPONENTS='llvm-ar;llvm-nm;llvm-objcopy;llvm-objdump;llvm-ranlib;llvm-readelf;llvm-strip;clang;clang-resource-headers;lld' \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_LINK_LLVM_DYLIB=ON \
-DLLVM_TABLEGEN=$PWD/build/llvm/bootstrap/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD='AArch64;ARM;X86' \
-DLLVM_USE_LINKER=$PWD/build/llvm/bootstrap/bin/ld.lld \
-DLLVM_VP_COUNTERS_PER_SITE=6
$ ninja -C build/llvm/instrumented distribution
$ make \
-C src/linux-6.6.1 \
-skj"$(nproc)" \
ARCH=arm64 \
KCFLAGS=-Wno-error \
LLVM=$PWD/build/llvm/instrumented/bin/ \
O=$PWD/build/linux defconfig all
$ build/llvm/bootstrap/bin/llvm-profdata merge \
-output=$PWD/build/llvm/instrumented/profdata.prof \
build/llvm/instrumented/profiles/*.profraw
$ cmake \
-B build/llvm/final \
-G Ninja \
-S src/llvm-project/llvm \
-Wno-dev \
--log-level=NOTICE \
-DCLANG_ENABLE_ARCMT=OFF \
-DCLANG_ENABLE_STATIC_ANALYZER=OFF \
-DCLANG_PLUGIN_SUPPORT=OFF \
-DCLANG_TABLEGEN=$PWD/build/llvm/bootstrap/bin/clang-tblgen \
-DCMAKE_AR=$PWD/build/llvm/bootstrap/bin/llvm-ar \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_ARCHIVE_CREATE='<CMAKE_AR> DqcT <TARGET> <OBJECTS>' \
-DCMAKE_CXX_ARCHIVE_FINISH=true \
-DCMAKE_CXX_COMPILER=$PWD/build/llvm/bootstrap/bin/clang++ \
-DCMAKE_CXX_FLAGS= \
-DCMAKE_C_COMPILER=$PWD/build/llvm/bootstrap/bin/clang \
-DCMAKE_C_FLAGS= \
-DCMAKE_EXE_LINKER_FLAGS=-Wl,--emit-relocs \
-DCMAKE_INSTALL_PREFIX=$PWD/install \
-DCMAKE_RANLIB=$PWD/build/llvm/bootstrap/bin/llvm-ranlib \
-DLLVM_DISTRIBUTION_COMPONENTS='llvm-ar;llvm-nm;llvm-objcopy;llvm-objdump;llvm-ranlib;llvm-readelf;llvm-strip;clang;clang-resource-headers;lld' \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_BINDINGS=OFF \
-DLLVM_ENABLE_OCAMLDOC=OFF \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DLLVM_ENABLE_TERMINFO=OFF \
-DLLVM_ENABLE_WARNINGS=OFF \
-DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR= \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_INCLUDE_EXAMPLES=OFF \
-DLLVM_PROFDATA_FILE=$PWD/build/llvm/instrumented/profdata.prof \
-DLLVM_TABLEGEN=$PWD/build/llvm/bootstrap/bin/llvm-tblgen \
-DLLVM_TARGETS_TO_BUILD='AArch64;ARM;X86' \
-DLLVM_USE_LINKER=$PWD/build/llvm/bootstrap/bin/ld.lld
$ ninja -C build/llvm/final install-distribution
$ build/llvm/bootstrap/bin/llvm-bolt \
--instrument \
--instrumentation-file=$PWD/build/llvm/final/clang.fdata \
--instrumentation-file-append-pid \
-o install/bin/clang.inst \
"$(realpath install/bin/clang)"
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: fd389f46deb0252a7f7412ef4b0809d7dc2d7072
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x6800000, offset 0x6800000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 0 out of 135164 functions in the binary (0.0%) have non-empty execution profile
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 30216
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 130849
BOLT-INSTRUMENTER: Number of function descriptors: 130849
BOLT-INSTRUMENTER: Number of branch counters: 1064749
BOLT-INSTRUMENTER: Number of ST leaf node counters: 572082
BOLT-INSTRUMENTER: Number of direct call counters: 0
BOLT-INSTRUMENTER: Total number of counters: 1636831
BOLT-INSTRUMENTER: Total size of counters: 13094648 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 15927198 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 125545540 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file .../build/llvm/final/clang.fdata
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 10463 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-INFO: padding code to 0xe600000 to accommodate hot text
BOLT-INFO: output linked against instrumentation runtime library, lib entry point is 0x10123bf4
BOLT-INFO: clear procedure is 0x10120164
BOLT-INFO: setting __bolt_runtime_start to 0x10123b64
BOLT-INFO: setting __bolt_runtime_fini to 0x10123bf4
BOLT-INFO: setting __hot_start to 0x6a00000
BOLT-INFO: setting __hot_end to 0xe4e308c
$ make \
-C src/linux-6.6.1 \
-skj"$(nproc)" \
ARCH=arm64 \
CC=$PWD/install/bin/clang.inst \
HOSTCC=$PWD/install/bin/clang.inst \
KCFLAGS=-Wno-error \
LLVM=$PWD/install/bin/ \
O=$PWD/build/linux mrproper virtconfig all
$ build/llvm/bootstrap/bin/merge-fdata build/llvm/final/clang.fdata.*.fdata >build/llvm/final/clang.fdata
$ build/llvm/bootstrap/bin/llvm-bolt \
--data=$PWD/build/llvm/final/clang.fdata \
--dyno-stats \
--icf=1 \
-o $PWD/install/bin/clang.bolt \
--reorder-blocks=ext-tsp \
--reorder-functions=hfsort+ \
--split-all-cold \
--split-functions \
--use-gnu-stack \
"$(realpath install/bin/clang)"
Using legacy profile format.
Profile from 6079 files merged.
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: aarch64
BOLT-INFO: BOLT version: fd389f46deb0252a7f7412ef4b0809d7dc2d7072
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-INFO: disabling -align-macro-fusion on non-x86 platform
BOLT-INFO: pre-processing profile using branch profile reader
BOLT-INFO: profile collection done on a binary already processed by BOLT
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 25654 out of 135164 functions in the binary (19.0%) have non-empty execution profile
BOLT-INFO: 867 functions with profile could not be optimized
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: ICF folded 26054 out of 135421 functions in 7 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 3213.16 KB of code space. Folded functions were called 10879947418 times based on profile.
BOLT-INFO: basic block reordering modified layout of 10442 functions (40.70% of profiled, 9.55% of total)
BOLT-INFO: 56 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 24111 to 10662
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
1396569943937 : executed forward branches
130283647792 : taken forward branches
203770667059 : executed backward branches
125227683663 : taken backward branches
33970162534 : executed unconditional branches
171577527582 : all function calls
22678304162 : indirect calls
11439979116 : PLT calls
8936906951881 : executed instructions
1408294259742 : executed load instructions
0 : executed store instructions
0 : taken jump table branches
0 : taken unknown indirect branches
1634310773530 : total branches
289481493989 : taken branches
1344829279541 : non-taken conditional branches
255511331455 : taken conditional branches
1600340610996 : all conditional branches
0 : linker-inserted veneer calls
1361735980655 : executed forward branches (-2.5%)
64376483788 : taken forward branches (-50.6%)
238604630341 : executed backward branches (+17.1%)
120877684864 : taken backward branches (-3.5%)
29160773340 : executed unconditional branches (-14.2%)
171577527582 : all function calls (=)
22678304162 : indirect calls (=)
11439979116 : PLT calls (=)
8911764180706 : executed instructions (-0.3%)
1408294259742 : executed load instructions (=)
0 : executed store instructions (=)
0 : taken jump table branches (=)
0 : taken unknown indirect branches (=)
1629501384336 : total branches (-0.3%)
214414941992 : taken branches (-25.9%)
1415086442344 : non-taken conditional branches (+5.2%)
185254168652 : taken conditional branches (-27.5%)
1600340610996 : all conditional branches (=)
0 : linker-inserted veneer calls (=)
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 115911 stubs in the hot area and 40165 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-ERROR: JITLink failed: In graph in-memory object file, section .text.cold: relocation target .text + 0x1c718 at address 0x6800000 is out of range of ADRLiteral21 fixup at 0x748d3d8 (_ZNK5clang11DeclRefExpr11getBeginLocEv.cold.0/, 0x7470cc0 + 0x1c718)
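To put the numbers in the error in context, here is a rough sanity check of the reported addresses. It assumes that `ADRLiteral21` corresponds to the signed 21-bit byte offset of the AArch64 `ADR` instruction (roughly ±1 MiB of reach) and it ignores any relocation addend, so treat it as a sketch rather than an exact model of the JITLink check. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Addresses copied from the BOLT-ERROR line above.
fixup=0x748d3d8    # fixup address reported for the .text.cold fragment
target=0x6800000   # address reported for the relocation target

# Signed PC-relative displacement the instruction would have to encode.
# Assumption: the addend is zero.
delta=$(( target - fixup ))

# Assumption: a signed 21-bit immediate, i.e. [-2^20, 2^20 - 1] bytes.
min=$(( -(1 << 20) ))
max=$(( (1 << 20) - 1 ))

if (( delta < min || delta > max )); then
  verdict="out of range"
else
  verdict="in range"
fi

printf 'delta = %d bytes; assumed ADR reach = [%d, %d] -> %s\n' \
  "$delta" "$min" "$max" "$verdict"
```

Under those assumptions the target is about 12.5 MiB behind the fixup, well beyond what a single `ADR` can reach, which is consistent with the error appearing once the cold fragment is split far away from `.text`.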
I uploaded clang.fdata (zstd compressed) in case it is useful for avoiding the BOLT instrumentation phase.