Skip to content

[Profile] Dump binary id to raw profiles on Windows. #75618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 20, 2023

Conversation

ZequanWu
Copy link
Contributor

#74652 adds __buildid symbol which allows us to dump it at runtime.

@llvmbot llvmbot added compiler-rt PGO Profile Guided Optimizations labels Dec 15, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 15, 2023

@llvm/pr-subscribers-pgo

Author: Zequan Wu (ZequanWu)

Changes

#74652 adds __buildid symbol which allows us to dump it at runtime.


Full diff: https://github.com/llvm/llvm-project/pull/75618.diff

2 Files Affected:

  • (modified) compiler-rt/lib/profile/InstrProfilingPlatformWindows.c (+7)
  • (added) compiler-rt/test/profile/Windows/binary-id.c (+98)
diff --git a/compiler-rt/lib/profile/InstrProfilingPlatformWindows.c b/compiler-rt/lib/profile/InstrProfilingPlatformWindows.c
index 9070b8a606eb54..4f7ae7b7072b88 100644
--- a/compiler-rt/lib/profile/InstrProfilingPlatformWindows.c
+++ b/compiler-rt/lib/profile/InstrProfilingPlatformWindows.c
@@ -74,7 +74,14 @@ ValueProfNode *__llvm_profile_end_vnodes(void) { return &VNodesEnd; }
 ValueProfNode *CurrentVNode = &VNodesStart + 1;
 ValueProfNode *EndVNode = &VNodesEnd;
 
+uint64_t BuildIdLen = 16;
+COMPILER_RT_WEAK extern uint8_t __buildid[16];
 COMPILER_RT_VISIBILITY int __llvm_write_binary_ids(ProfDataWriter *Writer) {
+  if (*__buildid) {
+    if (Writer && lprofWriteOneBinaryId(Writer, BuildIdLen, __buildid, 0) == -1)
+      return -1;
+    return sizeof(BuildIdLen) + BuildIdLen;
+  }
   return 0;
 }
 
diff --git a/compiler-rt/test/profile/Windows/binary-id.c b/compiler-rt/test/profile/Windows/binary-id.c
new file mode 100644
index 00000000000000..fd99977fb67882
--- /dev/null
+++ b/compiler-rt/test/profile/Windows/binary-id.c
@@ -0,0 +1,98 @@
+// REQUIRES: target={{.*windows-msvc.*}}
+// REQUIRES: lld-available
+
+// RUN: %clang_profgen -O2 -o %t %s
+// RUN: env LLVM_PROFILE_FILE=%t.profraw %run %t
+// RUN: llvm-profdata show --binary-ids %t.profraw > %t.out
+// RUN: FileCheck %s --check-prefix=NO-BINARY-ID < %t.out
+// RUN: llvm-profdata merge -o %t.profdata %t.profraw
+
+// RUN: %clang_profgen -fuse-ld=lld -Wl,-build-id -O2 -o %t %s
+// RUN: env LLVM_PROFILE_FILE=%t.profraw %run %t
+// RUN: llvm-profdata show --binary-ids %t.profraw > %t.profraw.out
+// RUN: FileCheck %s --check-prefix=BINARY-ID-RAW-PROF < %t.profraw.out
+
+// RUN: rm -rf %t.profdir
+// RUN: env LLVM_PROFILE_FILE=%t.profdir/default_%m.profraw %run %t
+// RUN: env LLVM_PROFILE_FILE=%t.profdir/default_%m.profraw %run %t
+// RUN: env LLVM_PROFILE_FILE=%t.profdir/default_%m.profraw %run %t
+// RUN: llvm-profdata show --binary-ids  %t.profdir/default_*.profraw > %t.profraw.out
+// RUN: FileCheck %s --check-prefix=BINARY-ID-MERGE-PROF < %t.profraw.out
+
+// RUN: llvm-profdata merge -o %t.profdata %t.profdir/default_*.profraw
+// RUN: llvm-profdata show --binary-ids %t.profdata > %t.profdata.out
+// RUN: FileCheck %s --check-prefix=BINARY-ID-INDEXED-PROF < %t.profdata.out
+
+// Test raw profiles with DLLs.
+// RUN: rm -rf %t.dir && split-file %s %t.dir
+// RUN: %clang_profgen -O2 %t.dir/foo.c -fuse-ld=lld -Wl,-build-id -Wl,-dll -o %t.dir/foo.dll
+// RUN: %clang_profgen -O2 %t.dir/bar.c -fuse-ld=lld -Wl,-build-id -Wl,-dll -o %t.dir/bar.dll
+// RUN: %clang_profgen -O2 %t.dir/main.c -fuse-ld=lld -Wl,-build-id %t.dir/foo.lib %t.dir/bar.lib -o %t.dir/main.exe
+// RUN: env LLVM_PROFILE_FILE=%t.profraw %run %t.dir/main.exe
+// RUN: llvm-profdata show --binary-ids %t.profraw > %t.profraw.out
+// RUN: llvm-profdata merge -o %t.profdata %t.profraw
+// RUN: FileCheck %s --check-prefix=BINARY-ID-SHARE-RAW-PROF < %t.profraw.out
+
+// RUN: llvm-profdata merge -o %t.profdata %t.profraw
+// RUN: llvm-profdata show --binary-ids %t.profdata > %t.profdata.out
+// RUN: FileCheck %s --check-prefix=BINARY-ID-SHARE-INDEXED-PROF < %t.profraw.out
+
+//--- foo.c
+__declspec(dllexport) void foo() {}
+
+//--- bar.c
+__declspec(dllexport) void bar() {}
+
+//--- main.c
+__declspec(dllimport) void foo();
+__declspec(dllimport) void bar();
+int main() {
+  foo();
+  bar();
+  return 0;
+}
+
+// NO-BINARY-ID: Instrumentation level: Front-end
+// NO-BINARY-ID-NEXT: Total functions: 3
+// NO-BINARY-ID-NEXT: Maximum function count: 1
+// NO-BINARY-ID-NEXT: Maximum internal block count: 0
+// NO-BINARY-ID-NOT: Binary IDs:
+
+// BINARY-ID-RAW-PROF: Instrumentation level: Front-end
+// BINARY-ID-RAW-PROF-NEXT: Total functions: 3
+// BINARY-ID-RAW-PROF-NEXT: Maximum function count: 1
+// BINARY-ID-RAW-PROF-NEXT: Maximum internal block count: 0
+// BINARY-ID-RAW-PROF-NEXT: Binary IDs:
+// BINARY-ID-RAW-PROF-NEXT: {{[0-9a-f]+}}
+
+// BINARY-ID-MERGE-PROF: Instrumentation level: Front-end
+// BINARY-ID-MERGE-PROF-NEXT: Total functions: 3
+// BINARY-ID-MERGE-PROF-NEXT: Maximum function count: 3
+// BINARY-ID-MERGE-PROF-NEXT: Maximum internal block count: 0
+// BINARY-ID-MERGE-PROF-NEXT: Binary IDs:
+// BINARY-ID-MERGE-PROF-NEXT: {{[0-9a-f]+}}
+
+// BINARY-ID-INDEXED-PROF: Instrumentation level: Front-end
+// BINARY-ID-INDEXED-PROF-NEXT: Total functions: 3
+// BINARY-ID-INDEXED-PROF-NEXT: Maximum function count: 3
+// BINARY-ID-INDEXED-PROF-NEXT: Maximum internal block count: 0
+// BINARY-ID-INDEXED-PROF-NEXT: Binary IDs:
+// BINARY-ID-INDEXED-PROF-NEXT: {{[0-9a-f]+}}
+
+// BINARY-ID-SHARE-RAW-PROF: Instrumentation level: Front-end
+// BINARY-ID-SHARE-RAW-PROF-NEXT: Total functions: 3
+// BINARY-ID-SHARE-RAW-PROF-NEXT: Maximum function count: 1
+// BINARY-ID-SHARE-RAW-PROF-NEXT: Maximum internal block count: 0
+// BINARY-ID-SHARE-RAW-PROF-NEXT: Binary IDs:
+// BINARY-ID-SHARE-RAW-PROF-NEXT: {{[0-9a-f]+}}
+// BINARY-ID-SHARE-RAW-PROF-NEXT: {{[0-9a-f]+}}
+// BINARY-ID-SHARE-RAW-PROF-NEXT: {{[0-9a-f]+}}
+
+// BINARY-ID-SHARE-INDEXED-PROF: Instrumentation level: Front-end
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: Total functions: 3
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: Maximum function count: 1
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: Maximum internal block count: 0
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: Binary IDs:
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: {{[0-9a-f]+}}
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: {{[0-9a-f]+}}
+// BINARY-ID-SHARE-INDEXED-PROF-NEXT: {{[0-9a-f]+}}

@ZequanWu ZequanWu requested review from zmodem and david-xl December 18, 2023 20:05
@ZequanWu ZequanWu merged commit 688fa35 into llvm:main Dec 20, 2023
@nikic
Copy link
Contributor

nikic commented Feb 5, 2024

It looks like this breaks the build when using mingw:

C:/a/rust/rust/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\a\rust\rust\build\x86_64-pc-windows-gnu\stage2\lib\rustlib\x86_64-pc-windows-gnu\lib\libprofiler_builtins-06ad067b40ce3a0d.rlib(InstrProfilingPlatformWindows.o):InstrProfilingPlatformWindows.c:(.rdata$.refptr.__buildid[.refptr.__buildid]+0x0): undefined reference to `__buildid'

I don't really get how this code is supposed to work, I don't think extern selectany supports symbols that are not defined?

@ZequanWu
Copy link
Contributor Author

ZequanWu commented Feb 5, 2024

It looks like this breaks the build when using mingw:

C:/a/rust/rust/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\a\rust\rust\build\x86_64-pc-windows-gnu\stage2\lib\rustlib\x86_64-pc-windows-gnu\lib\libprofiler_builtins-06ad067b40ce3a0d.rlib(InstrProfilingPlatformWindows.o):InstrProfilingPlatformWindows.c:(.rdata$.refptr.__buildid[.refptr.__buildid]+0x0): undefined reference to `__buildid'

I don't really get how this code is supposed to work, I don't think extern selectany supports symbols that are not defined?

Yeah, I think I should drop extern: #80700

@mstorsjo
Copy link
Member

mstorsjo commented Feb 5, 2024

It looks like this breaks the build when using mingw:

C:/a/rust/rust/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\a\rust\rust\build\x86_64-pc-windows-gnu\stage2\lib\rustlib\x86_64-pc-windows-gnu\lib\libprofiler_builtins-06ad067b40ce3a0d.rlib(InstrProfilingPlatformWindows.o):InstrProfilingPlatformWindows.c:(.rdata$.refptr.__buildid[.refptr.__buildid]+0x0): undefined reference to `__buildid'

I don't really get how this code is supposed to work, I don't think extern selectany supports symbols that are not defined?

I think the idea was that it previously was a weak undefined symbol - if linking with LLD 18, the linker does define it and we can pick up the build ID from there. But while GNU binutils and GCC did come up with the way of emulating ELF weak symbols on top of COFF, they do have a couple of bugs around it, which I presume is what you're running into here.

I guess that with the fix from #80700, it changes the symbol from a weak undefined into a weak defined. If linking with LLD as opposed to GNU ld, this should hopefully be handled correctly, and it should prefer the __buildid that is provided by the linker (as a non-weak symbol) rather than the default dummy uninitialized/zero-initialized __buildid here.

@mstorsjo
Copy link
Member

mstorsjo commented Feb 6, 2024

It looks like this breaks the build when using mingw:

C:/a/rust/rust/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: C:\a\rust\rust\build\x86_64-pc-windows-gnu\stage2\lib\rustlib\x86_64-pc-windows-gnu\lib\libprofiler_builtins-06ad067b40ce3a0d.rlib(InstrProfilingPlatformWindows.o):InstrProfilingPlatformWindows.c:(.rdata$.refptr.__buildid[.refptr.__buildid]+0x0): undefined reference to `__buildid'

I don't really get how this code is supposed to work, I don't think extern selectany supports symbols that are not defined?

I think the idea was that it previously was a weak undefined symbol - if linking with LLD 18, the linker does define it and we can pick up the build ID from there. But while GNU binutils and GCC did come up with the way of emulating ELF weak symbols on top of COFF, they do have a couple of bugs around it, which I presume is what you're running into here.

I guess that with the fix from #80700, it changes the symbol from a weak undefined into a weak defined. If linking with LLD as opposed to GNU ld, this should hopefully be handled correctly, and it should prefer the __buildid that is provided by the linker (as a non-weak symbol) rather than the default dummy uninitialized/zero-initialized __buildid here.

Sorry, I hadn't noticed that COMPILER_RT_WEAK indeed expands to selectany instead of actual weak attributes.

I tested this case, with the following test snippet, extracted from the code here:

#include <stdio.h>
#include <stdint.h>

__attribute__((selectany)) uint8_t __buildid[16];

int main(int argc, char **argv) {
  if (*__buildid) {
    printf("got build id:");
    for (int i = 0; i < 16; i++)
      printf(" %02x", __buildid[i]);
    printf("\n");
  } else {
    printf("no build id found\n");
  }
  return 0;
}

This correctly prints the build ID when built with Clang and linked with LLD. When linked with GNU ld, it links correctly and has no build id. So that bit works as intended.

I have a faint feeling that this is brittle - we have two separate definitions of __buildid, one builtin from the linker (which is a regular symbol) and one from the source (which is comdat selectany). So in principle, the linker could be free to pick any of them.

In practice, with the implementation in LLD, these two definitions don't go through the normal comdat conflict resolution, but LLD just overwrites the __buildid symbol, whatever it is, with its own definition: https://github.com/llvm/llvm-project/blob/llvmorg-18.1.0-rc1/lld/COFF/Writer.cpp#L1124-L1126

So as long as the LLD implementation does this, we're safe here.

The setup that I had in mind would have been this:

#include <stdio.h>
#include <stdint.h>

__attribute__((weak)) extern uint8_t __buildid[16];

int main(int argc, char **argv) {
  if (__buildid) {
    printf("got build id:");
    for (int i = 0; i < 16; i++)
      printf(" %02x", __buildid[i]);
    printf("\n");
  } else {
    printf("no build id found\n");
  }
  return 0;
} 

Note the condition if (__buildid), while it previously checked whether the first ID byte was nonzero if (*__buildid) in the former case (and in the compiler-rt source now).

This setup works with Clang+LLD (both new that do define __buildid and older ones that don't). If compiled with Clang and linked with GNU ld, it still works as expected, but if compiled with GCC, it fails to link.

So TL;DR, the solution picked here is fine (and probably the least problematic compromise), at least as long as LLD sets the __buildid symbol in the way it does.

@nikic
Copy link
Contributor

nikic commented Feb 6, 2024

@mstorsjo Thanks for the thorough analysis!

@ZequanWu ZequanWu deleted the profile-windows-buildid branch February 15, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants