Skip to content

[LLD][COFF] Add support for alternate entry point in CHPE metadata on ARM64X #123346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 20, 2025

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Jan 17, 2025

Includes handling for ARM64X relocations relative to a symbol.

… ARM64X

Includes handling for ARM64X relocations relative to a symbol.
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-coff

Author: Jacek Caban (cjacek)

Changes

Includes handling for ARM64X relocations relative to a symbol.


Full diff: https://github.com/llvm/llvm-project/pull/123346.diff

4 Files Affected:

  • (modified) lld/COFF/Chunks.cpp (+32-13)
  • (modified) lld/COFF/Chunks.h (+4-4)
  • (modified) lld/COFF/Writer.cpp (+29-1)
  • (modified) lld/test/COFF/arm64x-entry.test (+69-1)
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index 115e3457db6978..ff3c89884c24df 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -1183,7 +1183,7 @@ size_t Arm64XDynamicRelocEntry::getSize() const {
 
 void Arm64XDynamicRelocEntry::writeTo(uint8_t *buf) const {
   auto out = reinterpret_cast<ulittle16_t *>(buf);
-  *out = (offset & 0xfff) | (type << 12);
+  *out = (offset.get() & 0xfff) | (type << 12);
 
   switch (type) {
   case IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE:
@@ -1211,14 +1211,19 @@ void Arm64XDynamicRelocEntry::writeTo(uint8_t *buf) const {
 void DynamicRelocsChunk::finalize() {
   llvm::stable_sort(arm64xRelocs, [=](const Arm64XDynamicRelocEntry &a,
                                       const Arm64XDynamicRelocEntry &b) {
-    return a.offset < b.offset;
+    return a.offset.get() < b.offset.get();
   });
 
-  size = sizeof(coff_dynamic_reloc_table) + sizeof(coff_dynamic_relocation64) +
-         sizeof(coff_base_reloc_block_header);
+  size = sizeof(coff_dynamic_reloc_table) + sizeof(coff_dynamic_relocation64);
+  uint32_t prevPage = 0xfff;
 
   for (const Arm64XDynamicRelocEntry &entry : arm64xRelocs) {
-    assert(!(entry.offset & ~0xfff)); // Not yet supported.
+    uint32_t page = entry.offset.get() & ~0xfff;
+    if (page != prevPage) {
+      size = alignTo(size, sizeof(uint32_t)) +
+             sizeof(coff_base_reloc_block_header);
+      prevPage = page;
+    }
     size += entry.getSize();
   }
 
@@ -1235,17 +1240,31 @@ void DynamicRelocsChunk::writeTo(uint8_t *buf) const {
   header->Symbol = IMAGE_DYNAMIC_RELOCATION_ARM64X;
   buf += sizeof(*header);
 
-  auto pageHeader = reinterpret_cast<coff_base_reloc_block_header *>(buf);
-  pageHeader->BlockSize = sizeof(*pageHeader);
+  coff_base_reloc_block_header *pageHeader = nullptr;
+  size_t relocSize = 0;
   for (const Arm64XDynamicRelocEntry &entry : arm64xRelocs) {
-    entry.writeTo(buf + pageHeader->BlockSize);
-    pageHeader->BlockSize += entry.getSize();
+    uint32_t page = entry.offset.get() & ~0xfff;
+    if (!pageHeader || page != pageHeader->PageRVA) {
+      relocSize = alignTo(relocSize, sizeof(uint32_t));
+      if (pageHeader)
+        pageHeader->BlockSize =
+            buf + relocSize - reinterpret_cast<uint8_t *>(pageHeader);
+      pageHeader =
+          reinterpret_cast<coff_base_reloc_block_header *>(buf + relocSize);
+      pageHeader->PageRVA = page;
+      relocSize += sizeof(*pageHeader);
+    }
+
+    entry.writeTo(buf + relocSize);
+    relocSize += entry.getSize();
   }
-  pageHeader->BlockSize = alignTo(pageHeader->BlockSize, sizeof(uint32_t));
+  relocSize = alignTo(relocSize, sizeof(uint32_t));
+  pageHeader->BlockSize =
+      buf + relocSize - reinterpret_cast<uint8_t *>(pageHeader);
 
-  header->BaseRelocSize = pageHeader->BlockSize;
-  table->Size += header->BaseRelocSize;
-  assert(size == sizeof(*table) + sizeof(*header) + header->BaseRelocSize);
+  header->BaseRelocSize = relocSize;
+  table->Size += relocSize;
+  assert(size == sizeof(*table) + sizeof(*header) + relocSize);
 }
 
 } // namespace lld::coff
diff --git a/lld/COFF/Chunks.h b/lld/COFF/Chunks.h
index 46fd8e21dce659..7ba58e336451fc 100644
--- a/lld/COFF/Chunks.h
+++ b/lld/COFF/Chunks.h
@@ -851,13 +851,13 @@ class Arm64XRelocVal {
 class Arm64XDynamicRelocEntry {
 public:
   Arm64XDynamicRelocEntry(llvm::COFF::Arm64XFixupType type, uint8_t size,
-                          uint32_t offset, Arm64XRelocVal value)
+                          Arm64XRelocVal offset, Arm64XRelocVal value)
       : offset(offset), value(value), type(type), size(size) {}
 
   size_t getSize() const;
   void writeTo(uint8_t *buf) const;
 
-  uint32_t offset;
+  Arm64XRelocVal offset;
   Arm64XRelocVal value;
 
 private:
@@ -873,8 +873,8 @@ class DynamicRelocsChunk : public NonSectionChunk {
   void writeTo(uint8_t *buf) const override;
   void finalize();
 
-  void add(llvm::COFF::Arm64XFixupType type, uint8_t size, uint32_t offset,
-           Arm64XRelocVal value) {
+  void add(llvm::COFF::Arm64XFixupType type, uint8_t size,
+           Arm64XRelocVal offset, Arm64XRelocVal value) {
     arm64xRelocs.emplace_back(type, size, offset, value);
   }
 
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 8247f131dcf077..94b5dfcc5f3e2b 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -2352,6 +2352,23 @@ void Writer::setECSymbols() {
       delayIatCopySym, "__hybrid_auxiliary_delayload_iat_copy",
       delayIdata.getAuxIatCopy().empty() ? nullptr
                                          : delayIdata.getAuxIatCopy().front());
+
+  if (ctx.hybridSymtab) {
+    // For the hybrid image, set the alternate entry point to the EC entry
+    // point. In the hybrid view, it is swapped to the native entry point
+    // using ARM64X relocations.
+    if (auto altEntrySym = cast_or_null<Defined>(ctx.hybridSymtab->entry)) {
+      // If the entry is an EC export thunk, use its target instead.
+      if (auto thunkChunk =
+              dyn_cast<ECExportThunkChunk>(altEntrySym->getChunk()))
+        altEntrySym = thunkChunk->target;
+      Symbol *nativeEntrySym =
+          symtab->findUnderscore("__arm64x_native_entrypoint");
+      replaceSymbol<DefinedAbsolute>(
+          nativeEntrySym, ctx, "__arm64x_native_entrypoint",
+          ctx.config.imageBase + altEntrySym->getRVA());
+    }
+  }
 }
 
 // Write section contents to a mmap'ed file.
@@ -2586,12 +2603,23 @@ void Writer::createDynamicRelocs() {
                          coffHeaderOffset + offsetof(coff_file_header, Machine),
                          AMD64);
 
-  if (ctx.symtab.entry != ctx.hybridSymtab->entry)
+  if (ctx.symtab.entry != ctx.hybridSymtab->entry) {
     ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint32_t),
                            peHeaderOffset +
                                offsetof(pe32plus_header, AddressOfEntryPoint),
                            cast_or_null<Defined>(ctx.hybridSymtab->entry));
 
+    // Swap the alternate entry point in the CHPE metadata.
+    Symbol *s = ctx.hybridSymtab->findUnderscore("__chpe_metadata");
+    if (auto chpeSym = cast_or_null<DefinedRegular>(s))
+      ctx.dynamicRelocs->add(
+          IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint32_t),
+          Arm64XRelocVal(chpeSym, offsetof(chpe_metadata, AlternateEntryPoint)),
+          cast_or_null<Defined>(ctx.symtab.entry));
+    else
+      Warn(ctx) << "'__chpe_metadata' is missing for ARM64X target";
+  }
+
   // Set the hybrid load config to the EC load config.
   ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, sizeof(uint32_t),
                          dataDirOffset64 +
diff --git a/lld/test/COFF/arm64x-entry.test b/lld/test/COFF/arm64x-entry.test
index d5363c66544a5f..1c2e7e7a0c93ad 100644
--- a/lld/test/COFF/arm64x-entry.test
+++ b/lld/test/COFF/arm64x-entry.test
@@ -3,12 +3,14 @@ RUN: split-file %s %t.dir && cd %t.dir
 
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows arm64ec-dllmain.s -o arm64ec-dllmain.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows arm64-dllmain.s -o arm64-dllmain.obj
+RUN: llvm-mc -filetype=obj -triple=x86_64-windows amd64-dllmain.s -o amd64-dllmain.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows arm64ec-func.s -o arm64ec-func.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows arm64-func.s -o arm64-func.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows arm64-drectve.s -o arm64ec-drectve.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows arm64-drectve.s -o arm64-drectve.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows %S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows loadconfig-min.s -o loadconfig-min.obj
 
 RUN: lld-link -machine:arm64x -dll -out:out.dll arm64ec-dllmain.obj arm64-dllmain.obj \
 RUN:          loadconfig-arm64.obj loadconfig-arm64ec.obj
@@ -34,10 +36,12 @@ DISASM-NEXT: 180003009: e9 f2 ef ff ff               jmp     0x180002000 <.text+
 DISASM-NEXT: 18000300e: cc                           int3
 DISASM-NEXT: 18000300f: cc                           int3
 
-RUN: llvm-readobj --headers out.dll | FileCheck --check-prefix=READOBJ %s
+RUN: llvm-readobj --headers --coff-load-config out.dll | FileCheck --check-prefix=READOBJ %s
 READOBJ: AddressOfEntryPoint: 0x1000
+READOBJ: AlternateEntryPoint: 0x2000
 READOBJ: HybridObject {
 READOBJ:   AddressOfEntryPoint: 0x3000
+READOBJ:   AlternateEntryPoint: 0x1000
 READOBJ: }
 
 RUN: lld-link -machine:arm64x -dll -out:out2.dll arm64ec-func.obj arm64-func.obj \
@@ -55,6 +59,20 @@ RUN:          loadconfig-arm64.obj loadconfig-arm64ec.obj -entry:func
 RUN: llvm-objdump -d out4.dll | FileCheck --check-prefix=DISASM %s
 RUN: llvm-readobj --headers --coff-load-config out4.dll | FileCheck --check-prefix=READOBJ %s
 
+RUN: lld-link -machine:arm64x -dll -out:out-x86.dll amd64-dllmain.obj arm64-dllmain.obj \
+RUN:          loadconfig-arm64.obj loadconfig-arm64ec.obj
+RUN: llvm-readobj --headers --coff-load-config out-x86.dll | FileCheck --check-prefix=READOBJ-X86 %s
+READOBJ-X86: AddressOfEntryPoint: 0x1000
+READOBJ-X86: AlternateEntryPoint: 0x2000
+READOBJ-X86: HybridObject {
+READOBJ-X86:   AddressOfEntryPoint: 0x2000
+READOBJ-X86:   AlternateEntryPoint: 0x1000
+READOBJ-X86: }
+
+RUN: lld-link -machine:arm64x -dll -out:out-warn.dll arm64ec-dllmain.obj arm64-dllmain.obj \
+RUN:          loadconfig-arm64.obj loadconfig-min.obj 2>&1 | FileCheck --check-prefix=WARN %s
+WARN: lld-link: warning: '__chpe_metadata' is missing for ARM64X target
+
 #--- arm64-dllmain.s
     .section .text,"xr",discard,_DllMainCRTStartup
     .globl _DllMainCRTStartup
@@ -87,6 +105,56 @@ func:
     mov w0, #2
     ret
 
+#--- amd64-dllmain.s
+    .section .text,"xr",discard,_DllMainCRTStartup
+    .globl _DllMainCRTStartup
+    .p2align 2
+_DllMainCRTStartup:
+    movl $3, %eax
+    retq
+
 #--- arm64-drectve.s
 .section .drectve
     .ascii "-entry:func"
+
+#--- loadconfig-min.s
+        .section .rdata,"dr"
+        .globl _load_config_used
+        .p2align 3, 0
+_load_config_used:
+        .word 0x140
+        .fill 0xc4,1,0
+        .xword chpe_metadata
+        .fill 0x70,1,0
+
+        .p2align 3, 0
+chpe_metadata:
+        .word 2
+        .rva __hybrid_code_map
+        .word __hybrid_code_map_count
+        .rva __x64_code_ranges_to_entry_points
+        .rva __arm64x_redirection_metadata
+        .word 0 // __os_arm64x_dispatch_call_no_redirect
+        .word 0 // __os_arm64x_dispatch_ret
+        .word 0 // __os_arm64x_check_call
+        .word 0 // __os_arm64x_check_icall
+        .word 0 // __os_arm64x_check_icall_cfg
+        .rva __arm64x_native_entrypoint
+        .rva __hybrid_auxiliary_iat
+        .word __x64_code_ranges_to_entry_points_count
+        .word __arm64x_redirection_metadata_count
+        .word 0 // __os_arm64x_get_x64_information
+        .word 0 // __os_arm64x_set_x64_information
+        .rva __arm64x_extra_rfe_table
+        .word __arm64x_extra_rfe_table_size
+        .word 0 // __os_arm64x_dispatch_fptr
+        .rva __hybrid_auxiliary_iat_copy
+        .rva __hybrid_auxiliary_delayload_iat
+        .rva __hybrid_auxiliary_delayload_iat_copy
+        .word __hybrid_image_info_bitfield
+        .word 0 // __os_arm64x_helper3
+        .word 0 // __os_arm64x_helper4
+        .word 0 // __os_arm64x_helper5
+        .word 0 // __os_arm64x_helper6
+        .word 0 // __os_arm64x_helper7
+        .word 0 // __os_arm64x_helper8

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly reasonable, although one construct feels a bit unusual.

symtab->findUnderscore("__arm64x_native_entrypoint");
replaceSymbol<DefinedAbsolute>(
nativeEntrySym, ctx, "__arm64x_native_entrypoint",
ctx.config.imageBase + altEntrySym->getRVA());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that addresses have been finalized at this point?

Replacing a symbol with a DefinedAbsolute, which is manually constructed to point at another symbol feels weird. For relocations against a DefinedAbsolute, do we get the base relocation generated?

It somehow feels like it would be more natural to just overwrite this symbol with the contents of the entry point symbol - iirc we do that in a couple other places, so we don't need to do the address calculation manually here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will change it with replaceKeepingName. I think I can just use sizeof(SymbolUnion) here since I know that both symbols are from the symbol table. Thanks!

}
pageHeader->BlockSize = alignTo(pageHeader->BlockSize, sizeof(uint32_t));
relocSize = alignTo(relocSize, sizeof(uint32_t));
pageHeader->BlockSize =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assume that we had a nonzero number of relocations here, so that pageHeader was initialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, every ARM64X image will have at least one relocation for the machine field in the PE header. If other types of dynamic relocations are used in the future, this whole chunk code will have to be skipped anyway.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! This variant feels nicer and more idiomatic!

dyn_cast<ECExportThunkChunk>(altEntrySym->getChunk()))
altEntrySym = thunkChunk->target;
symtab->findUnderscore("__arm64x_native_entrypoint")
->replaceKeepingName(altEntrySym, sizeof(SymbolUnion));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dereferencing the output of e.g. findUnderscore() directly feels a bit scary, but I guess this is a symbol that we always define ourselves in the linker, unconditionally when linking EC code, so I guess it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we similarly dereference other symbols with replaceSymbol calls in this function.

Merged, thanks!

@cjacek cjacek merged commit a16adaf into llvm:main Jan 20, 2025
8 checks passed
@cjacek cjacek deleted the chpe-entry branch January 20, 2025 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants