Skip to content

[LLD][COFF] Always locate the IAT at the beginning of the .rdata section and align its size to 4KB on ARM64EC. #107588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 8, 2024

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Sep 6, 2024

This mimics the behavior of MSVC's link.exe. My guess is that the reason for this approach is to facilitate tracking runtime IAT modifications. An auxiliary IAT allows bypassing the call checker for imported function calls. It's the OS's responsibility to ensure that, if runtime patching occurs, the auxiliary IAT is reverted to enable call checking. Modifying the IAT is a form of runtime patching, and ensuring that it doesn’t share pages with other data likely helps with tracking accuracy.

Although alignment alone should ensure that the IAT occupies its own pages, placing it at the beginning of the .rdata section might be an optimization. This way, padding is only needed after the IAT, not before. The auxiliary IAT seems to follow a similar idea but is positioned at the end of the .rdata section.

@llvmbot
Copy link
Member

llvmbot commented Sep 6, 2024

@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld

Author: Jacek Caban (cjacek)

Changes

This mimics the behavior of MSVC's link.exe. My guess is that the reason for this approach is to facilitate tracking runtime IAT modifications. An auxiliary IAT allows bypassing the call checker for imported function calls. It's the OS's responsibility to ensure that, if runtime patching occurs, the auxiliary IAT is reverted to enable call checking. Modifying the IAT is a form of runtime patching, and ensuring that it doesn’t share pages with other data likely helps with tracking accuracy.

Although alignment alone should ensure that the IAT occupies its own pages, placing it at the beginning of the .rdata section might be an optimization. This way, padding is only needed after the IAT, not before. The auxiliary IAT seems to follow a similar idea but is positioned at the end of the .rdata section.


Full diff: https://github.com/llvm/llvm-project/pull/107588.diff

2 Files Affected:

  • (modified) lld/COFF/Writer.cpp (+29)
  • (modified) lld/test/COFF/arm64ec-import.test (+10-5)
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 4a0eed4d00997e..3e218508e0c1b4 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -217,6 +217,7 @@ class Writer {
   void createExportTable();
   void mergeSections();
   void sortECChunks();
+  void appendECImportTables();
   void removeUnusedSections();
   void assignAddresses();
   bool isInRange(uint16_t relType, uint64_t s, uint64_t p, int margin,
@@ -753,6 +754,7 @@ void Writer::run() {
     createExportTable();
     mergeSections();
     sortECChunks();
+    appendECImportTables();
     removeUnusedSections();
     finalizeAddresses();
     removeEmptySections();
@@ -914,6 +916,28 @@ void Writer::addSyntheticIdata() {
   add(".idata$7", idata.dllNames);
 }
 
+void Writer::appendECImportTables() {
+  if (!isArm64EC(ctx.config.machine))
+    return;
+
+  const uint32_t rdata = IMAGE_SCN_CNT_INITIALIZED_DATA | IMAGE_SCN_MEM_READ;
+
+  // IAT is always placed at the begining of .rdata section and its size
+  // is aligned to 4KB. Insert it now, after all merges all done.
+  if (PartialSection *importAddresses = findPartialSection(".idata$5", rdata)) {
+    if (!rdataSec->chunks.empty())
+      rdataSec->chunks.front()->setAlignment(
+          std::max(0x1000u, rdataSec->chunks.front()->getAlignment()));
+    iatSize = alignTo(iatSize, 0x1000);
+
+    rdataSec->chunks.insert(rdataSec->chunks.begin(),
+                            importAddresses->chunks.begin(),
+                            importAddresses->chunks.end());
+    rdataSec->contribSections.insert(rdataSec->contribSections.begin(),
+                                     importAddresses);
+  }
+}
+
 // Locate the first Chunk and size of the import directory list and the
 // IAT.
 void Writer::locateImportTables() {
@@ -1069,6 +1093,11 @@ void Writer::createSections() {
       sortCRTSectionChunks(pSec->chunks);
     }
 
+    // ARM64EC has special placement and alignment requirements for IAT.
+    // Delay adding its chunks to appendECImportTables.
+    if (isArm64EC(ctx.config.machine) && pSec->name == ".idata$5")
+      continue;
+
     OutputSection *sec = createSection(name, outChars);
     for (Chunk *c : pSec->chunks)
       sec->addChunk(c);
diff --git a/lld/test/COFF/arm64ec-import.test b/lld/test/COFF/arm64ec-import.test
index 2a80a30910b5c7..b1c47d785e445b 100644
--- a/lld/test/COFF/arm64ec-import.test
+++ b/lld/test/COFF/arm64ec-import.test
@@ -20,7 +20,7 @@ RUN: llvm-readobj --coff-imports out2.dll | FileCheck --check-prefix=IMPORTS %s
 IMPORTS:      Import {
 IMPORTS-NEXT:   Name: test.dll
 IMPORTS-NEXT:   ImportLookupTableRVA:
-IMPORTS-NEXT:   ImportAddressTableRVA: 0x2258
+IMPORTS-NEXT:   ImportAddressTableRVA: 0x2000
 IMPORTS-NEXT:   Symbol: data (0)
 IMPORTS-NEXT:   Symbol: func (0)
 IMPORTS-NEXT:   Symbol: func2 (0)
@@ -28,7 +28,7 @@ IMPORTS-NEXT: }
 IMPORTS-NEXT: Import {
 IMPORTS-NEXT:   Name: test2.dll
 IMPORTS-NEXT:   ImportLookupTableRVA:
-IMPORTS-NEXT:   ImportAddressTableRVA: 0x2278
+IMPORTS-NEXT:   ImportAddressTableRVA: 0x2020
 IMPORTS-NEXT:   Symbol: t2func (0)
 IMPORTS-NEXT: }
 
@@ -36,12 +36,17 @@ RUN: llvm-objdump -d out.dll | FileCheck --check-prefix=DISASM %s
 RUN: llvm-objdump -d out2.dll | FileCheck --check-prefix=DISASM %s
 
 DISASM:      0000000180001000 <.text>:
-DISASM-NEXT: 180001000: ff 25 5a 12 00 00            jmpq    *0x125a(%rip)           # 0x180002260
+DISASM-NEXT: 180001000: ff 25 02 10 00 00            jmpq    *0x1002(%rip)           # 0x180002008
 
 RUN: llvm-readobj --hex-dump=.test out.dll | FileCheck --check-prefix=TESTSEC %s
 RUN: llvm-readobj --hex-dump=.test out2.dll | FileCheck --check-prefix=TESTSEC %s
-TESTSEC:      0x180004000 60220000 58220000 68220000 78220000
-TESTSEC-NEXT: 0x180004010 00100000
+TESTSEC:      0x180005000 08200000 00200000 10200000 20200000
+TESTSEC-NEXT: 0x180005010 00100000
+
+RUN: llvm-readobj --headers out.dll | FileCheck -check-prefix=HEADERS %s
+HEADERS:  LoadConfigTableRVA: 0x3008
+HEADERS:  IATRVA: 0x2000
+HEADERS:  IATSize: 0x1000
 
 #--- test.s
     .section .test, "r"

@cjacek
Copy link
Contributor Author

cjacek commented Sep 6, 2024

One difference from MSVC's link.exe is that, if .rdata is merged into another section, MSVC places the IATs within that section, whereas this PR hardcodes the .rdata placement. While it might be possible to replicate MSVC's behavior, this would likely require calling assignOutputSectionIndices() earlier.

…ion and align its size to 4KB on ARM64EC.

This mimics the behavior of MSVC's link.exe. My guess is that the reason for this approach
is to facilitate tracking runtime IAT modifications. An auxiliary IAT allows bypassing
the call checker for imported function calls. It's the OS's responsibility to ensure that,
if runtime patching occurs, the auxiliary IAT is reverted to enable call checking.
Modifying the IAT is a form of runtime patching, and ensuring that it doesn’t share
pages with other data likely helps with tracking accuracy.

Although alignment alone should ensure that the IAT occupies its own pages, placing it
at the beginning of the .rdata section might be an optimization. This way, padding is
only needed after the IAT, not before. The auxiliary IAT seems to follow a similar
idea but is positioned at the end of the .rdata section.
@cjacek cjacek force-pushed the arm64ec-iat-placement branch from cf3cda1 to c46a8cd Compare September 6, 2024 13:53
Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

I had browsed the previous version of this patch in your branch before, but the version you're submitting here is slightly different - can you elaborate on where you moved it, what implications it has (previously I think you only added one set of chunks somewhere, but now you're adding things to contribSections too) and why you changed your approach?

(I guess normally one doesn't have to answer about code changes prior to submission, but I somewhat felt that the previous version that I've seen felt more straightforward, but I guess this one is more complete/correct somehow?)

@cjacek
Copy link
Contributor Author

cjacek commented Sep 7, 2024

Yes, I believe the new version is more correct, though the practical difference may be minimal.

The main distinction lies in how .idata sections provided by object files are handled - such as the long-form import libraries generated by binutils, as opposed to synthesized chunks generated from import files. I tested this with the MSVC linker, and it places the .idata$5 contents together with the synthesized ones. To match this behavior, I needed to adjust the logic to operate based on partial sections rather than working directly with IdataContents.

I tested this by using an x86_64 import library generated by binutils. Since such a library is unaware of the auxiliary IAT, the resulting binary isn't quite correct and is unlikely to function properly. However, it would be possible to create an ARM64EC-aware library that handles things correctly. This would require additional .idata$9 and .idata$a sections (for the auxiliary IAT and its copy), along with the necessary thunks to populate them. While I don't see a practical need for this, the new approach supports it, and MSVC's link.exe seems to handle it as well.

By the way, when such invalid import libraries are used, we may want to emit some diagnostics. This kind of mistake is not obvious and could be easily made in a MinGW environment. MSVC handles it by checking if the IAT aligns with the auxiliary IAT, and if not, it emits a warning like:

LINK : warning LNK4308: the auxiliary import address table is not properly aligned with the primary import address table. This may have negative perf impact.

(I think this warning is somewhat misleading - it’s not just a performance issue; in my test case, it would actually lead to crashes.)

As for the partial section insertion, my goal was to make the LLD state as close to the "regular" chunk addition process as possible. Since we use contributing sections when emitting PDBs, this should make that process more accurate. MSVC also emits partial sections in the map file (which is how I discovered the section names for auxiliary IATs). While LLD doesn’t do this yet, if we ever decide to, the code from this PR will be ready for it.

@mstorsjo
Copy link
Member

mstorsjo commented Sep 7, 2024

Thanks! That sounds like good reasons - the less we assume about whether idata is synthesized or sourced from long import libraries, the better.

@cjacek cjacek merged commit dec0781 into llvm:main Sep 8, 2024
8 checks passed
@cjacek cjacek deleted the arm64ec-iat-placement branch September 8, 2024 13:36
@cjacek
Copy link
Contributor Author

cjacek commented Sep 8, 2024

Pushed, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants