Skip to content

[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) #121516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 12, 2025

Conversation

dpaoliello
Copy link
Contributor

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.

@llvmbot llvmbot added mc Machine (object) code llvm:SelectionDAG SelectionDAGISel as well labels Jan 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-platform-windows

Author: Daniel Paoliello (dpaoliello)

Changes

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.


Patch is 28.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121516.diff

20 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+18)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+14)
  • (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+5)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+3)
  • (modified) llvm/include/llvm/MC/MCWinCOFFObjectWriter.h (+2)
  • (modified) llvm/include/llvm/MC/MCWinCOFFStreamer.h (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+3)
  • (modified) llvm/lib/MC/MCAsmStreamer.cpp (+9)
  • (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+5)
  • (modified) llvm/lib/MC/MCParser/COFFAsmParser.cpp (+21)
  • (modified) llvm/lib/MC/MCStreamer.cpp (+2)
  • (modified) llvm/lib/MC/MCWinCOFFStreamer.cpp (+9)
  • (modified) llvm/lib/MC/WinCOFFObjectWriter.cpp (+78)
  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+42)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-4)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization-nocalls.ll (+18)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization.ll (+36)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization-no-section.s (+9)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization.s (+62)
  • (added) llvm/test/MC/COFF/win-import-call-optimization-not-supported.s (+13)
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index d696add8a1af53..520f1745de2979 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -354,6 +354,11 @@ class LLVM_ABI MachineFunction {
   /// a table of valid targets for Windows EHCont Guard.
   std::vector<MCSymbol *> CatchretTargets;
 
+  /// Mapping of call instruction to the global value and target flags that it
+  /// calls, if applicable.
+  DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
+      CalledGlobalsMap;
+
   /// \name Exception Handling
   /// \{
 
@@ -1182,6 +1187,19 @@ class LLVM_ABI MachineFunction {
     CatchretTargets.push_back(Target);
   }
 
+  /// Tries to get the global and target flags for a call site, if the
+  /// instruction is a call to a global.
+  std::pair<const GlobalValue *, unsigned>
+  tryGetCalledGlobal(const MachineInstr *MI) const {
+    return CalledGlobalsMap.lookup(MI);
+  }
+
+  /// Notes the global and target flags for a call site.
+  void addCalledGlobal(const MachineInstr *MI,
+                       std::pair<const GlobalValue *, unsigned> Details) {
+    CalledGlobalsMap.insert({MI, Details});
+  }
+
   /// \name Exception Handling
   /// \{
 
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ff7caec41855fd..b31ad11c3ee0ee 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -293,6 +293,7 @@ class SelectionDAG {
     MDNode *HeapAllocSite = nullptr;
     MDNode *PCSections = nullptr;
     MDNode *MMRA = nullptr;
+    std::pair<const GlobalValue *, unsigned> CalledGlobal{};
     bool NoMerge = false;
   };
   /// Out-of-line extra information for SDNodes.
@@ -2373,6 +2374,19 @@ class SelectionDAG {
     auto It = SDEI.find(Node);
     return It != SDEI.end() ? It->second.MMRA : nullptr;
   }
+  /// Set CalledGlobal to be associated with Node.
+  void addCalledGlobal(const SDNode *Node, const GlobalValue *GV,
+                       unsigned OpFlags) {
+    SDEI[Node].CalledGlobal = {GV, OpFlags};
+  }
+  /// Return CalledGlobal associated with Node, or a nullopt if none exists.
+  std::optional<std::pair<const GlobalValue *, unsigned>>
+  getCalledGlobal(const SDNode *Node) {
+    auto I = SDEI.find(Node);
+    return I != SDEI.end()
+               ? std::make_optional(std::move(I->second).CalledGlobal)
+               : std::nullopt;
+  }
   /// Set NoMergeSiteInfo to be associated with Node if NoMerge is true.
   void addNoMergeSiteInfo(const SDNode *Node, bool NoMerge) {
     if (NoMerge)
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index e2a2c84e47910b..fb575fe721015c 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -73,6 +73,10 @@ class MCObjectFileInfo {
   /// to emit them into.
   MCSection *CompactUnwindSection = nullptr;
 
+  /// If import call optimization is supported by the target, this is the
+  /// section to emit import call data to.
+  MCSection *ImportCallSection = nullptr;
+
   // Dwarf sections for debug info.  If a target supports debug info, these must
   // be set.
   MCSection *DwarfAbbrevSection = nullptr;
@@ -269,6 +273,7 @@ class MCObjectFileInfo {
   MCSection *getBSSSection() const { return BSSSection; }
   MCSection *getReadOnlySection() const { return ReadOnlySection; }
   MCSection *getLSDASection() const { return LSDASection; }
+  MCSection *getImportCallSection() const { return ImportCallSection; }
   MCSection *getCompactUnwindSection() const { return CompactUnwindSection; }
   MCSection *getDwarfAbbrevSection() const { return DwarfAbbrevSection; }
   MCSection *getDwarfInfoSection() const { return DwarfInfoSection; }
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 21da4dac4872b4..c82ce4428ed09c 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -569,6 +569,9 @@ class MCStreamer {
   /// \param Symbol - Symbol the image relative relocation should point to.
   virtual void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset);
 
+  /// Emits an import call directive, used to build the import call table.
+  virtual void emitCOFFImpCall(MCSymbol const *Symbol);
+
   /// Emits an lcomm directive with XCOFF csect information.
   ///
   /// \param LabelSym - Label on the block of storage.
diff --git a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
index a4ede61e45099d..00a132706879f2 100644
--- a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
@@ -72,6 +72,8 @@ class WinCOFFObjectWriter final : public MCObjectWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue) override;
   uint64_t writeObject(MCAssembler &Asm) override;
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const;
 };
 
 /// Construct a new Win COFF writer instance.
diff --git a/llvm/include/llvm/MC/MCWinCOFFStreamer.h b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
index 5c39d80538944b..2318d1b8e0a223 100644
--- a/llvm/include/llvm/MC/MCWinCOFFStreamer.h
+++ b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
@@ -58,6 +58,7 @@ class MCWinCOFFStreamer : public MCObjectStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
   void emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 26fc75c0578ec2..6744e7cd2ecfcf 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -908,6 +908,9 @@ EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
         It->setMMRAMetadata(MF, MMRA);
     }
 
+    if (auto CalledGlobal = DAG->getCalledGlobal(Node))
+      MF.addCalledGlobal(MI, *CalledGlobal);
+
     return MI;
   };
 
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 01fe11ed205017..01903e2c1335b9 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -209,6 +209,7 @@ class MCAsmStreamer final : public MCStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitXCOFFLocalCommonSymbol(MCSymbol *LabelSym, uint64_t Size,
                                   MCSymbol *CsectSym, Align Alignment) override;
   void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol,
@@ -893,6 +894,14 @@ void MCAsmStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+  OS << "\t.impcall\t";
+  Symbol->print(OS, MAI);
+  EmitEOL();
+}
+
 // We need an XCOFF-specific version of this directive as the AIX syntax
 // requires a QualName argument identifying the csect name and storage mapping
 // class to appear before the alignment if we are specifying it.
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index f37e138edc36b1..150e38a94db6a6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -596,6 +596,11 @@ void MCObjectFileInfo::initCOFFMCObjectFileInfo(const Triple &T) {
                                           COFF::IMAGE_SCN_MEM_READ);
   }
 
+  if (T.getArch() == Triple::aarch64) {
+    ImportCallSection =
+        Ctx->getCOFFSection(".impcall", COFF::IMAGE_SCN_LNK_INFO);
+  }
+
   // Debug info.
   COFFDebugSymbolsSection =
       Ctx->getCOFFSection(".debug$S", (COFF::IMAGE_SCN_MEM_DISCARDABLE |
diff --git a/llvm/lib/MC/MCParser/COFFAsmParser.cpp b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
index 4d95a720852835..48f03ec0d3a847 100644
--- a/llvm/lib/MC/MCParser/COFFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
@@ -12,6 +12,7 @@
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCDirectives.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCParser/MCAsmLexer.h"
 #include "llvm/MC/MCParser/MCAsmParserExtension.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -70,6 +71,7 @@ class COFFAsmParser : public MCAsmParserExtension {
     addDirectiveHandler<&COFFAsmParser::parseDirectiveSymbolAttribute>(
         ".weak_anti_dep");
     addDirectiveHandler<&COFFAsmParser::parseDirectiveCGProfile>(".cg_profile");
+    addDirectiveHandler<&COFFAsmParser::parseDirectiveImpCall>(".impcall");
 
     // Win64 EH directives.
     addDirectiveHandler<&COFFAsmParser::parseSEHDirectiveStartProc>(
@@ -126,6 +128,7 @@ class COFFAsmParser : public MCAsmParserExtension {
   bool parseDirectiveLinkOnce(StringRef, SMLoc);
   bool parseDirectiveRVA(StringRef, SMLoc);
   bool parseDirectiveCGProfile(StringRef, SMLoc);
+  bool parseDirectiveImpCall(StringRef, SMLoc);
 
   // Win64 EH directives.
   bool parseSEHDirectiveStartProc(StringRef, SMLoc);
@@ -577,6 +580,24 @@ bool COFFAsmParser::parseDirectiveSymIdx(StringRef, SMLoc) {
   return false;
 }
 
+bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
+  if (!getContext().getObjectFileInfo()->getImportCallSection())
+    return TokError("target doesn't have an import call section");
+
+  StringRef SymbolID;
+  if (getParser().parseIdentifier(SymbolID))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("unexpected token in directive");
+
+  MCSymbol *Symbol = getContext().getOrCreateSymbol(SymbolID);
+
+  Lex();
+  getStreamer().emitCOFFImpCall(Symbol);
+  return false;
+}
+
 /// ::= [ identifier ]
 bool COFFAsmParser::parseCOMDATType(COFF::COMDATType &Type) {
   StringRef TypeId = getTok().getIdentifier();
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index ccf65df150e786..ee26fc07313f18 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1023,6 +1023,8 @@ void MCStreamer::emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) {}
 
 void MCStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {}
 
+void MCStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {}
+
 /// EmitRawText - If this file is backed by an assembly streamer, this dumps
 /// the specified string in the output .s file.  This capability is
 /// indicated by the hasRawTextSupport() predicate.
diff --git a/llvm/lib/MC/MCWinCOFFStreamer.cpp b/llvm/lib/MC/MCWinCOFFStreamer.cpp
index 395d4db3103d78..71e31f3288c001 100644
--- a/llvm/lib/MC/MCWinCOFFStreamer.cpp
+++ b/llvm/lib/MC/MCWinCOFFStreamer.cpp
@@ -280,6 +280,15 @@ void MCWinCOFFStreamer::emitCOFFImgRel32(const MCSymbol *Symbol,
   DF->appendContents(4, 0);
 }
 
+void MCWinCOFFStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+
+  auto *DF = getOrCreateDataFragment();
+  getAssembler().registerSymbol(*Symbol);
+  getWriter().recordImportCall(*DF, Symbol);
+}
+
 void MCWinCOFFStreamer::emitCommonSymbol(MCSymbol *S, uint64_t Size,
                                          Align ByteAlignment) {
   auto *Symbol = cast<MCSymbolCOFF>(S);
diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp
index 09d2b08e43050f..527464fa54ce02 100644
--- a/llvm/lib/MC/WinCOFFObjectWriter.cpp
+++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
 #include "llvm/MC/MCFragment.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCSection.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -147,6 +148,13 @@ class llvm::WinCOFFWriter {
   bool UseBigObj;
   bool UseOffsetLabels = false;
 
+  struct ImportCall {
+    unsigned CallsiteOffset;
+    const MCSymbol *CalledSymbol;
+  };
+  using importcall_map = MapVector<MCSection *, std::vector<ImportCall>>;
+  importcall_map SectionToImportCallsMap;
+
 public:
   enum DwoMode {
     AllSections,
@@ -163,6 +171,11 @@ class llvm::WinCOFFWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue);
   uint64_t writeObject(MCAssembler &Asm);
+  void generateAArch64ImportCallSection(llvm::MCAssembler &Asm);
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const {
+    return !SectionToImportCallsMap.empty();
+  }
 
 private:
   COFFSymbol *createSymbol(StringRef Name);
@@ -1097,6 +1110,17 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
     }
   }
 
+  // Create the contents of the import call section.
+  if (hasRecordedImportCalls()) {
+    switch (Asm.getContext().getTargetTriple().getArch()) {
+    case Triple::aarch64:
+      generateAArch64ImportCallSection(Asm);
+      break;
+    default:
+      llvm_unreachable("unsupported architecture for import call section");
+    }
+  }
+
   assignFileOffsets(Asm);
 
   // MS LINK expects to be able to use this timestamp to implement their
@@ -1143,6 +1167,51 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
   return W.OS.tell() - StartOffset;
 }
 
+void llvm::WinCOFFWriter::generateAArch64ImportCallSection(
+    llvm::MCAssembler &Asm) {
+  auto *ImpCallSection =
+      Asm.getContext().getObjectFileInfo()->getImportCallSection();
+
+  if (!SectionMap.contains(ImpCallSection)) {
+    Asm.getContext().reportError(SMLoc(),
+                                 ".impcall directives were used, but no "
+                                 "existing .impcall section exists");
+    return;
+  }
+
+  auto *Frag = cast<MCDataFragment>(ImpCallSection->curFragList()->Head);
+  raw_svector_ostream OS(Frag->getContents());
+
+  // Layout of this section is:
+  // Per section that contains calls to imported functions:
+  //  uint32_t SectionSize: Size in bytes for information in this section.
+  //  uint32_t Section Number
+  //  Per call to imported function in section:
+  //    uint32_t Kind: the kind of imported function.
+  //    uint32_t BranchOffset: the offset of the branch instruction in its
+  //                            parent section.
+  //    uint32_t TargetSymbolId: the symbol id of the called function.
+
+  // Per section that contained eligible targets...
+  for (auto &[Section, Targets] : SectionToImportCallsMap) {
+    unsigned SectionSize = sizeof(uint32_t) * (2 + 3 * Targets.size());
+    support::endian::write(OS, SectionSize, W.Endian);
+    support::endian::write(OS, SectionMap.at(Section)->Number, W.Endian);
+    for (auto &[BranchOffset, TargetSymbol] : Targets) {
+      // Kind is always IMAGE_REL_ARM64_DYNAMIC_IMPORT_CALL (0x13).
+      support::endian::write(OS, 0x13, W.Endian);
+      support::endian::write(OS, BranchOffset, W.Endian);
+      support::endian::write(OS, TargetSymbol->getIndex(), W.Endian);
+    }
+  }
+}
+
+void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
+                                     const MCSymbol *Symbol) {
+  auto &SectionData = SectionToImportCallsMap[FB.getParent()];
+  SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
+}
+
 //------------------------------------------------------------------------------
 // WinCOFFObjectWriter class implementation
 
@@ -1194,6 +1263,15 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm) {
   return TotalSize;
 }
 
+void WinCOFFObjectWriter::recordImportCall(const MCDataFragment &FB,
+                                           const MCSymbol *Symbol) {
+  ObjWriter->recordImportCall(FB, Symbol);
+}
+
+bool WinCOFFObjectWriter::hasRecordedImportCalls() const {
+  return ObjWriter->hasRecordedImportCalls();
+}
+
 MCWinCOFFObjectTargetWriter::MCWinCOFFObjectTargetWriter(unsigned Machine_)
     : Machine(Machine_) {}
 
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 9bec782ca8ce97..4c03f876465051 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -77,6 +77,11 @@ static cl::opt<PtrauthCheckMode> PtrauthAuthChecks(
     cl::desc("Check pointer authentication auth/resign failures"),
     cl::init(Default));
 
+static cl::opt<bool> EnableImportCallOptimization(
+    "aarch64-win-import-call-optimization", cl::Hidden,
+    cl::desc("Enable import call optimization for AArch64 Windows"),
+    cl::init(false));
+
 #define DEBUG_TYPE "asm-printer"
 
 namespace {
@@ -293,6 +298,11 @@ class AArch64AsmPrinter : public AsmPrinter {
                               MCSymbol *LazyPointer) override;
   void emitMachOIFuncStubHelperBody(Module &M, const GlobalIFunc &GI,
                                     MCSymbol *LazyPointer) override;
+
+  /// Checks if this instruction is part of a sequence that is eligle for import
+  /// call optimization and, if so, records it to be emitted in the import call
+  /// section.
+  void recordIfImportCall(const MachineInstr *BranchInst);
 };
 
 } // end anonymous namespace
@@ -921,6 +931,15 @@ void AArch64AsmPrinter::emitEndOfAsmFile(Module &M) {
   // Emit stack and fault map information.
   FM.serializeToFaultMapSection();
 
+  // If import call optimization is enabled, emit the appropriate section.
+  // We do this whether or not we recorded any import calls.
+  if (EnableImportCallOptimization && TT.isOSBinFormatCOFF()) {
+    OutStreamer->switchSection(getObjFileLowering().getImportCallSection());
+
+    // Section always starts with some magic.
+    constexpr char ImpCallMagic[12] = "Imp_Call_V1";
+    OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
+  }
 }
 
 void AArch64AsmPrinter::emitLOHs() {
@@ -2693,6 +2712,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   case AArch64::TCRETURNrinotx16:
   case AArch64::TCRETURNriALL: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
@@ -2702,6 +2722,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   }
   case AArch64::TCRETURNdi: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
@@ -3035,6 +3056,14 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     TS->emitARM64WinCFISaveAnyRegQPX(MI->getOperand(0).getImm(),
                                      -MI->getOperand(2).getImm());
     return;
+
+  case AArch64::BLR:
+  case AArch64::BR:
+    recordIfImportCall(MI);
+    MCInst TmpInst;
+    MCInstLowering.Lower(MI, TmpInst);
+    EmitToStreamer(*OutStreamer, TmpInst);
+    return;
   }
 
   // Finally, do the automated lowerings for everything else.
@@ -3043,6 +3072,19 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, TmpInst);
 }
 
+void AArch64AsmPrinter::recordIfImportCall(
+    const llvm::MachineInstr *BranchInst) {
+  if (!EnableImportCallOptimization ||
+      !TM.getTargetTriple().isOSBinFormatCOFF())
+    return;
+
+  auto [GV, OpFlags] = BranchInst->getMF()->tryGetCalledGlobal(BranchInst);
+  if (GV && GV->hasDLLImportStorageClass()) {
+    OutStreamer->emitCOFFImpCall(
+        MCInstLowering.GetGlobalValueSymbol(GV, OpFlags));
+  }
+}
+
 void AArch64AsmPrinter::emitMachOIFuncStubBody(Module &M, const GlobalIFunc &GI,
                                                MCSymbol *LazyPointer) {
   // _ifunc:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 070163a5fb29...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Daniel Paoliello (dpaoliello)

Changes

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.


Patch is 28.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121516.diff

20 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+18)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+14)
  • (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+5)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+3)
  • (modified) llvm/include/llvm/MC/MCWinCOFFObjectWriter.h (+2)
  • (modified) llvm/include/llvm/MC/MCWinCOFFStreamer.h (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+3)
  • (modified) llvm/lib/MC/MCAsmStreamer.cpp (+9)
  • (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+5)
  • (modified) llvm/lib/MC/MCParser/COFFAsmParser.cpp (+21)
  • (modified) llvm/lib/MC/MCStreamer.cpp (+2)
  • (modified) llvm/lib/MC/MCWinCOFFStreamer.cpp (+9)
  • (modified) llvm/lib/MC/WinCOFFObjectWriter.cpp (+78)
  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+42)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-4)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization-nocalls.ll (+18)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization.ll (+36)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization-no-section.s (+9)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization.s (+62)
  • (added) llvm/test/MC/COFF/win-import-call-optimization-not-supported.s (+13)
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index d696add8a1af53..520f1745de2979 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -354,6 +354,11 @@ class LLVM_ABI MachineFunction {
   /// a table of valid targets for Windows EHCont Guard.
   std::vector<MCSymbol *> CatchretTargets;
 
+  /// Mapping of call instruction to the global value and target flags that it
+  /// calls, if applicable.
+  DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
+      CalledGlobalsMap;
+
   /// \name Exception Handling
   /// \{
 
@@ -1182,6 +1187,19 @@ class LLVM_ABI MachineFunction {
     CatchretTargets.push_back(Target);
   }
 
+  /// Tries to get the global and target flags for a call site, if the
+  /// instruction is a call to a global.
+  std::pair<const GlobalValue *, unsigned>
+  tryGetCalledGlobal(const MachineInstr *MI) const {
+    return CalledGlobalsMap.lookup(MI);
+  }
+
+  /// Notes the global and target flags for a call site.
+  void addCalledGlobal(const MachineInstr *MI,
+                       std::pair<const GlobalValue *, unsigned> Details) {
+    CalledGlobalsMap.insert({MI, Details});
+  }
+
   /// \name Exception Handling
   /// \{
 
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ff7caec41855fd..b31ad11c3ee0ee 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -293,6 +293,7 @@ class SelectionDAG {
     MDNode *HeapAllocSite = nullptr;
     MDNode *PCSections = nullptr;
     MDNode *MMRA = nullptr;
+    std::pair<const GlobalValue *, unsigned> CalledGlobal{};
     bool NoMerge = false;
   };
   /// Out-of-line extra information for SDNodes.
@@ -2373,6 +2374,19 @@ class SelectionDAG {
     auto It = SDEI.find(Node);
     return It != SDEI.end() ? It->second.MMRA : nullptr;
   }
+  /// Set CalledGlobal to be associated with Node.
+  void addCalledGlobal(const SDNode *Node, const GlobalValue *GV,
+                       unsigned OpFlags) {
+    SDEI[Node].CalledGlobal = {GV, OpFlags};
+  }
+  /// Return CalledGlobal associated with Node, or a nullopt if none exists.
+  std::optional<std::pair<const GlobalValue *, unsigned>>
+  getCalledGlobal(const SDNode *Node) {
+    auto I = SDEI.find(Node);
+    return I != SDEI.end()
+               ? std::make_optional(std::move(I->second).CalledGlobal)
+               : std::nullopt;
+  }
   /// Set NoMergeSiteInfo to be associated with Node if NoMerge is true.
   void addNoMergeSiteInfo(const SDNode *Node, bool NoMerge) {
     if (NoMerge)
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index e2a2c84e47910b..fb575fe721015c 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -73,6 +73,10 @@ class MCObjectFileInfo {
   /// to emit them into.
   MCSection *CompactUnwindSection = nullptr;
 
+  /// If import call optimization is supported by the target, this is the
+  /// section to emit import call data to.
+  MCSection *ImportCallSection = nullptr;
+
   // Dwarf sections for debug info.  If a target supports debug info, these must
   // be set.
   MCSection *DwarfAbbrevSection = nullptr;
@@ -269,6 +273,7 @@ class MCObjectFileInfo {
   MCSection *getBSSSection() const { return BSSSection; }
   MCSection *getReadOnlySection() const { return ReadOnlySection; }
   MCSection *getLSDASection() const { return LSDASection; }
+  MCSection *getImportCallSection() const { return ImportCallSection; }
   MCSection *getCompactUnwindSection() const { return CompactUnwindSection; }
   MCSection *getDwarfAbbrevSection() const { return DwarfAbbrevSection; }
   MCSection *getDwarfInfoSection() const { return DwarfInfoSection; }
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 21da4dac4872b4..c82ce4428ed09c 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -569,6 +569,9 @@ class MCStreamer {
   /// \param Symbol - Symbol the image relative relocation should point to.
   virtual void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset);
 
+  /// Emits an import call directive, used to build the import call table.
+  virtual void emitCOFFImpCall(MCSymbol const *Symbol);
+
   /// Emits an lcomm directive with XCOFF csect information.
   ///
   /// \param LabelSym - Label on the block of storage.
diff --git a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
index a4ede61e45099d..00a132706879f2 100644
--- a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
@@ -72,6 +72,8 @@ class WinCOFFObjectWriter final : public MCObjectWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue) override;
   uint64_t writeObject(MCAssembler &Asm) override;
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const;
 };
 
 /// Construct a new Win COFF writer instance.
diff --git a/llvm/include/llvm/MC/MCWinCOFFStreamer.h b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
index 5c39d80538944b..2318d1b8e0a223 100644
--- a/llvm/include/llvm/MC/MCWinCOFFStreamer.h
+++ b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
@@ -58,6 +58,7 @@ class MCWinCOFFStreamer : public MCObjectStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
   void emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 26fc75c0578ec2..6744e7cd2ecfcf 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -908,6 +908,9 @@ EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
         It->setMMRAMetadata(MF, MMRA);
     }
 
+    if (auto CalledGlobal = DAG->getCalledGlobal(Node))
+      MF.addCalledGlobal(MI, *CalledGlobal);
+
     return MI;
   };
 
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 01fe11ed205017..01903e2c1335b9 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -209,6 +209,7 @@ class MCAsmStreamer final : public MCStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitXCOFFLocalCommonSymbol(MCSymbol *LabelSym, uint64_t Size,
                                   MCSymbol *CsectSym, Align Alignment) override;
   void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol,
@@ -893,6 +894,14 @@ void MCAsmStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+  OS << "\t.impcall\t";
+  Symbol->print(OS, MAI);
+  EmitEOL();
+}
+
 // We need an XCOFF-specific version of this directive as the AIX syntax
 // requires a QualName argument identifying the csect name and storage mapping
 // class to appear before the alignment if we are specifying it.
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index f37e138edc36b1..150e38a94db6a6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -596,6 +596,11 @@ void MCObjectFileInfo::initCOFFMCObjectFileInfo(const Triple &T) {
                                           COFF::IMAGE_SCN_MEM_READ);
   }
 
+  if (T.getArch() == Triple::aarch64) {
+    ImportCallSection =
+        Ctx->getCOFFSection(".impcall", COFF::IMAGE_SCN_LNK_INFO);
+  }
+
   // Debug info.
   COFFDebugSymbolsSection =
       Ctx->getCOFFSection(".debug$S", (COFF::IMAGE_SCN_MEM_DISCARDABLE |
diff --git a/llvm/lib/MC/MCParser/COFFAsmParser.cpp b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
index 4d95a720852835..48f03ec0d3a847 100644
--- a/llvm/lib/MC/MCParser/COFFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
@@ -12,6 +12,7 @@
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCDirectives.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCParser/MCAsmLexer.h"
 #include "llvm/MC/MCParser/MCAsmParserExtension.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -70,6 +71,7 @@ class COFFAsmParser : public MCAsmParserExtension {
     addDirectiveHandler<&COFFAsmParser::parseDirectiveSymbolAttribute>(
         ".weak_anti_dep");
     addDirectiveHandler<&COFFAsmParser::parseDirectiveCGProfile>(".cg_profile");
+    addDirectiveHandler<&COFFAsmParser::parseDirectiveImpCall>(".impcall");
 
     // Win64 EH directives.
     addDirectiveHandler<&COFFAsmParser::parseSEHDirectiveStartProc>(
@@ -126,6 +128,7 @@ class COFFAsmParser : public MCAsmParserExtension {
   bool parseDirectiveLinkOnce(StringRef, SMLoc);
   bool parseDirectiveRVA(StringRef, SMLoc);
   bool parseDirectiveCGProfile(StringRef, SMLoc);
+  bool parseDirectiveImpCall(StringRef, SMLoc);
 
   // Win64 EH directives.
   bool parseSEHDirectiveStartProc(StringRef, SMLoc);
@@ -577,6 +580,24 @@ bool COFFAsmParser::parseDirectiveSymIdx(StringRef, SMLoc) {
   return false;
 }
 
+bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
+  if (!getContext().getObjectFileInfo()->getImportCallSection())
+    return TokError("target doesn't have an import call section");
+
+  StringRef SymbolID;
+  if (getParser().parseIdentifier(SymbolID))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("unexpected token in directive");
+
+  MCSymbol *Symbol = getContext().getOrCreateSymbol(SymbolID);
+
+  Lex();
+  getStreamer().emitCOFFImpCall(Symbol);
+  return false;
+}
+
 /// ::= [ identifier ]
 bool COFFAsmParser::parseCOMDATType(COFF::COMDATType &Type) {
   StringRef TypeId = getTok().getIdentifier();
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index ccf65df150e786..ee26fc07313f18 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1023,6 +1023,8 @@ void MCStreamer::emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) {}
 
 void MCStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {}
 
+void MCStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {}
+
 /// EmitRawText - If this file is backed by an assembly streamer, this dumps
 /// the specified string in the output .s file.  This capability is
 /// indicated by the hasRawTextSupport() predicate.
diff --git a/llvm/lib/MC/MCWinCOFFStreamer.cpp b/llvm/lib/MC/MCWinCOFFStreamer.cpp
index 395d4db3103d78..71e31f3288c001 100644
--- a/llvm/lib/MC/MCWinCOFFStreamer.cpp
+++ b/llvm/lib/MC/MCWinCOFFStreamer.cpp
@@ -280,6 +280,15 @@ void MCWinCOFFStreamer::emitCOFFImgRel32(const MCSymbol *Symbol,
   DF->appendContents(4, 0);
 }
 
+void MCWinCOFFStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+
+  auto *DF = getOrCreateDataFragment();
+  getAssembler().registerSymbol(*Symbol);
+  getWriter().recordImportCall(*DF, Symbol);
+}
+
 void MCWinCOFFStreamer::emitCommonSymbol(MCSymbol *S, uint64_t Size,
                                          Align ByteAlignment) {
   auto *Symbol = cast<MCSymbolCOFF>(S);
diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp
index 09d2b08e43050f..527464fa54ce02 100644
--- a/llvm/lib/MC/WinCOFFObjectWriter.cpp
+++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
 #include "llvm/MC/MCFragment.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCSection.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -147,6 +148,13 @@ class llvm::WinCOFFWriter {
   bool UseBigObj;
   bool UseOffsetLabels = false;
 
+  struct ImportCall {
+    unsigned CallsiteOffset;
+    const MCSymbol *CalledSymbol;
+  };
+  using importcall_map = MapVector<MCSection *, std::vector<ImportCall>>;
+  importcall_map SectionToImportCallsMap;
+
 public:
   enum DwoMode {
     AllSections,
@@ -163,6 +171,11 @@ class llvm::WinCOFFWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue);
   uint64_t writeObject(MCAssembler &Asm);
+  void generateAArch64ImportCallSection(llvm::MCAssembler &Asm);
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const {
+    return !SectionToImportCallsMap.empty();
+  }
 
 private:
   COFFSymbol *createSymbol(StringRef Name);
@@ -1097,6 +1110,17 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
     }
   }
 
+  // Create the contents of the import call section.
+  if (hasRecordedImportCalls()) {
+    switch (Asm.getContext().getTargetTriple().getArch()) {
+    case Triple::aarch64:
+      generateAArch64ImportCallSection(Asm);
+      break;
+    default:
+      llvm_unreachable("unsupported architecture for import call section");
+    }
+  }
+
   assignFileOffsets(Asm);
 
   // MS LINK expects to be able to use this timestamp to implement their
@@ -1143,6 +1167,51 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
   return W.OS.tell() - StartOffset;
 }
 
+void llvm::WinCOFFWriter::generateAArch64ImportCallSection(
+    llvm::MCAssembler &Asm) {
+  auto *ImpCallSection =
+      Asm.getContext().getObjectFileInfo()->getImportCallSection();
+
+  if (!SectionMap.contains(ImpCallSection)) {
+    Asm.getContext().reportError(SMLoc(),
+                                 ".impcall directives were used, but no "
+                                 "existing .impcall section exists");
+    return;
+  }
+
+  auto *Frag = cast<MCDataFragment>(ImpCallSection->curFragList()->Head);
+  raw_svector_ostream OS(Frag->getContents());
+
+  // Layout of this section is:
+  // Per section that contains calls to imported functions:
+  //  uint32_t SectionSize: Size in bytes for information in this section.
+  //  uint32_t Section Number
+  //  Per call to imported function in section:
+  //    uint32_t Kind: the kind of imported function.
+  //    uint32_t BranchOffset: the offset of the branch instruction in its
+  //                            parent section.
+  //    uint32_t TargetSymbolId: the symbol id of the called function.
+
+  // Per section that contained eligible targets...
+  for (auto &[Section, Targets] : SectionToImportCallsMap) {
+    unsigned SectionSize = sizeof(uint32_t) * (2 + 3 * Targets.size());
+    support::endian::write(OS, SectionSize, W.Endian);
+    support::endian::write(OS, SectionMap.at(Section)->Number, W.Endian);
+    for (auto &[BranchOffset, TargetSymbol] : Targets) {
+      // Kind is always IMAGE_REL_ARM64_DYNAMIC_IMPORT_CALL (0x13).
+      support::endian::write(OS, 0x13, W.Endian);
+      support::endian::write(OS, BranchOffset, W.Endian);
+      support::endian::write(OS, TargetSymbol->getIndex(), W.Endian);
+    }
+  }
+}
+
+void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
+                                     const MCSymbol *Symbol) {
+  auto &SectionData = SectionToImportCallsMap[FB.getParent()];
+  SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
+}
+
 //------------------------------------------------------------------------------
 // WinCOFFObjectWriter class implementation
 
@@ -1194,6 +1263,15 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm) {
   return TotalSize;
 }
 
+void WinCOFFObjectWriter::recordImportCall(const MCDataFragment &FB,
+                                           const MCSymbol *Symbol) {
+  ObjWriter->recordImportCall(FB, Symbol);
+}
+
+bool WinCOFFObjectWriter::hasRecordedImportCalls() const {
+  return ObjWriter->hasRecordedImportCalls();
+}
+
 MCWinCOFFObjectTargetWriter::MCWinCOFFObjectTargetWriter(unsigned Machine_)
     : Machine(Machine_) {}
 
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 9bec782ca8ce97..4c03f876465051 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -77,6 +77,11 @@ static cl::opt<PtrauthCheckMode> PtrauthAuthChecks(
     cl::desc("Check pointer authentication auth/resign failures"),
     cl::init(Default));
 
+static cl::opt<bool> EnableImportCallOptimization(
+    "aarch64-win-import-call-optimization", cl::Hidden,
+    cl::desc("Enable import call optimization for AArch64 Windows"),
+    cl::init(false));
+
 #define DEBUG_TYPE "asm-printer"
 
 namespace {
@@ -293,6 +298,11 @@ class AArch64AsmPrinter : public AsmPrinter {
                               MCSymbol *LazyPointer) override;
   void emitMachOIFuncStubHelperBody(Module &M, const GlobalIFunc &GI,
                                     MCSymbol *LazyPointer) override;
+
+  /// Checks if this instruction is part of a sequence that is eligle for import
+  /// call optimization and, if so, records it to be emitted in the import call
+  /// section.
+  void recordIfImportCall(const MachineInstr *BranchInst);
 };
 
 } // end anonymous namespace
@@ -921,6 +931,15 @@ void AArch64AsmPrinter::emitEndOfAsmFile(Module &M) {
   // Emit stack and fault map information.
   FM.serializeToFaultMapSection();
 
+  // If import call optimization is enabled, emit the appropriate section.
+  // We do this whether or not we recorded any import calls.
+  if (EnableImportCallOptimization && TT.isOSBinFormatCOFF()) {
+    OutStreamer->switchSection(getObjFileLowering().getImportCallSection());
+
+    // Section always starts with some magic.
+    constexpr char ImpCallMagic[12] = "Imp_Call_V1";
+    OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
+  }
 }
 
 void AArch64AsmPrinter::emitLOHs() {
@@ -2693,6 +2712,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   case AArch64::TCRETURNrinotx16:
   case AArch64::TCRETURNriALL: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
@@ -2702,6 +2722,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   }
   case AArch64::TCRETURNdi: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
@@ -3035,6 +3056,14 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     TS->emitARM64WinCFISaveAnyRegQPX(MI->getOperand(0).getImm(),
                                      -MI->getOperand(2).getImm());
     return;
+
+  case AArch64::BLR:
+  case AArch64::BR:
+    recordIfImportCall(MI);
+    MCInst TmpInst;
+    MCInstLowering.Lower(MI, TmpInst);
+    EmitToStreamer(*OutStreamer, TmpInst);
+    return;
   }
 
   // Finally, do the automated lowerings for everything else.
@@ -3043,6 +3072,19 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, TmpInst);
 }
 
+void AArch64AsmPrinter::recordIfImportCall(
+    const llvm::MachineInstr *BranchInst) {
+  if (!EnableImportCallOptimization ||
+      !TM.getTargetTriple().isOSBinFormatCOFF())
+    return;
+
+  auto [GV, OpFlags] = BranchInst->getMF()->tryGetCalledGlobal(BranchInst);
+  if (GV && GV->hasDLLImportStorageClass()) {
+    OutStreamer->emitCOFFImpCall(
+        MCInstLowering.GetGlobalValueSymbol(GV, OpFlags));
+  }
+}
+
 void AArch64AsmPrinter::emitMachOIFuncStubBody(Module &M, const GlobalIFunc &GI,
                                                MCSymbol *LazyPointer) {
   // _ifunc:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 070163a5fb29...
[truncated]

@@ -577,6 +580,24 @@ bool COFFAsmParser::parseDirectiveSymIdx(StringRef, SMLoc) {
return false;
}

bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
if (!getContext().getObjectFileInfo()->getImportCallSection())
return TokError("target doesn't have an import call section");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid ' in error messages

// CHECK-NEXT: 0x[[#%x,TCOFFSET - 8]] IMAGE_REL_ARM64_PAGEBASE_REL21 __imp_b ([[#%u,TCSYM]])
// CHECK-NEXT: 0x[[#%x,TCOFFSET - 4]] IMAGE_REL_ARM64_PAGEOFFSET_12L __imp_b ([[#%u,TCSYM]])
// CHECK-NEXT: }
// CHECK-NEXT: ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End of file whitespace error

Comment on lines +588 to +605
if (getParser().parseIdentifier(SymbolID))
return TokError("expected identifier in directive");

if (getLexer().isNot(AsmToken::EndOfStatement))
return TokError("unexpected token in directive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think these error cases are tested

.impcall __imp_b
br x8

// CHECK: error: .impcall directives were used, but no existing .impcall section exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to include a source location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no: the .impcall section may appear after the .impcall directives, so I'd need to keep track of the locations of all (or at least one) observed .impcall directives, which is not easy to do with the way asm parsing vs object writing is setup.

@mstorsjo mstorsjo requested review from mstorsjo and cjacek January 3, 2025 10:23
@mstorsjo
Copy link
Member

mstorsjo commented Jan 3, 2025

A couple questions on the mechanism:

  • As the flag /d2ImportCallOptimization is undocumented, I presume that this format is undocumnted too. This covers two separate formats as far as I can see - the .impcall section contents, which this PR generates, and which the linker consumes. This format is mostly a convention - an agreement between compiler and linker; this currently uses the Imp_Call_V1 identifier. And then secondly, the dynamic relocations that the linker generates, which ends up in the final binary, which is consumed by the windows loader. This format isn't touched upon here (as this only covers code generation, not linking). The format of those dynamic relocations is much more fixed, as it's handled by the OS. Although this is only an optimization, so running on older Windows versions which doesn't recognize it, should be fine too?
  • When the linker uses these dynamic relocations, it changes a br or blr indirect branch instruction into a direct b or bl, if the target address is close enough within the address space - right? (The other instructions for loading the address are left untouched, as those instructions can be anywhere detached from the branch, and we can't speculate on whether the register contents is needed elsewhere.)
  • As the 64 bit address space is kinda large, and the b and bl instructions only have a +/- 128 MB range, I guess that this optimization simply can't be applied, if the target is too far away? Intuitively, it feels like this wouldn't end up effective all that often? Then again, I guess most DLLs are loaded close to each other in the virtual memory layout, and the base EXE, with dynamic base enabled (as always on aarch64) also would end up somewhere close.
  • Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?
  • For cases when dll imported functions are called without being marked as dllimport in headers, we jump via a thunk (from the import library, and/or linker generated). Can the same thing be applied to them? Does that require including similar .impcall sections in the import libraries for the thunks? (In the case of lld, it actually doesn't use the import library contents here but just synthesize it on its own - there we could do the same right away without needing to update import libraries with this metadata.)

@dpaoliello
Copy link
Contributor Author

As the flag /d2ImportCallOptimization is undocumented, I presume that this format is undocumnted too. This covers two separate formats as far as I can see - the .impcall section contents, which this PR generates, and which the linker consumes. This format is mostly a convention - an agreement between compiler and linker; this currently uses the Imp_Call_V1 identifier. And then secondly, the dynamic relocations that the linker generates, which ends up in the final binary, which is consumed by the windows loader. This format isn't touched upon here (as this only covers code generation, not linking). The format of those dynamic relocations is much more fixed, as it's handled by the OS. Although this is only an optimization, so running on older Windows versions which doesn't recognize it, should be fine too?

Correct on all accounts:

@dpaoliello
Copy link
Contributor Author

When the linker uses these dynamic relocations, it changes a br or blr indirect branch instruction into a direct b or bl, if the target address is close enough within the address space - right? (The other instructions for loading the address are left untouched, as those instructions can be anywhere detached from the branch, and we can't speculate on whether the register contents is needed elsewhere.)
As the 64 bit address space is kinda large, and the b and bl instructions only have a +/- 128 MB range, I guess that this optimization simply can't be applied, if the target is too far away? Intuitively, it feels like this wouldn't end up effective all that often? Then again, I guess most DLLs are loaded close to each other in the virtual memory layout, and the base EXE, with dynamic base enabled (as always on aarch64) also would end up somewhere close.

The linker doesn't do the rewriting, the kernel-mode loader does, but yes it will rewrite the indirect branches to direct branches. Since it is the loader doing this work, it knows how far away the target address is and, therefore, if the instruction can be rewritten or not. Since this is an optimization, the loader can choose not to rewrite the instruction without affecting correctness.

@dpaoliello
Copy link
Contributor Author

Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?

I don't have the exact numbers, but we did see significant performance improvement within the Windows kernel for some components (which were especially chatty with other .sys files) when MSVC added this optimization. I'll see if I can get one of my colleagues to provide more details.

@dpaoliello
Copy link
Contributor Author

For cases when dll imported functions are called without being marked as dllimport in headers, we jump via a thunk (from the import library, and/or linker generated). Can the same thing be applied to them? Does that require including similar .impcall sections in the import libraries for the thunks? (In the case of lld, it actually doesn't use the import library contents here but just synthesize it on its own - there we could do the same right away without needing to update import libraries with this metadata.)

I'm not 100% sure how MSVC handles this - there are no changes to import libraries (i.e., there's no .impcall section), so I'm guessing it would be done within the linker.

If lld supports generating the Dynamic Value Relocation Table, there's no reason it can't add additional entries in there for other indirect branches to imported functions that it knows about.

@dpaoliello
Copy link
Contributor Author

Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?

I don't have the exact numbers, but we did see significant performance improvement within the Windows kernel for some components (which were especially chatty with other .sys files) when MSVC added this optimization. I'll see if I can get one of my colleagues to provide more details.

I have some numbers: when applied to the Arm64 Windows Kernel, we saw a 3% throughput improvement on DiskSpd benchmarks for both SnapDragon 8cx Gen2 and Gen3 devices.

@mstorsjo
Copy link
Member

mstorsjo commented Jan 6, 2025

I have some numbers: when applied to the Arm64 Windows Kernel, we saw a 3% throughput improvement on DiskSpd benchmarks for both SnapDragon 8cx Gen2 and Gen3 devices.

Interesting - that's a massive number a change like this. I'm not too familiar with the structure of the kernel though, I presume that there would need to be a huge number of cross-DLL function calls within the kernel for this change to make any measurable difference.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mechanism for tracking the call from ISel through the assembler seems generally fine. Alternatively, you could add an operand to call instructions, but I don't see any reason to prefer that.

void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
const MCSymbol *Symbol) {
auto &SectionData = SectionToImportCallsMap[FB.getParent()];
SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think unsigned(FB.getContents().size()) actually computes the value you want: a section contains multiple fragments, in general. And some of those fragments have a size that can't actually be computed until we do relaxation. (Relaxation is less common on aarch64 than on x86, but it shows up in a few places... in particular, .p2align .)

Probably need to generate a temporary symbol, then compute the offset between the beginning of the section and the symbol when you generate the section.

(Less importantly, the cast implicitly truncates from 64-bit to 32-bit. Unlikely to come up in valid code, but should print a reasonable error.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - I switched this to using symbols and Asm.getSymbolOffset


// Section always starts with some magic.
constexpr char ImpCallMagic[12] = "Imp_Call_V1";
OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to make the assembler generate the magic number automatically, just so it's harder to mess up for people hand-writing asm.

@efriedma-quic
Copy link
Collaborator

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

You could possibly add a directive specifically to emit section numbers... that would allow you to move more of the logic into the AsmPrinter. We already have .secidx; that's not quite what you want here, but maybe to give you some idea what it would look like.

Not sure if that's worth doing.

@dpaoliello
Copy link
Contributor Author

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

You could possibly add a directive specifically to emit section numbers... that would allow you to move more of the logic into the AsmPrinter. We already have .secidx; that's not quite what you want here, but maybe to give you some idea what it would look like.

I considered this for an earlier design, but it ended up being very messy: the section numbers are assigned after the sections have been streamed, so if there is a general purpose "insert section number here" mechanism, it needs to be able to go back and rewrite already-streamed sections. The current design avoids that complexity since we only use the section numbers in the .impcall sections which we pre-allocate but then fill in afterwards.

@efriedma-quic
Copy link
Collaborator

it needs to be able to go back and rewrite already-streamed sections

Yes, a relocation (at the assembler level, not in the resulting object). We have infrastructure to do this if you wanted to.

@dpaoliello
Copy link
Contributor Author

Yes, a relocation (at the assembler level, not in the resulting object). We have infrastructure to do this if you wanted to.

Sure, can you point me to the code?

I'd be happy to get rid of the .impcall directive and do most of the work in the asm printer, as it'll make the x86 implementation easier.

@efriedma-quic
Copy link
Collaborator

Internally it should be an MCFixup. I guess you'd add a new fixup kind.

Maybe follow the code for .secidx to get some idea what that would look like. It's not quite the same, because you need to resolve it instead of emitting a relocation, but before that point it's similar.

@dpaoliello
Copy link
Contributor Author

Ok, switched this over to using section number and offset assembly directives, implemented using new MCTargetExpr derived types (which avoided the need to introduce any new fixup kinds and localizes the changes to just the COFF streamer and object writer).

This also meant that writing the .impcall section is now completely contained in the asm printer, avoiding the need for .impcall directives (and the weird failure cases associated with them) and will make the x86 implementation much easier.

/// Mapping of call instruction to the global value and target flags that it
/// calls, if applicable.
DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
CalledGlobalsMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, if we're adding state to MachineFunction, we should add MIR serialization/deserialization (MIRPrinter::print etc.). Hopefully not too hard to implement here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Are there explicit tests for this?
(I ran into some crashes while testing that I fixed, so I know that round-tripping is working...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/CodeGen/MIR is for print/parse tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks! I've added a test.

@dpaoliello dpaoliello force-pushed the impcall branch 3 times, most recently from de6d3da to 2e2c02c Compare January 10, 2025 23:22
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/11837

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
Input file: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.dSYM
Output file (aarch64): /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.default.gSYM
Loaded 3 functions from DWARF.
Loaded 3 functions from symbol table.
warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000047
addr=0x0000000000000248, file=  3, line=  5
addr=0x0000000000000254, file=  3, line=  7
addr=0x0000000000000258, file=  3, line=  9
addr=0x000000000000025c, file=  3, line=  8
addr=0x0000000000000260, file=  3, line= 11
addr=0x0000000000000264, file=  3, line= 10
addr=0x0000000000000268, file=  3, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000001
addr=0x0000000000000248, file=  1, line=  5
addr=0x0000000000000254, file=  1, line=  7
addr=0x0000000000000258, file=  1, line=  9
addr=0x000000000000025c, file=  1, line=  8
addr=0x0000000000000260, file=  1, line= 11
addr=0x0000000000000264, file=  1, line= 10
addr=0x0000000000000268, file=  1, line=  6


warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000001
addr=0x0000000000000248, file=  1, line=  5
addr=0x0000000000000254, file=  1, line=  7
addr=0x0000000000000258, file=  1, line=  9
addr=0x000000000000025c, file=  1, line=  8
addr=0x0000000000000260, file=  1, line= 11
addr=0x0000000000000264, file=  1, line= 10
addr=0x0000000000000268, file=  1, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000030
addr=0x0000000000000248, file=  2, line=  5
addr=0x0000000000000254, file=  2, line=  7
addr=0x0000000000000258, file=  2, line=  9
addr=0x000000000000025c, file=  2, line=  8
addr=0x0000000000000260, file=  2, line= 11
addr=0x0000000000000264, file=  2, line= 10
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-bootstrap-asan running on sanitizer-buildbot7 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/24/builds/4106

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 85836 tests, 72 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (26331 of 85836)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
=================================================================
==240333==ERROR: AddressSanitizer: use-after-poison on address 0xeb50b1e4dc30 at pc 0xbc2deff3f9d0 bp 0xffffe9650950 sp 0xffffe9650948
READ of size 8 at 0xeb50b1e4dc30 thread T0
    #0 0xbc2deff3f9cc in getParent /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0xbc2deff3f9cc in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0xbc2deff35b74 in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0xbc2deff48174 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0xbc2deff81348 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0xbc2defbfaa54 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0xbc2df083d8dc in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0xbc2df0850cec in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0xbc2df083ebd0 in runOnModule /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0xbc2df083ebd0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0xbc2deb886c4c in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0xbc2deb8827ec in main /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0xed40b2f684c0  (/lib/aarch64-linux-gnu/libc.so.6+0x284c0) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #13 0xed40b2f68594 in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x28594) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #14 0xbc2deb7948ac in _start (/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x88448ac)

0xeb50b1e4dc30 is located 2864 bytes inside of 4096-byte region [0xeb50b1e4d100,0xeb50b1e4e100)
allocated by thread T0 here:
    #0 0xbc2deb875aec in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0xbc2debcb592c in Allocate /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0xbc2debcb592c in StartNewSlab /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0xbc2debcb592c in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0xbc2defc128ac in allocateOperandArray /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0xbc2defc128ac in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 85836 tests, 72 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (26331 of 85836)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
+ /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
=================================================================
==240333==ERROR: AddressSanitizer: use-after-poison on address 0xeb50b1e4dc30 at pc 0xbc2deff3f9d0 bp 0xffffe9650950 sp 0xffffe9650948
READ of size 8 at 0xeb50b1e4dc30 thread T0
    #0 0xbc2deff3f9cc in getParent /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0xbc2deff3f9cc in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0xbc2deff35b74 in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0xbc2deff48174 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0xbc2deff81348 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0xbc2defbfaa54 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0xbc2df083d8dc in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0xbc2df0850cec in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0xbc2df083ebd0 in runOnModule /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0xbc2df083ebd0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0xbc2deb886c4c in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0xbc2deb8827ec in main /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0xed40b2f684c0  (/lib/aarch64-linux-gnu/libc.so.6+0x284c0) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #13 0xed40b2f68594 in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x28594) (BuildId: 32fa4d6f3a8d5f430bdb7af2eb779470cd5ec7c2)
    #14 0xbc2deb7948ac in _start (/home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x88448ac)

0xeb50b1e4dc30 is located 2864 bytes inside of 4096-byte region [0xeb50b1e4d100,0xeb50b1e4e100)
allocated by thread T0 here:
    #0 0xbc2deb875aec in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0xbc2debcb592c in Allocate /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0xbc2debcb592c in StartNewSlab /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0xbc2debcb592c in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0xbc2defc128ac in allocateOperandArray /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0xbc2defc128ac in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-aarch64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/7294

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28487 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1224403==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x58c3d119d462 bp 0x7ffee661bc70 sp 0x7ffee661bc68
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x58c3d119d461 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x58c3d119d461 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x58c3d1191cfb in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x58c3d11a7c12 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x58c3d11f275e in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x58c3d0db97b1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x58c3d1c78a03 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x58c3d1c8fabe in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x58c3d1c7a5fa in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x58c3d1c7a5fa in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x58c3cbe6bfdc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x58c3cbe6652f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7c418222a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7c418222a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x58c3cbd79da4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4beda4)

0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x58c3cbe59a82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x58c3d368f6bd in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x58c3cc2f4b59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x58c3cc2f4b59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x58c3cc2f4b59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x58c3d0dd4728 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28487 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1224403==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x58c3d119d462 bp 0x7ffee661bc70 sp 0x7ffee661bc68
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x58c3d119d461 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x58c3d119d461 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x58c3d1191cfb in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x58c3d11a7c12 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x58c3d11f275e in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x58c3d0db97b1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x58c3d1c78a03 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x58c3d1c8fabe in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x58c3d1c7a5fa in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x58c3d1c7a5fa in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x58c3cbe6bfdc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x58c3cbe6652f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7c418222a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7c418222a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x58c3cbd79da4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4beda4)

0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x58c3cbe59a82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x58c3d368f6bd in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x58c3cc2f4b59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x58c3cc2f4b59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x58c3cc2f4b59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x58c3d0dd4728 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot1 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/5179

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28051 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==273570==ERROR: AddressSanitizer: use-after-poison on address 0x78bfbba4dc30 at pc 0x62c4365ab02b bp 0x7ffc42c302b0 sp 0x7ffc42c302a8
READ of size 8 at 0x78bfbba4dc30 thread T0
    #0 0x62c4365ab02a in getParent /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x62c4365ab02a in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x62c43659f0ac in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x62c4365b4c6a in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x62c4365f85f0 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x62c4361cc1c7 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x62c437030bfd in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x62c4370477d1 in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x62c43703222a in runOnModule /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x62c43703222a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x62c4311ba217 in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x62c4311b546c in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7aafbc82a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7aafbc82a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x62c4310be924 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x8823924)

0x78bfbba4dc30 is located 2864 bytes inside of 4096-byte region [0x78bfbba4d100,0x78bfbba4e100)
allocated by thread T0 here:
    #0 0x62c4311a4bb2 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x62c43169acae in Allocate /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0x62c43169acae in StartNewSlab /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0x62c43169acae in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0x62c4361e72af in allocateOperandArray /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0x62c4361e72af in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28051 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==273570==ERROR: AddressSanitizer: use-after-poison on address 0x78bfbba4dc30 at pc 0x62c4365ab02b bp 0x7ffc42c302b0 sp 0x7ffc42c302a8
READ of size 8 at 0x78bfbba4dc30 thread T0
    #0 0x62c4365ab02a in getParent /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x62c4365ab02a in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x62c43659f0ac in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x62c4365b4c6a in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x62c4365f85f0 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x62c4361cc1c7 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x62c437030bfd in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x62c4370477d1 in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x62c43703222a in runOnModule /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x62c43703222a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x62c4311ba217 in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x62c4311b546c in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7aafbc82a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7aafbc82a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x62c4310be924 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x8823924)

0x78bfbba4dc30 is located 2864 bytes inside of 4096-byte region [0x78bfbba4d100,0x78bfbba4e100)
allocated by thread T0 here:
    #0 0x62c4311a4bb2 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x62c43169acae in Allocate /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0x62c43169acae in StartNewSlab /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0x62c43169acae in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0x62c4361e72af in allocateOperandArray /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0x62c4361e72af in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla-2stage running on linaro-g3-03 while building llvm at step 11 "build stage 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/41/builds/4469

Here is the relevant piece of the build log for the reference
Step 11 (build stage 2) failure: 'ninja' (failure)
...
[7996/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/TextDiagnostic.cpp.o
[7997/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/compute-offsets.cpp.o
[7998/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/mod-file.cpp.o
[7999/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/check-omp-structure.cpp.o
[8000/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/openmp-modifiers.cpp.o
[8001/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/BoxValue.cpp.o
[8002/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Complex.cpp.o
[8003/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/data-to-inits.cpp.o
[8004/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Character.cpp.o
[8005/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/OpenMP/Utils.cpp
Killed
[8006/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/expression.cpp.o
[8007/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/DoLoopHelper.cpp.o
[8008/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/ConvertCall.cpp
Killed
[8009/8801] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/AliasAnalysis.cpp.o
[8010/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/CUFCommon.cpp.o
[8011/8801] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/TBAAForest.cpp.o
[8012/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/ConvertVariable.cpp
Killed
[8013/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/FIRBuilder.cpp.o
[8014/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/HLFIRTools.cpp.o
[8015/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/CallInterface.cpp
Killed
[8016/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/program-tree.cpp.o
[8017/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/FrontendAction.cpp.o
[8018/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/CompilerInstance.cpp.o
[8019/8801] Building CXX object tools/flang/lib/FrontendTool/CMakeFiles/flangFrontendTool.dir/ExecuteCompilerInvocation.cpp.o
[8020/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/pointer-assignment.cpp.o
[8021/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/CompilerInvocation.cpp.o
[8022/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/tools.cpp.o
[8023/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/resolve-labels.cpp.o
[8024/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/unparse-with-symbols.cpp.o
[8025/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/rewrite-directives.cpp.o
[8026/8801] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/fold-integer.cpp.o
[8027/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/rewrite-parse-tree.cpp.o
[8028/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/runtime-type-info.cpp.o
[8029/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/type.cpp.o
[8030/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/scope.cpp.o
[8031/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/symbol.cpp.o
[8032/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/resolve-names-utils.cpp.o

@fhahn
Copy link
Contributor

fhahn commented Jan 12, 2025

It looks like the change introduced a use-after-poison https://lab.llvm.org/buildbot/#/builders/169/builds/7294/steps/10/logs/stdio

FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (1 of 88242)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2
Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1112042==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x5979d93c2be3 bp 0x7ffd78825d90 sp 0x7ffd78825d88
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x5979d93c2be2 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x5979d93c2be2 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x5979d93b74ab in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x5979d93cd392 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x5979d9417ede in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x5979d8fdef61 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x5979d9e9e183 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x5979d9eb523e in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x5979d9e9fd7a in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x5979d9e9fd7a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x5979d408e2dc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x5979d408882f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7cb6a4a2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7cb6a4a2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x5979d3f9c0a4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4bf0a4)
0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x5979d407bd82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x5979db8b4e3d in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x5979d4516e59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x5979d4516e59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x5979d4516e59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x5979d8ff9ed8 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12
    #6 0x5979d8ff9ed8 in allocate<llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096UL, 4096UL, 128UL> > /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/ArrayRecycler.h:130:38
    #7 0x5979d8ff9ed8 in allocateOperandArray /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #8 0x5979d8ff9ed8 in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19
    #9 0x5979d8fb74cc in llvm::MachineFunction::CreateMachineInstr(llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:433:7
    #10 0x5979d45d7a8c in llvm::BuildMI(llvm::MachineFunction&, llvm::MIMetadata const&, llvm::MCInstrDesc const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstrBuilder.h:375:37
    #11 0x5979db2175cd in llvm::InstrEmitter::EmitMachineNode(llvm::SDNode*, bool, bool, llvm::SmallDenseMap<llvm::SDValue, llvm::Register, 16u, llvm::DenseMapInfo<llvm::SDValue, void>, llvm::detail::DenseMapPair<llvm::SDValue, llvm::Register>>&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1060:29
    #12 0x5979db25acab in EmitNode /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h:145:7
    #13 0x5979db25acab in llvm::ScheduleDAGSDNodes::EmitSchedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&)::$_0::operator()(llvm::SDNode*, bool, bool, llvm::SmallDenseMap<llvm::SDValue, llvm::Register, 16u, llvm::DenseMapInfo<llvm::SDValue, void>, llvm::detail::DenseMapPair<llvm::SDValue, llvm::Register>>&) const /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:874:13
    #14 0x5979db2587c3 in llvm::ScheduleDAGSDNodes::EmitSchedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:965:9
    #15 0x5979db494dbe in llvm::SelectionDAGISel::CodeGenAndEmitDAG() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1151:42
    #16 0x5979db48d112 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1898:7
    #17 0x5979db4848d1 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:615:3
    #18 0x5979db47dd38 in llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:375:20
    #19 0x5979d8fdef61 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #20 0x5979d9e9e183 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #21 0x5979d9eb523e in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #22 0x5979d9e9fd7a in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #23 0x5979d9e9fd7a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #24 0x5979d408e2dc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #25 0x5979d408882f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #26 0x7cb6a4a2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #27 0x7cb6a4a2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #28 0x5979d3f9c0a4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4bf0a4)
SUMMARY: AddressSanitizer: use-after-poison /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55 in getParent
Shadow bytes around the buggy address:
  0x52100002d980: 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x52100002da00: f7 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7
  0x52100002da80: f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7 f7 f7 f7
  0x52100002db00: f7 f7 f7 f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7
  0x52100002db80: 00 00 00 00 00 00 00 00 00 f7 00 00 00 00 00 00
=>0x52100002dc00: 00 00 f7 f7 f7 f7[f7]f7 f7 f7 f7 f7 f7 00 00 00
  0x52100002dc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00 00
  0x52100002dd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x52100002dd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00
  0x52100002de00: 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7
  0x52100002de80: f7 f7 f7 00 00 00 00 00 00 00 00 f7 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1112042==ABORTING

Please take a look and revert the change if it takes longer to fix.

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

I found the root cause: even though I modeled "Called Globals" after "Call Site Info", I didn't all the move/copy/erase functions that keep the Calle Site Info in-sync with changes to the instructions. It should be an easy fix since I can rename and update the existing functions.

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

I found the root cause: even though I modeled "Called Globals" after "Call Site Info", I didn't all the move/copy/erase functions that keep the Calle Site Info in-sync with changes to the instructions. It should be an easy fix since I can rename and update the existing functions.

PR with fix: #122762

kstoimenov added a commit that referenced this pull request Jan 13, 2025
…valent to MSVC /d2ImportCallOptimization) (#121516)"

Breaks sanitizer build: https://lab.llvm.org/buildbot/#/builders/52/builds/5179

This reverts commits:
5ee0a71
d997a72
@kstoimenov
Copy link
Contributor

@dpaoliello please note this was reverted in 2f7ade4.

dpaoliello added a commit to dpaoliello/llvm-project that referenced this pull request Jan 13, 2025
…ivalent to MSVC /d2ImportCallOptimization) (llvm#121516)"

This reverts commit 2f7ade4.
kazutakahirata pushed a commit to kazutakahirata/llvm-project that referenced this pull request Jan 13, 2025
dpaoliello added a commit that referenced this pull request Jan 13, 2025
…ivalent to MSVC /d2ImportCallOptimization) (#121516)" (#122777)

This reverts commit 2f7ade4.

Fix is available in #122762
dpaoliello added a commit that referenced this pull request Jan 13, 2025
#122762)

Fixes the "use after poison" issue introduced by #121516 (see
<#121516 (comment)>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 13, 2025
…ll Site info (#122762)

Fixes the "use after poison" issue introduced by #121516 (see
<llvm/llvm-project#121516 (comment)>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
dpaoliello added a commit that referenced this pull request Jan 30, 2025
… import call optimization, and remove LLVM flag (#122831)

Switches import call optimization from being enabled by an LLVM flag to
instead using a module attribute, and creates a new Clang flag that will
set that attribute. This addresses the concern raised in the original
PR:
<#121516 (comment)>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<#122762 (review)>
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 30, 2025
…tribute for import call optimization, and remove LLVM flag (#122831)

Switches import call optimization from being enabled by an LLVM flag to
instead using a module attribute, and creates a new Clang flag that will
set that attribute. This addresses the concern raised in the original
PR:
<llvm/llvm-project#121516 (comment)>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<llvm/llvm-project#122762 (review)>
dpaoliello added a commit that referenced this pull request May 20, 2025
…ivalent to MSVC /d2guardretpoline) (#126631)

This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
kostasalv pushed a commit to kostasalv/llvm-project that referenced this pull request May 21, 2025
…ivalent to MSVC /d2guardretpoline) (llvm#126631)

This is the x64 equivalent of llvm#121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
jtstogel added a commit to jtstogel/llvm-project that referenced this pull request May 22, 2025
Add queue_test

[MC][DebugInfo] Emit linetable entries with known offsets immediately (#134677)

DWARF linetable entries are usually emitted as a sequence of
MCDwarfLineAddrFragment fragments containing the line-number difference
and an MCExpr describing the instruction-range the linetable entry
covers. These then get relaxed during assembly emission.

However, a large number of these instruction-range expressions are
ranges within a fixed MCDataFragment, i.e. a range over fixed-size
instructions that are not subject to relaxation at a later stage. Thus,
we can compute the address-delta immediately, and not spend time and
memory describing that computation so it can be deferred.

AMDGPU: Add regression test for multiple frame index lowering (#140784)

Failures appeared after https://github.com/llvm/llvm-project/pull/140587 but this case wasn't covered

[lldb][core] Fix getting summary of a variable pointing to r/o memory (#139196)

Motivation example:

```
> lldb -c altmain2.core
...
(lldb) var F
(const char *) F = 0x0804a000 ""
```

The variable `F` points to a read-only memory page not dumped to the
core file, so `Process::ReadMemory()` cannot read the data. The patch
switches to `Target::ReadMemory()`, which can read data both from the
process memory and the application binary.

Suppress errors from well-formed-testing type traits in SFINAE contexts (#135390)

There are several type traits that produce a boolean value or type based
on the well-formedness of some expression (more precisely, the immediate
context, i.e. for example excluding nested template instantiation):
* `__is_constructible` and variants,
* `__is_convertible` and variants,
* `__is_assignable` and variants,
* `__reference_{binds_to,{constructs,converts}_from}_temporary`,
* `__is_trivially_equality_comparable`,
* `__builtin_common_type`.

(It should be noted that the standard doesn't always base this on the
immediate context being well-formed: for `std::common_type` it's based
on whether some expression "denotes a valid type." But I assume that's
an editorial issue and means the same thing.)

Errors in the immediate context are suppressed, instead the type traits
return another value or produce a different type if the expression is
not well-formed. This is achieved using an `SFINAETrap` with
`AccessCheckingSFINAE` set to true. If the type trait is used outside of
an SFINAE context, errors are discarded because in that case the
`SFINAETrap` sets `InNonInstantiationSFINAEContext`, which makes
`isSFINAEContext` return an `optional(nullptr)`, which causes the errors
to be discarded in `EmitDiagnostic`. However, in an SFINAE context this
doesn't happen, and errors are added to `SuppressedDiagnostics` in the
`TemplateDeductionInfo` returned by `isSFINAEContext`. Once we're done
with deducing template arguments and have decided which template is
going to be instantiated, the errors corresponding to the chosen
template are then emitted. At this point we get errors from those type
traits that we wouldn't have seen if used with the same arguments
outside of an SFINAE context. That doesn't seem right.

So what we want to do is always set `InNonInstantiationSFINAEContext`
when evaluating these well-formed-testing type traits, regardless of
whether we're in an SFINAE context or not. This should only affect the
immediate context, as nested contexts add a new `CodeSynthesisContext`
that resets `InNonInstantiationSFINAEContext` for the time it's active.

Going through uses of `SFINAETrap` with `AccessCheckingSFINAE` = `true`,
it occurred to me that all of them want this behavior and we can just
use this parameter to decide whether to use a non-instantiation context.
The uses are precisely the type traits mentioned above plus the
`TentativeAnalysisScope`, where I think it is also fine. (Though I think
we don't do tentative analysis in SFINAE contexts anyway.)

Because the parameter no longer just sets `AccessCheckingSFINAE` in Sema
but also `InNonInstantiationSFINAEContext`, I think it should be renamed
(along with uses, which also point the reviewer to the affected places).
Since we're testing for validity of some expression, `ForValidityCheck`
seems to be a good name.

The added tests should more or less correspond to the users of
`SFINAETrap` with `AccessCheckingSFINAE` = `true`. I added a test for
errors outside of the immediate context for only one type trait, because
it requires some setup and is relatively noisy.

We put the `ForValidityCheck` condition first because it's constant in
all uses and this would then allow the compiler to prune the call to
`isSFINAEContext` when true.

Fixes #132044.

[gn build] Manually port 8f03e1a

Emit inbounds and nuw attributes in memref. (#138984)

Now that MLIR accepts nuw and nusw in getelementptr, this patch emits
the inbounds and nuw attributes when lower memref to LLVM in load and
store operators.

This patch also strengthens the memref.load and memref.store spec about
undefined behaviour during lowering.

This patch also lifts the |rewriter| parameter in getStridedElementPtr
ahead so that LLVM::GEPNoWrapFlags can be added at the end with a
default value and grouped together with other operators' parameters.

Signed-off-by: Lin, Peiyong <[email protected]>

[llvm] Use llvm::is_contained (NFC) (#140742)

[bugpoint] Use a range-based for loop (NFC) (#140743)

[llvm] prepare explicit template instantiations in llvm/CodeGen for DLL export annotations (#140653)

This patch prepares the llvm/CodeGen library for public interface
annotations in support of an LLVM Windows DLL (shared library) build,
tracked in #109483. The purpose of this patch is to make the upcoming
codemod of this library more straight-forward. It is not expected to
impact any functionality.

The `LLVM_ABI` annotations will be added in a subsequent patch. These
changes are required to build with visibility annotations using Clang
and gcc on Linux/Darwin/etc; Windows DLL can build fine without them.

This PR does four things in preparation for adding `LLVM_ABI`
annotations to llvm/CodeGen:
1. Explicitly include `Machine.h` and `Function.h` headers from
`MachinePassManager.cpp` so that `Function` and `Machine` types are
available for the instantiations of `InnerAnalysisManagerProxy`. Without
this change, Clang only will only export one of the templates after
visibility annotations are added to them. Unclear if this is a Clang bug
or expected behavior, but this change avoids the issue and should be
harmless.
2. Refactor the definition of `MachineFunctionAnalysisManager` to its
own header file. Without this change, it is not possible to add
visibility annotations to the declaration with causing gcc to produce
`-Wattribute` warnings.
3. Remove the redundant specialization of the
`DominatorTreeBase<MachineBasicBlock, false>::addRoot` method. The
specialization is the same as implemented in `DominatorTreeBase` so
should be unnecessary. Without this change, it is not possible to
annotate the subsequent instantiations of `DominatorTreeBase` in the
header file without gcc producing `-Wattribute` warnings. Mark
unspecialized `addRoot` as `inline` to match the removed specialized
version.
4. Move the explicit instantiations of the `GenericDomTreeUpdater`
template earlier in the header file. These need to appear before being
used in the `MachineDomTreeUpdater` class definition or gcc will produce
warnings once visibility annotations are added.

The LLVM Windows DLL effort is tracked in #109483. Additional context is
provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).

Clang and gcc handle visibility attributes on explicit template
instantiations a bit differently; gcc is pickier and generates
`-Wattribute` warnings when an explicit instantiation with a visibility
annotation appears after the type has already appeared in the
translation unit. These warnings can be avoided by moving explicit
template instantiations so they always appear first.

Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:

- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang

[llvm-exegesis] Error instead of aborting on verification failure (#137581)

This patch makes llvm-exegesis emit an error when the machine function
fails in MachineVerification rather than aborting. This allows
downstream users (particularly https://github.com/google/gematria) to
handle these errors rather than having the entire process crash. This
essentially be NFC from the user perspective minus the addition of the
new error message.

[x64][win] Add compiler support for x64 import call optimization (equivalent to MSVC /d2guardretpoline) (#126631)

This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.

[NFC][CI] Reformat python files

Looks like some of these were not properly formatted at some point. This
patch reformats these files so that future diffs are cleaner when
running the formatter over the whole file.

[mlir][NFC] Simplify constant checks with isOneInteger and renamed isZeroInteger. (#139340)

The revision adds isOneInteger helper, and simplifies the existing code
with the two methods. It removes some lambda, which makes code cleaner.

For downstream users, you can update the code with the below script.

```bash
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.h
sed -i "s/isZeroIndex/isZeroInteger/g" **/*.cpp
```

---------

Signed-off-by: hanhanW <[email protected]>

[Attributor] Don't replace `addrspacecast (ptr null to ptr addrspace(x))` with `ptr addrspace(x) null` (#126779)

`ConstantPointerNull` represents a pointer with value 0, but it doesn’t
necessarily mean a `nullptr`. `ptr addrspace(x) null` is not the same as
`addrspacecast (ptr null to ptr addrspace(x))` if the `nullptr` in AS X
is not
zero. Therefore, we can't simply replace it.

Fixes #115083.

[CIR][NFC] Eliminate ArgInfo structure (#140612)

A previous refactoring had reduced the ArgInfo structure to contain a
single member, the argument type. This change eliminates the ArgInfo
structure entirely, instead just storing the argument type directly in
places where ArgInfo had previously been used.

This also updates the place where the arg types were previously being
copied for a call to CIRGenFunctionInfo::Profile to instead use the
stored argument types buffer directly and adds assertions where the
calculated folding set ID is used to verify that any match was correct.

[lldb][lldb-dap] show modules pane if supported by the adapter (#140603)

Fixes #140589
Added logic to dynamically set the `lldb-dap.showModules` context based
on the presence of modules in the debug session.

[mlir][Vector] Improve `vector.mask` verifier (#139823)

This PR improves the `vector.mask` verifier to make sure it's not
applying masking semantics to operations defined outside of the
`vector.mask` region. Documentation is updated to emphasize that and
make it clearer, even though it already stated that.

As part of this change, the logic that ensures that a terminator is
present in the region mask has been simplified to make it less
surprising to the user when a `vector.yield` is explicitly provided in
the IR.

[mlir] Check for int limits when converting gpu dims (#140747)

When the upper_bound of a gpu dim op (like `gpu.block_dim`) is the
maximum i32 integer value, the op conversion for it causes overflow by
adding 1 to convert the bound from closed to open. This fixes the bug by
clamping the open bound to the maximum i32 value.

---------

Signed-off-by: Max Dawkins <[email protected]>

[AMDGPU][LowerBufferFatPointers] Handle addrspacecast null to p7 (#140775)

Some application code operating on generic pointers (that then gete
initialized to buffer fat pointers) may perform tests against nullptr.
After address space inference, this results in comparisons against
`addrspacecast (ptr null to ptr addrspace(7))`, which were crashing.

However, while general casts to ptr addrspace(7) from generic pointers
aren't supposted, it is possible to cast null pointers to the all-zerose
bufer resource and 0 offset, which this patch adds.

It also adds a TODO for casting _out_ of buffer resources, which isn't
implemented here but could be.

[AMDGPU] Add make.buffer.rsrc to InferAddressSpaces (#140770)

make.buffer.rsrc can be subjected to address space inference. There's
not _currently_ a reason to have this, but we might as well handle this
in case it comes up.

---------

Co-authored-by: Matt Arsenault <[email protected]>

[gn] port d561d595c4ee (clang riscv_andes_vector.td)

[gn] fix mistake in f78a081cdb3

[gn build] Port 9260d310f1cb

[gn build] Port a9ee8e4a454e

[gn build] Port d561d595c4ee

[lld][WebAssembly] Set the target-cpu in LTO config (#140010)

I couldn't find an existing way to pass -mcpu=lime1 equivalent to LTO
codegen.
This commit would privide one. With this commit, you can do so by
passing
`-mllvm -mcpu=lime1` to wasm-ld.

[BOLT,test] Add --image-base to tests that use --section-start

When using -no-pie without a SECTIONS command, the linker uses the
target's default image base. If -Ttext= or --section-start specifies an
output section address below this base, the result is likely unintended.
LLD will give a diagnostic (#140187) and may change the behavior in the future.
It's good to set an explicit image base to avoid relying on its current
behavior. BOLT doesn't seem to care whether a PT_PHDR segment is
present.

Pull Request: https://github.com/llvm/llvm-project/pull/140570

[GISel] Fix ShuffleVector assert (#139769)

Fixes issue: https://github.com/llvm/llvm-project/issues/139752

When G_SHUFFLE_VECTOR has only 1 element then it is possible the vector
is decayed into a scalar.

[mlir] [liveness] Conservatively mark operands of return-like op inside non-callable and non-regionbranch op as live (#140793)

Currently the liveness analysis always marks operands yielded in regions
that aren't classified as `RegionBranchOpInterface` or
`CallableOpInterface` as non-live. Examples for these ops include
linalg.generic (with `linalg.yield` as terminator) or gpu ops (with
`gpu.yield` as terminator).

This in turn makes the `remove-dead-values` pass always incorrectly
remove the bodies of these ops, leading to invalid IR. Because these ops
define their own semantics, I have conservatively marked all operands of
these yield ops to be live.

[LoongArch] Remove wrong vector shuffle lowering for lasx. (#140688)

PR https://github.com/llvm/llvm-project/pull/137918 introduces a wrong
lowering for v4f64/v4i64 to generate xvshuf4i.d instruction.
This PR reverts the wrong part of lasx.

[lldb-dap] Avoid double 'new' events for dyld on Darwin (#140810)

I got a bug report where a pedantic DAP client complains about getting
two "new" module events for the same UUID. This is caused by the dyld
transition from the on-disk dyld to the shared cache dyld, which share
the same UUID. The transition is not generating an unloaded event
(because we're not really unloading dyld) but we do get a loaded event
(because the load address changed). This PR fixes the issue by relying
on the modules set as the source of truth instead of relying on the
event type.

[flang][cuda] Allocate extra descriptor in managed memory when it is coming from device (#140818)

[bazel][mlir] Add missing dep for 747620d (#140830)

fixes the following errors:

ERROR:
/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6a1efeb401da192d3572f00e2f11245b/external/llvm-project/mlir/BUILD.bazel:3410:11:
Compiling mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp
failed: (Exit 1): clang failed: error executing CppCompile command (from
target @@llvm-project//mlir:XeGPUTransforms) /usr/lib/llvm-18/bin/clang
-U_FORTIFY_SOURCE -fstack-protector -Wall -Wthread-safety -Wself-assign
-Wunused-but-set-parameter -Wno-free-nonheap-object -fcolor-diagnostics
-fno-omit-frame-pointer ... (remaining 130 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain
the sandbox build root for debugging

external/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp:11:10:
error: module llvm-project//mlir:XeGPUTransforms does not depend on a
module exporting 'mlir/Dialect/Arith/Utils/Utils.h'
   11 | #include "mlir/Dialect/Arith/Utils/Utils.h"
      |          ^

external/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUWgToSgDistribute.cpp:13:10:
fatal error: 'mlir/Dialect/Index/IR/IndexDialect.h' file not found
   13 | #include "mlir/Dialect/Index/IR/IndexDialect.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

[Clang] Fix an inadvertent overwrite of sub-initializers (#140714)

When using InitChecker with VerifyOnly, we create a new designated
initializer to handle anonymous fields. However in the last call to
CheckDesignatedInitializer, the subinitializer isn't properly preserved
but it gets overwritten by the cloned one. Which causes the initializer
to reference the dependent field, breaking assumptions when we
initialize the instantiated specialization.

Fixes https://github.com/llvm/llvm-project/issues/67173

[clang-format] Handle raw string literals containing JSON code (#140666)

Fix #65400

[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786)

The generic GPU barrier implementation checked if it was the main thread
in generic mode to identify single threaded regions. This doesn't work
since inside of a non-active (=sequential) parallel, that thread becomes
the main thread of a team, and is not the main thread in generic mode.
At least that is the implementation of the APIs today.

To identify single threaded regions we now check the team size
explicitly.

This exposed three other issues; one is, for now, expected and not a
bug, the second one is a bug and has a FIXME in the
single_threaded_for_barrier_hang_1.c file, and the final one is also
benign as described in the end.

The non-bug issue comes up if we ever initialize a thread state.
Afterwards we will never run any region in parallel. This is a little
conservative, but I guess thread states are really bad for performance
anyway.

The bug comes up if we optimize single_threaded_for_barrier_hang_1 and
execute it in Generic-SPMD mode. For some reason we loose all the
updates to b. This looks very much like a compiler bug, but could also
be another logic issue in the runtime. Needs to be investigated.

Issue number 3 comes up if we have nested parallels inside of a target
region. The clang SPMD-check logic gets confused, determines SPMD (which
is fine) but picks an unreasonable thread count. This is all benign, I
think, just weird:

```
  #pragma omp target teams
  #pragma omp parallel num_threads(64)
  #pragma omp parallel num_threads(10)
  {}
```
Was launched with 10 threads, not 64.

Revert "[AMDGPU] remove move instruction if there is no user of it (#136735)"

This reverts commit 883afa4ef93d824ec11981ccad04af1cd1e4ce29 since it is not
technically sound.

[MLIR][NVVM] Add NVVMRequiresSM op traits (#126886)

Motivation:
Currently, the NVVMOps are not verified against the supported SM
architectures. This can manifest as an ISel failure in the NVPTX LLVM
backend during CodeGen to PTX ISA. This PR addresses this issue by
adding verifier checks for Target-SM architectures in the NVVM Dialect
itself, thereby catching the errors early on.

Summary:
* Parametric traits named `NVVMRequiresSM` and `NVVMRequiresSMa` are
added to facilitate the version checks for typical and arch-accelerated
versions respectively.
* These traits can be attached to any NVVM Op to enable the checks for
the particular Op. (example shown below)
* An attribute interface called named `TargetAttrVerifyInterface` is
added to the GPU dialect which any target attribute seeking to perform
target-verification on the module can implement.
* The checks are performed by the `NVVMTargetAttr` (implementing the
`TargetAttrVerifyInterface` interface) when called from the GPU module
verifier where it walks through the module and performs the checks for
Ops with the `NVVMRequiresSM` traits.
* A few Ops in `NVVMOps.td` have been updated to serve as examples.

Example Usage:
```
       def NVVM_ReduxOp : NVVM_Op<"redux.sync"> {...}
 ----> def NVVM_ReduxOp : NVVM_Op<"redux.sync", [NVVMRequiresSM<80>]> {...}

       def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned"> {...}
 ----> def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[90]>]> {...}
```

---------

Co-authored-by: Guray Ozen <[email protected]>

[llvm-debuginfo-analyzer] Fix a couple of unhandled DWARF situations leading to a crash (#137221)

This pull request fixes a couple of unhandled situations in DWARF input
leading to a crash. Specifically,

- If the DWARF input contains a declaration of a C variadic function
(where `...` translates to `DW_TAG_unspecified_parameters`), which is
then followed by a definition, `llvm_unreachable()` is hit in
`LVScope::addMissingElements()`. This is only visible in Debug builds.

- Parsing of instructions in `LVBinaryReader::createInstructions()` does
not check whether `Offset` lies within the `Bytes` ArrayRef. A specially
crafted DWARF input can lead to this condition.

[llvm-mca] Drop const from a return type (NFC) (#140836)

[polly] Drop const from return types (NFC) (#140837)

[CodeGen] Avoid repeated hash lookups (NFC) (#140838)

[DebugInfo] Use std::map::try_emplace (NFC) (#140839)

This patch provides default member initialization for SymInfo, which
in turns allows us to call std::map::try_emplace without the value.

[CodeGen] Use range-based for loops (NFC) (#140840)

[lldb-dap] fix disassembly request instruction offset handling (#140486)

Fix the handling of the `instructionOffset` parameter, which resulted in
always returning the wrong disassembly because VSCode always uses
`instructionOffset = -50` and expects 50 instructions before the given
address, instead of 50 bytes before

[clang][bytecode] Optimize classify() further (#140735)

Try to do as few checks as possible. Check for builtin types only once,
then look at the BuiltinType Kind. For integers, we cache the int and
long size, since those are used a lot and the ASTContext::getIntWidth()
call is costly.

[clang][bytecode] Initialize global strings via memcpy (#140789)

If we know the char width is 1, we can just copy
the data over instead of going through the Pointer API.

add @skipIfWindows to unresolved disassemble test on windows (#140852)

Fix https://lab.llvm.org/buildbot/#/builders/141/builds/8867

[analyzer][NFC] Move PrettyStackTraceLocationContext into dispatchWorkItem (#140035)

[analyzer][NFC] Move PrettyStackTraceLocationContext into
dispatchWorkItem

This change helps with ensuring that the abstract machine call stack is
only dumped exactly once no matter what checker callback we have the
crash in.

Note that `check::EndAnalysis` callbacks are resolved outside of
`dispatchWorkItem`, but that's the only checker callback that is outside
of `dispatchWorkItem`.

CPP-6476

[LoongArch] Add patterns for vstelm instructions (#139201)

[MLIR][PDL] Skip over all results in the PDL Bytecode if a Constraint/Rewrite failed (#139255)

Skipping only over the first results leads to the curCodeIt pointing to
the wrong location in the bytecode, causing the execution to continue
with a wrong instruction after the Constraint/Rewrite.

Signed-off-by: Rickert, Jonas <[email protected]>

[Bazel] Port a9ee8e4a454ec01fefba8829d2847527aa80623f

[clang][NFC] Clean up ASTContext.cpp (#140847)

Use BuiltinType::{isInteger,isSignedInteger,isUnsignedInteger} instead
of doing the comparisons here.

[mlir][SPIRV] Do not rewrite CompositeInsert for coopmatrix (#137837)

When rewriting multiple CompositeInserts to CompositeConstruct, we need
to know the number of elements of the result type. However, we cannot
query the number of elements for cooperative matrix types.

[clang-tools-extra] Remove redundant control flow statements (NFC) (#140846)

[Bazel] Follow fixes for 9a553d3766aacb69e884823da92dedff264e3f0f

[Bazel] Also adapt test/BUILD for 9a553d3766aacb69e884823da92dedff264e3f0f

[llvm] Use *Map::try_emplace (NFC) (#140843)

try_emplace can default-construct values, so we do not need to do so
on our own.  Plus, try_emplace(Key) is much shorter than
insert(std::make_pair(Key, Value()).

[llvm] Fix typos in documentation (#140844)

[Clang] Fix a regression introduced by #140576 (#140859)

Lambda bodies should not be treated as subexpressions of the enclosing
scope.

[VectorCombine] Scalarize binop-like intrinsics (#138095)

Currently VectorCombine can scalarize vector compares and binary ops.
This extends it to also scalarize binary-op like intrinsics like umax,
minnum etc.

The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see #137786

This currently has very little effect on generated code because
InstCombine doesn't yet canonicalize binary intrinsics where one operand
is a constant into the form that VectorCombine expects, i.e. `binop
(shuffle insert) const --> shuffle (binop insert const)`. The plan is to
land this first and then in a subsequent patch teach InstCombine to do
the canonicalization to avoid regressions in the meantime.

This uses `isTriviallyVectorizable` to determine whether or not an
intrinsic is safe to scalarize. There's also `isTriviallyScalarizable`,
but this seems more geared towards the Scalarizer pass and includes
intrinsics with multiple return values.

It also only handles intrinsics with two operands with the same type as
the return type. In the future we would generalize this to handle
arbitrary numbers of operands, including unary operators too, e.g. fneg
or fma, as well as different operand types, e.g. powi or scmp

[X86] combineINSERT_SUBVECTOR - generalise insert_subvector(x,extract(broadcast)) -> blend (#140516)

Don't match against specific broadcast nodes and let isShuffleEquivalent handle it

[clang-tidy][NFC] Refactor `modernize-pass-by-value` check code and tests (#140753)

- Deleted unused includes
- Deleted useless braces
- Modernized tests to use `CHECK-MESSAGES-NOT` and `CHECK-FIXES-NOT` for
better readability and maintainability

Add llvm-project archive issues for Chromium bug tracker (#132030)

The Chromium bug tracker is in an archived state. The Security Response
Group has preemptively created llvm-project GitHub issues with PDF
copies of the Chromium issues should the repository become inaccessible.

* Add URLs for redirects from
https://bugs.chromium.org/p/llvm/issues/detail?id=X to
https://issuetracker.google.com/issues/y
* Add URLs to llvm-project archive issues.
* Add an explanation of archive use.

[mlir] Silence an unused variable warnings in builds without asserts.

[libclc] Re-use shuffle_decl.inc in OpenCL shuffle2 declaration (#140679)

Also internalize __clc_get_el_* symbols in clc_shuffle2. llvm-diff shows
no change to amdgcn--amdhsa.bc.

[NVPTX] Support the OpenCL generic addrspace feature by default (#137940)

As best as I can see, all NVPTX architectures support the generic
address space.

I note there's a FIXME in the target's address space map about 'generic'
still having to be added to the target but we haven't observed any
issues with it downstream. The generic address space is mapped to the
same target address space as default/private (0), but this isn't
necessarily a problem for users.

[MLIR][Doc] Add documentation for OpAsmAttr/TypeInterface (#140244)

After the introduction of OpAsmAttr/TypeInterface in #121187 #124721,
the documentation for them could be updated along side the doc for
OpAsmDialectInterface.

[mlir][tosa] Allow creation of reshape with unranked output (#140617)

This commit allows reshape to be created with an unranked output,
allowing it to be inferred by the shape inference pass.

[AArch64] Split AArch64ISD::COND_SMSTART/STOP off AArch64::SMSTART/STOP (NFC) (#140711)

The conditional variants of SMSTART/STOP currently take the current
PStateSM as a variadic value. This is not supported by the verification
added in #140472 (which requires variadic values to be of type Register
or RegisterMask), so this patch splits the the conditional variants into
new `COND_` nodes, where these extra parameters are fixed arguments.

Suggested in
https://github.com/llvm/llvm-project/pull/140472#discussion_r2094635066

Part of #140472.

[libclc][NFC] Reuse inc file for OpenCL frexp decl

[flang][OpenMP] fix diagnostic for bad cancel type (#140798)

Fixes #133685

[AArch64] Remove unused ISD nodes (NFC) (#140706)

Part of #140472.

[libclc] Move all remquo address spaces to CLC library (#140871)

Previously the OpenCL address space overloads of remquo would call into
the one and only 'private' CLC remquo. This was an outlier compared with
the other pointer-argumented maths builtins.

This commit moves the definitions of all address space overloads to the
CLC library to give more control over each address space to CLC
implementers.

There are some minor changes to the generated bytecode but it's simply
moving IR instructions around.

[C] Don't diagnose null pointer macros in -Wimplicit-void-ptr-cast (#140724)

This silences the diagnostic when the right-hand side is a null pointer
constant that comes from a macro expansion, such as NULL. However, we do
not limit to just NULL because other custom macros may expand to an
implicit void * cast in C while expanding to something else in C++.

[mlir][memref][nfc] push early-exit to earlier (#140730)

Move early exit check to as early as possible,

[email protected]

[NFC] Ubsan a few corner cases for `=sanitize` (#140855)

[LAA] Tweak debug output for UTC stability (#140764)

UpdateTestChecks has a make_analyzer_generalizer to replace pointer
addressess from the debug output of LAA with a pattern, which is an
acceptable solution when there is one RUN line. However, when there are
multiple RUN lines with a common pattern, UTC fails to recognize common
output due to mismatched pointer addresses. Instead of hacking UTC scrub
the output before comparing the outputs from the different RUN lines,
fix the issue once and for all by making LAA not output unstable pointer
addresses in the first place.

The removal of the now-dead make_analyzer_generalizer is left as a
non-trivial exercise for a follow-up.

[analyzer] Add previous CFG block to BlockEntrance ProgramPoints (#140861)

This helps to gain contextual information about how we entered a CFG block.

The `noexprcrash.c` test probably changed due to the fact that now
BlockEntrance ProgramPoint Profile also hashes the pointer of the
previous CFG block. I didn't investigate.

CPP-6483

[X86] lowerV8F32Shuffle - use lowerShufflePairAsUNPCKAndPermute on AVX1 targets (#140881)

If we're not going to split the v8f32 shuffle anyway, attempt to match with lowerShufflePairAsUNPCKAndPermute

[SPIRV] Addition of matrix multiply accumulate operands  (#138665)

--Added Matrix multiply accumulate operands for the extension
SPV_INTEL_subgroup_matrix_multiply_accumulate

InferAddressSpaces: Stop trying to insert pointer bitcasts (#140873)

[X86] combineINSERT_SUBVECTOR - simplify aligned index assertion to avoid signed/unsigned warning. NFC.

[utils][TableGen] Clean up code in DirectiveEmitter (#140772)

Remove most redundant function calls. Unify enum identifier name
generation (via getIdentifierName), and namespace qualification (via
getQualifier).

[OpenACC] rename private/firstprivate recipe attributes (#140719)

Make private and firstprivate recipe attribute names consistent with
reductionRecipes attribute

[mlir][XeGPU] Add XeGPU Workgroup to Subgroup Distribution Pass  (#140805)

This PR adds the XeGPU workgroup (wg) to subgroup (sg) pass. The wg to
sg pass transforms the xegpu wg level operations to subgroup operations
based on the sg_layout and sg_data attribute. The PR adds transformation
patterns for following Ops

1. CreateNdDesc
2. LoadNd
3. StoreNd
4. PrefetchNd
5. UpdateNdOffset
6. Dpas

[LLVM][TableGen] Use StringRef for various members `CGIOperandList::OperandInfo` (#140625)

- Change `Name`, `SubopNames`, `PrinterMethodName`, and
`EncoderMethodNames` to be stored as StringRef.
- Also changed `CheckComplexPatMatcher::Name` to StringRef as a fallout
from the above.

Verified that all the tablegen generated files within LLVM are
unchanged.

[LLVM][IR] Replace `unsigned >= ConstantDataFirstVal` with static_assert (#140827)

`ConstantDataFirstVal` is 0, so `getValueID() >= ConstantDataFirstVal`
leads to a compiler warning that the expression is always true. Replace
such comparisons with a static_assert() to verify that
`ConstantDataFirstVal` is 0, similar to the existing code in Value.h

[NFC][Support] Apply clang-format to regcomp.c (#140769)

Apply clang-format to regcomp.c since it's not conformant and leads to
clang-format failures when doing individual changes to this file (for
example in https://github.com/llvm/llvm-project/pull/140758). File
generated by running `clang-format -i regcomp.c`

[flang] add -floop-interchange and enable it with opt levels (#140182)

Enable the use of -floop-interchange from the flang driver.
Enable in flang LLVM's loop interchange at levels -O2, -O3, -Ofast, and -Os.

[AMDGPU] PromoteAlloca: handle out-of-bounds GEP for shufflevector (#139700)

This LLVM defect was identified via the AMD Fuzzing project.

---------

Co-authored-by: Matt Arsenault <[email protected]>

[flang] fix ICE with ignore_tkr(tk) character in explicit interface (#140885)

Some MPI libraries use character dummies + ignore(TKR) to allow passing
any kind of buffer.

This was meant to already be handled by #108168
However, when the library interface also had an argument requiring an
explicit interface, `builder.convertWithSemantics` was not allowed to properly deal
with the actual/dummy type mismatch and generated bad IR causing errors like:
`'fir.convert' op invalid type conversion'!fir.ref' / '!fir.boxchar\<1\>'`.

This restriction was artificial, lowering should just handle any cases
allowed by semantics. Just remove it.

[Clang] Set the final date for workaround for libstdc++'s `format_kind` (#140831)

We can use 20250520 as the final date, see the following commits.
- GCC releases/gcc-15 branch:
  - https://gcc.gnu.org/g:fedf81ef7b98e5c9ac899b8641bb670746c51205
  - https://gcc.gnu.org/g:53680c1aa92d9f78e8255fbf696c0ed36f160650
- GCC master branch:
  - https://gcc.gnu.org/g:9361966d80f625c5accc25cbb439f0278dd8b278
  - https://gcc.gnu.org/g:c65725eccbabf3b9b5965f27fff2d3b9f6c75930

Follows-up #139560.

[llvm-debuginfo-analyzer] Support DW_TAG_module (#137228)

- Adds support for `DW_TAG_module` DIEs and recurse over their children.
Prior to this patch, entities hanging below `DW_TAG_module` were just
not visible. This DIE kind is commonly generated by Objective-C modules.

This patch will represent such entities, which will print as
```
[001]    {CompileUnit} '/llvm/tools/clang/test/modules/<stdin>'
[002]      {Producer} 'LLVM version 3.7.0'
           {Directory} '/llvm/tools/clang/test/modules'
           {File} '<stdin>'
[002]      {Module} 'DebugModule'
```
The minimal test case included is just the result of
```
$ llc llvm/test/DebugInfo/X86/DIModule.ll
      -accel-tables=Dwarf
      -o llvm/unittests/DebugInfo/LogicalView/Inputs/test-dwarf-clang-module.o
      -filetype=obj
```

[clang][Sema] Declare builtins used in #pragma intrinsic (#138205)

When trying to remove the usage of `__has_builtin` on MSVC CUDA ARM for
some builtins, the recommended direction was to universally declare the
MSVC builtins on all platforms and require the header providing
declarations to be included. This was done
[here](https://github.com/llvm/llvm-project/pull/128222).

However, some MSVC headers already use the MSVC builtins without
including the header, so we introduce a warning for anyone compiling
with MSVC for this target, so the above change had to be reverted.

The MSVC headers use `#pragma intrinsic` before the intrinsic uses and
that seems to be enough for MSVC, so declare builtins when used in
`#pragma intrinsic` in Clang to prevent the warning.

---------

Signed-off-by: Sarnie, Nick <[email protected]>

[clang-include-cleaner] Make cleanup attr report expr location (#140233)

Instead of reporting the location of the attribute, let's report the
location of the function reference that's passed to the cleanup
attribute as the first argument. This is required as the attribute might
be coming from a macro which means clang-include-cleaner skips the use
as it gets attributed to the header file declaringt the macro and not to
the main file.

To make this work, we have to add a fake argument to the CleanupAttr
constructor so we can pass in the original Expr alongside the function
declaration.

Fixes #140212

[clang-tidy] Add UnusedIncludes/MissingIncludes options to misc-include-cleaner (#140600)

These mimick the same options from clangd and allow using the check to
only check for unused includes or missing includes.

[clang-tools-extra] Add include mappings for getopt.h (#140726)

[VPlan] Move predication to VPlanTransform (NFC). (#128420)

This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: https://github.com/llvm/llvm-project/pull/128420

AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900)

This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.

Just re-use the existing lowering for the old semantics for
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's
treatment,
nor try to handle the general expansion without an underlying min/max
variant (or with G_FMINIMUM/G_FMAXIMUM).

[Vectorize] Fix a warning

This patch fixes:

  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8564:20: error:
  unused variable 'LoopRegionOf' [-Werror,-Wunused-variable]

[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)

Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.

- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)

The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:

* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)

[gn build] Port b263c08e1a0b

[RISCV] Add MC layer support for XSfmm*. (#133031)

This adds assembler/disassembler support for XSfmmbase 0.6 and related
SiFive matrix multiplication extensions based on the spec here
https://www.sifive.com/document-file/xsfmm-matrix-extensions-specification

Functionality-wise, this is the same as the Zvma extension proposal that
SiFive shared with the Attached Matrix Extension Task Group. The
extension names and instruction mnemonics have been changed to use
vendor prefixes.

Note this is a non-conforming extension as the opcodes used here are in
the standard opcode space in OP-V or OP-VE.

---------

Co-authored-by: Brandon Wu <[email protected]>

[InstCombine] Enable more fabs fold when the user ignores sign bit of zero/NaN (#139861)

When the only user of select is a fcmp or a fp operation with nnan/nsz,
the sign bit of zero/NaN can be ignored.
Alive2: https://alive2.llvm.org/ce/z/ZcxeIv

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=7add1bcd02b1f72d580bb2e64a1fe4a8bdc085d9&to=cb419c7cbddce778673f3d4b414ed9b8064b8d6e&stat=instructions:u

Closes https://github.com/llvm/llvm-project/issues/133367.

[SCCPSolver] Make getMRVFunctionsTracked return a reference (NFC) (#140851)

This patch makes getMRVFunctionsTracked return a reference.
runIPSCCP, the sole user of getMRVFunctionsTracked, just needs a
read-access to the map.

The missing "&" is most likely an oversight as two "sibling" functions
getTrackedRetVals and getTrackedGlobals return maps by const
reference.

[libc++] Optimize std::for_each_n for segmented iterators (#135468)

This patch enhances the performance of `std::for_each_n` when used with
segmented iterators, leading to significant performance improvements,
summarized in the tables below. This addresses a subtask of
https://github.com/llvm/llvm-project/issues/102817.

[CIR] Add support for recursive record layouts (#140811)

While processing members of a record, we try to create new record types
as we encounter them, but if this would result in recursion (either
because the type points to itself or because it points to a type that
points back to the original type) we need to add it to a list for
deferred processing. Previously, we issued an error saying this wasn't
handled. This change adds the necessary handling.

[libc++] Optimize bitset::to_string (#128832)

This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit processing with a more efficient
bit traversal strategy. Instead of checking each bit sequentially, we leverage `std::__countr_zero` to efficiently
locate the next set bit, skipping over consecutive zero bits. This greatly accelerates the conversion process,
especially for sparse `bitset`s where zero bits dominate. To ensure similar improvements for dense `bitset`s, we
exploit symmetry by inverting the bit pattern, allowing us to apply the same optimized traversal technique. Even
for uniformly distributed `bitset`s, the proposed approach offers measurable performance gains over the existing
implementation.

Benchmarks demonstrate substantial improvements, achieving up to 13.5x speedup for sparse `bitset`s with
`Pr(true bit) = 0.1`, 16.1x for dense `bitset`s with `Pr(true bit) = 0.9`, and 8.3x for uniformly distributed
`bitset`s with `Pr(true bit) = 0.5)`.

[ELF] Error if a section address is smaller than image base

When using `-no-pie` without a `SECTIONS` command, the linker uses the
target's default image base. If `-Ttext=` or `--section-start` specifies
an output section address below this base, the result is likely
unintended.

- With `--no-rosegment`, the PT_LOAD segment covering the ELF header cannot include `.text` if `.text`'s address is too low, causing an `error: output file too large`.
- With default `--rosegment`:
  - If a read-only section (e.g., `.rodata`) exists, a similar `error: output file too large` occurs.
  - Without read-only sections, the PT_LOAD segment covering the ELF header and program headers includes no sections, which is unusual and likely undesired. This also causes non-ascending PT_LOAD `p_vaddr` values related to the PT_LOAD that overlaps with PT_PHDR (#138584).

To prevent these issues, report an error if a section address is below
the image base and suggest `--image-base`. This check also applies when
`--image-base` is explicitly set but is skipped when a `SECTIONS`
command is used.

Pull Request: https://github.com/llvm/llvm-project/pull/140187

Add live in for PrivateSegmentSize in GISel path (#139968)

[clang][TableGen] Fix Duplicate Entries in TableGen (#140828)

Fixed TableGen duplicate issues that causes the wrong interrupt
attribute from being selected.

resolves #140701

[gn build] Port 09c266b75db4

[KeyInstr][Clang] Add ApplyAtomGroup (#134632)

This is a scoped helper similar to ApplyDebugLocation that creates a new source
location atom group which instructions can be added to.

A source atom is a source construct that is "interesting" for debug stepping
purposes. We use an atom group number to track the instruction(s) that implement
the functionality for the atom, plus backup instructions/source locations.

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.

[CIR][NFC] Fix an unused variable warning (#140783)

This fixes a warning where a variable assigned in 'if' statement wasn't
referenced again, and where else is used when 'if' has returns statement
in the if-else statement

[CIR][LLVMLowering] Upstream Bitcast lowering (#140774)

This change adds support for lowering BitCastOp

Reduce llvm-gsymutil memory usage (#140740)

Same as https://github.com/llvm/llvm-project/pull/139907/ except there
is now a special dovoidwork helper function.
Previous approach with assert(f();return success;) failed tests for
release builds, so I created a separate helper. Open to suggestions how
to solve this more elegantly.

Co-authored-by: Arslan Khabutdinov <[email protected]>

[libclc] Support the generic address space (#137183)

This commit provides definitions of builtins with the generic address
space.

One concept to consider is the difference between supporting the generic
address space from the user's perspective and the requirement for libclc
as a compiler implementation detail to define separate generic address
space builtins. In practice a target (like NVPTX) might notionally
support the generic address space, but it's mapped to the same LLVM
target address space as another address space (often the private one).

In such cases libclc must be careful not to define both private and
generic overloads of the same builtin. We track these two concepts
separately, and make the assumption that if the generic address space
does clash with another, it's with the private one. We track the
concepts separately because there are some builtins such as atomics that
are defined for the generic address space but not the private address
space.

Fix-forward excess ';' from 9459c8309c6768cf6aa7956885b2540e16582a93 (#134632)

clang/lib/CodeGen/CGDebugInfo.cpp:153:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
  153 | };
      |  ^
1 error generated.

[lldb][lldb-dap][tests] Make sure evaluate test exists with no errors. (#140788)

[AMDGPU] Fix scale opsel flags for scaled MFMA operations (#140183)

Fix for src scale opsel flags encoding and ASM parsing for gfx950 scaled MFMA.

[OpenACC] Stop trying to analyze invalid Var-Decls.

The code to analyze VarDecls for the purpose of ensuring a magic-static
isn't present in a 'routine' was getting confused/crashed because we
create something that looks like a magic-static during error-recovery,
but it is still an invalid decl.

This patch causes us to just 'give up' in the case where the vardecl is
already invalid.

Fixes: #140920

 [RISCV] Support scalable vectors for the zvqdotq lowering paths (#140922)

This was an oversight in the original patch series. Without this change,
the newly added tests fail assertions.

Add macro to suppress -Wunnecessary-virtual-specifier (#139614)

Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.

[flang] [cuda] implicitly set DEVICE attribute to scalars in device routines (#140834)

Scalars inside device routines also need to implicitly set the DEVICE
attribute, except for function results.

[RISCV] Expand zvqdotq partial.reduce test variants

Make sure to cover all the scalable types which are legal, plus
splitting.  Make sure to cover all instructions.  Not duplicating
vx testing at this time.

Revert "[VPlan] Move predication to VPlanTransform (NFC). (#128420)"

This reverts commit b263c08e1a0b54a871915930aa9a1a6ba205b099.

Looks like this triggers a crash in one of the Fortran tests. Reverting
while I investigate
    https://lab.llvm.org/buildbot/#/builders/41/builds/6825

[RISCV] Remove nsw/nuw from zvqdotq tests [nfc]

As noted in review comment https://github.com/llvm/llvm-project/pull/140922#discussion_r2100838209, this aren't required

Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614)"

This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936.

It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).

[CIR] Upstream support for string literals (#140796)

This adds the minimal support needed to handle string literals.

[NVPTX] Remove Float register classes (#140487)

These classes are redundant, as the untyped "Int" classes can be used
for all float operations. This change is intended to be as minimal as
possible and leaves the many potential simplifications and refactors
this exposes as future work.

[GlobalISel] Fix ZExt known bits for scalable vectors. (#140213)

It was using the full size of the vector as the SrcBitWidth. This patch
changes the code to split G_ASSERT_ZEXT away from the others (G_INTTOPTR
/ G_PTRTOINT / G_ZEXT / G_TRUNC) which are simpler, and make the code
match the SDAG equivalent.

[lldb] Add templated CompilerType::GetTypeSystem (NFC) (#140424)

Add an overloaded `GetTypeSystem` to specify the expected type system subclass. Changes code from  `GetTypeSystem().dyn_cast_or_null<TypeSystemClang>()` to `GetTypeSystem<TypeSystemClang>()`.

[X86] combineINSERT_SUBVECTOR - use concatSubVectors instead of direct fold to X86ISD::SUBV_BROADCAST_LOAD (#140919)

Use common helper and try to reduce the number of places we're
generating load node directly.

[TargetLowering] Use getExtractSubvector/getExtractVectorElt. NFC

[lldb-dap] assembly breakpoints (#139969)

* Support assembly source breakpoints
* Change `sourceReference` to be the symbol load address for simplicity
and consistency across threads/frames

[Screencast From 2025-05-17
23-57-30.webm](https://github.com/user-attachments/assets/2e7c181d-42c1-4121-8f13-b180c19d0e33)

[gn build] Port 793bb6b257fa

[mlir] Translate nested debug information (#140915)

This backports changes from Triton with the exception that for fused
locations, use the first one with file info rather than just first.

---------

Co-authored-by: Sergei Lebedev <[email protected]>
Co-authored-by: Keren Zhou <[email protected]>

[HLSL] Update Sema Checking Diagnostics for builtins (#138429)

Update how Sema Checking is done for HLSL builtins to allow for better
error messages, mainly using 'err_builtin_invalid_arg_type'.
Try to follow the formula outlined in issue #134721
Closes #134721

[flang][cuda] Use NVVM op for barrier0 intrinsic (#140947)

The simple form of `Barrier0Op` is available in the NVVM dialect. It is
needed to use it instead of the string version since
https://github.com/llvm/llvm-project/pull/140615

[NFC][ADT/Support] Add {} for else when if body has {} (#140758)

[CIR] Improve NYI message for emitCompoundStmtWithoutScope (#140945)

This improves the error emitting for unhandled compound statements
without scope by reporting the statement class that wasn't handled.

[RISCV] Add tests for widening fixed vector masked loads/stores. NFC (#140949)

[mlir][ROCDL] Add fp4 and fp6 conversion intrinsics, fix fp8 immargs (#140801)

This PR adds support for the scaled conversion intrinsics for fp4 and
fp6 types so that they can be targetted by a future amdgpu dialect op or
used directly.

Additionally, this patch refactors the copy-paste-heavy fp8 versions of
these scaled conversion intrinsics with tablegen `foreach` loops, and
fixes the fact that certain immargs weren't being stored as attributes.

Note that some of the MLIR-level tests for those scaled fp8 intrinsics
had incorrect return types, which have been fixed.

(Note that while the operations have a known return type, the IR format
still prints that type for clarity).

[mlir][Vector][NFC] Run `extractInsertFoldConstantOp` earlier in the folder (#140814)

This PR moves `extractInsertFoldConstantOp` earlier in the folder lists
of `vector.extract` and `vector.insert`. Many folders require having
non-dynamic indices so `extractInsertFoldConstantOp` is a requirement
for them to trigger.

[SCCPSolver] Mark several functions const (NFC) (#140926)

[VPlan] Don't try to narrow predicated VPReplicateRecipe.

We cannot convert predicated recipes to uniform ones at the moment.
This fixes a crash reported for https://github.com/llvm/llvm-project/pull/139150.

[LoopPeel] Add test for peeling last iteration with non-trivial BTC.

Additional test to https://github.com/llvm/llvm-project/pull/140792 with
different SCEV expansion costs.

[HLSL][RootSignature] Add parsing for empty RootDescriptors (#140147)

- define the RootDescriptor in-memory struct containing its type
- add test harness for testing

First part of https://github.com/llvm/llvm-project/issues/126577

[llvm] add GenericFloatingPointPredicateUtils (#140254)

add `GenericFloatingPointPredicateUtils` in order to generalize
effects of floating point comparisons on `KnownFPClass` for both IR and
MIR.

---------

Co-authored-by: Matt Arsenault <[email protected]>

[AMDGPU][True16][CodeGen] select vgpr16 for asm inline 16bit vreg (#140946)

select vgpr16 for asm inline 16bit vreg in true16 mode

[gn build] Port d00d74bb2564

[RISCV][TTI] Add test coverage for getPartialReductionCost [nfc]

Adding testing in advance of a change to cost the zvqdotq instructions
such that we emit them from LV.

[LLVM] Use `reportFatalUsageError` for LTO usage errors (#140955)

Usage errors in `LTOBackend.cpp` were previously, misleadingly, reported
as internal crashes.

This PR updates `LTOBackend.cpp` to use `reportFatalUsageError` for
reporting usage-related issues.

LLVM Issue: https://github.com/llvm/llvm-project/issues/140953
Internal Tracker: TOOLCHAIN-17744

[SelectionDAG][RISCV] Use VP_LOAD to widen MLOAD in type legalization when possible. (#140595)

Padding the mask using 0 elements doesn't work for scalable vectors. Use
VP_LOAD and change the VL instead.

This fixes crash for Zve32x. Test file was split since i64 isn't a valid
element type for Zve32x.

Fixes #140198.

Revert "[llvm] add GenericFloatingPointPredicateUtils (#140254)" (#140968)

This reverts commit d00d74bb2564103ae3cb5ac6b6ffecf7e1cc2238.

The PR breaks our buildbots and blocks downstream merge.

[gn build] Port c47a5fbb229b

[mlir][Vector] Move `vector.mask` canonicalization to folder (#140324)

This MR moves the canonicalization that elides empty `vector.mask` ops
to folders.

[OpenMP][Flang] Fix OOB access for derived type mapping (#140948)

[lldb] Skip TestConsecutiveWatchpoints.py if out of tree debugserver

The GreenDragon CI bots are currently passing because the installed
Xcode is a bit old, and doesn't have the watchpoint handling
bug that was fixed April with this test being added.

But on other CI running newer Xcode debugservers, this test will
fail.  Skip this test if we're using an out of tree debugserver.

Revert #140650 and #140505 (#140973)

This reverts commit 90daed32a82ad2695d27db285ac36f579f2b270e and 4cfbe55781cb8fb95568c9a8538912f68d2ff681.
These changes exposed cyclic dependencies when LLVM is configured with
modules `-DLLVM_ENABLE_MODULES=ON`.

[RISCV] Correct operand names for vmv.s.x and vfmv.s.f pseudos. NFC (#140970)

[AMDGPU] Fix computation of waves/EU maximum (#140921)

This fixes an issue in the waves/EU range calculation wherein, if the
`amdgpu-waves-per-eu` attribute exists and is valid, the entire
attribute may be spuriously and completely ignored if workgroup sizes
and LDS usage restrict the maximum achievable occupancy below the
subtarget maximum. In such cases, we should still honor the requested
minimum number of waves/EU, even if the requested maximum is higher than
the actually achievable maximum (but still within subtarget
specification).

As such, the added unit test `empty_at_least_2_lds_limited`'s waves/EU
range should be [2,4] after this patch, when it is currently [1,4] (i.e,
as if `amdgpu-waves-per-eu` was not specified at all).

Before e377dc4 the default maximum waves/EU was always set to the
subtarget maximum, trivially avoiding the issue.

[SelectionDAG] Simplify creation of getStoreVP in WidenVecOp_STORE. NFC

We can use the offset from the original store instead of creating
a new undef offset.

We didn't check if the offset was undef already so we really shouldn't
drop it if it isn't.

[RISCV] Add Andes A25/AX25 processor definition (#140681)

Andes A25/AX25 are 32/64bit, 5-stage pipeline, linux-capable CPUs that
implement the RV[32|64]IMAFDC_Zba_Zbb_Zbc_Zbs ISA extensions. They are
developed by Andes Technology https://www.andestech.com, a RISC-V IP
provider.

The overviews for A25/AX25:
https://www.andestech.com/en/products-solutions/andescore-processors/riscv-a25/
https://www.andestech.com/en/products-solutions/andescore-processors/riscv-ax25/

Scheduling model will be implemented in a later PR.

Revert "[Clang] Fix missed initializer instantiation bug for variable templates" (#140930)

Reverts llvm/llvm-project#138122

The patch causes a regression and prevents compiling valid C++ code.
The code was accepted by earlier versions of clang and GCC.
See https://github.com/llvm/llvm-project/issues/140773 for details.

[test] Fix dissassemble-entry-point.s for #140187 (#140978)

similar to #140570

getting this error:

exit status 1
ld.lld: error: section '.text' address (0x8074) is smaller than image
base (0x10000); specify --image-base

[clang] Mark some language options as benign. (#131569)

I'm fairly certain that the options in this CL are benign, as I don't
believe they affect the AST.
* RTTI - shouldn't affect the AST, should only affect codegen
* Trivial var init - also should only affect codegen
* Stack protector - also codegen
* Exceptions - Since exceptions do allow new things in the AST, but I'm
pretty sure that they can differ in parent and child safely, I marked it
as compatible instead.

I welcome any input from someone more familiar with this than me, as I
might be wrong.

[clang-format][NFC] Minor efficiency cleanup (#140835)

[RISCV] Add Xqcibi Select_GPR_Using_CC_<Imm> Pseudos to isSelectPseudo (#140698)

Not adding them was leading to a crash when trying to expand these
pseudo instructions.

I've also fixed the register class types for the Xqcibi instructions in
these pseudo instructions which was incorrect and was exposed by the
machine verifier while running the test case added in this patch.

Fixes #140697

[ConstraintElim] Do not allow overflows in `Decomposition` (#140541)

Consider the following case:
```
define i1 @pr140481(i32 %x) {
  %cond = icmp slt i32 %x, 0
  call void @llvm.assume(i1 %cond)
  %add = add nsw i32 %x, 5001000
  %mul1 = mul nsw i32 %add, -5001000
  %mul2 = mul nsw i32 %mul1, 5001000
  %cmp2 = icmp sgt i32 %mul2, 0
  ret i1 %cmp2
}
```
Before this patch, `decompose(%mul2)` returns `-25010001000000 * %x +
4052193514966861312`.
Therefore, `%cmp2` will be simplified into true because `%x s< 0 &&
-25010001000000 * %x + 4052193514966861312 s<= 0` is unsat.

It is incorrect since the offset `-25010001000000 * 5001000 ->
4052193514966861312` signed wraps.
This patch treats a decomposition as invalid if overflows occur when
computing coefficients.

Closes https://github.com/llvm/llvm-project/issues/140481.

[clang] Use llvm::find_if (NFC) (#140983)

[BOLT] Use llvm::is_contained (NFC) (#140984)

[mlir] Use llvm::is_contained (NFC) (#140986)

[BOLT] Avoid creating a temporary instance of std::string (NFC) (#140987)

lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create a temporary instance of std::string on our own.

[IA] Add support for [de]interleave{3,5,7} (#139373)

This adds support for lowering deinterleave and interleave intrinsics
for factors 3 5 and 7 into target specific memory intrinsics.

Notably this doesn't add support for handling higher factors constructed
from interleaving interleave intrinsics, e.g. factor 6 from interleave3
+ interleave2.

I initially tried this but it became very complex very quickly. For
example, because there's now multiple factors involved
interleaveLeafValues is no longer symmetric between interleaving and
deinterleaving. There's then also two ways of representing a factor 6
deinterleave: It can both be done as either 1 deinterleave3 and 3
deinterleave2s OR 1 deinterleave2 and 3 deinterleave3s.

I'm not sure the complexity of supporting arbitrary factors is warranted
given how we only need to support a small number of factors currently:
SVE only needs factors 2,3,4 whilst RVV only needs 2,3,4,5,6,7,8.

My preference would be to just add a interleave6 and deinterleave6
intrinsic to avoid all this ambiguity, but I'll defer this discussion to
a later patch.

[clang] Avoid creating temporary instances of std::string (NFC) (#140988)

lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create temporary instances of std::string on our own.

[lldb] Remove unused local variables (NFC) (#140989)

[mlir] Remove unused local variables (NFC) (#140990)

Revert "[LLVM] Use `reportFatalUsageError` for LTO usage errors" (#141000)

The PR causes check-lld fail:
>TEST 'lld :: COFF/lto-cache-errors.ll'

Tested on local revert and pass the check.

Reverts llvm/llvm-project#140955

Fix regression tests with bad FileCheck checks (#140373)

Fixes https://github.com/llvm/llvm-project/issues/140149

[RISCV] Use print-enabled-extensions to check the extensions of Andes n45/nx45/a45/ax45 cpus. NFC. (#140979)

Similarly to what #137725 did for the SiFive P870.

[test] Improve linker-relaxable fixups tests

The behavior will change once the assembler improves (#140692)

[CMake] respect LLVMConfig.cmake's LLVM_DEFINITIONS in standalone builds (#138587)

In #138329, _GNU_SOURCE was added for Cygwin, but when building Clang
standalone against an installed LLVM this definition was not picked up,
resulting in undefined strnlen. Follow the documentation in
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project and add
the LLVM_DEFINITIONS in standalone projects' cmakes.

[LLVM][Cygwin] add workaround for blocking connect/accept in AF_UNIX sockets (#140353)

On Cygwin, UNIX sockets involve a handshake between connect and accept
to enable SO_PEERCRED/getpeereid handling. This necessitates accept
being called before connect can return, but at least the tests in
llvm/unittests/Support/raw_socket_stream_test do both on the same thread
(first connect and then accept), resulting in a deadlock. Add a call to
both places sockets are created that turns off the handshake (and
SO_PEERCRED/getpeereid support).

References:
* https://github.com/cygwin/cygwin/blob/cec8a6680ea1fe38f38001b06c34ae355a785209/winsup/cygwin/fhandler/socket_local.cc#L1462-L1471
* https://inbox.sourceware.org/cygwin/[email protected]/T/#u

[MC] Restore MCAsmBackend::shouldForceRelocation to false

Revert the Target.getSpecifier implementation
(38c3ad36be…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well mc Machine (object) code platform:windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants