[WebAssembly] Add assembly support for final EH proposal #107917

Merged 8 commits on Sep 11, 2024

@aheejin aheejin commented Sep 9, 2024

This adds the basic assembly generation support for the final EH proposal, which was newly adopted in Sep 2023 and advanced into Phase 4 in Jul 2024:
https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md

This adds support for generating the new try_table and throw_ref instructions in the .s assembly format. It does NOT yet include:

  • Block annotation comment generation for .s format
  • .o object file generation
  • .s assembly parsing
  • Type checking (AsmTypeCheck)
  • Disassembler
  • Fixing unwind mismatches in CFGStackify

These will be added as follow-up PRs.


The format for TRY_TABLE, both for MachineInstr and MCInst, is as follows:

TRY_TABLE type number_of_catches catch_clauses*

where catch_clause is

catch_opcode tag+ destination

catch_opcode must be one of 0/1/2/3, denoting CATCH/CATCH_REF/CATCH_ALL/CATCH_ALL_REF respectively (see BinaryFormat/Wasm.h). tag is present only when the catch is CATCH or CATCH_REF; the catch-all forms take no tag.
The MIR format is printed as just the list of raw operands. The (stack-based) assembly instruction supports pretty-printing, including printing catch clauses by name, in InstPrinter.
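
To make the operand layout concrete, here is a minimal standalone sketch of how a flat operand list in the format above could be decoded and pretty-printed. It does not use LLVM types: `Operand` and `formatCatchList` are hypothetical stand-ins for `MCOperand` and the InstPrinter routine, and only the sub-opcode values 0/1/2/3 come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sub-opcode values as given in the description (0/1/2/3).
enum CatchOpcode : unsigned { Catch = 0, CatchRef = 1, CatchAll = 2, CatchAllRef = 3 };

// Hypothetical stand-in for an operand: an immediate or a tag symbol name.
struct Operand {
  bool IsTag;
  long Imm;
  std::string Tag;
  static Operand imm(long V) { return {false, V, ""}; }
  static Operand tag(std::string N) { return {true, 0, std::move(N)}; }
};

// Formats the catch list. Idx must point at the number_of_catches operand,
// i.e. the operand right after the block type.
std::string formatCatchList(const std::vector<Operand> &Ops, std::size_t Idx) {
  std::string Out;
  unsigned NumCatches = static_cast<unsigned>(Ops.at(Idx++).Imm);
  for (unsigned I = 0; I < NumCatches; ++I) {
    long Opc = Ops.at(Idx++).Imm;
    Out += "(";
    switch (Opc) {
    case Catch: // a tag operand follows the opcode
      assert(Ops.at(Idx).IsTag);
      Out += "catch " + Ops.at(Idx++).Tag + " ";
      break;
    case CatchRef: // a tag operand follows the opcode
      assert(Ops.at(Idx).IsTag);
      Out += "catch_ref " + Ops.at(Idx++).Tag + " ";
      break;
    case CatchAll: // no tag operand
      Out += "catch_all ";
      break;
    case CatchAllRef: // no tag operand
      Out += "catch_all_ref ";
      break;
    default:
      Out += "<unknown> ";
      break;
    }
    Out += std::to_string(Ops.at(Idx++).Imm); // destination
    Out += ")";
    if (I + 1 < NumCatches)
      Out += " ";
  }
  return Out;
}
```

For example, an operand list of the form `type, 2, 0, __cpp_exception, 0, 3, 1` would be rendered by this sketch as `(catch __cpp_exception 0) (catch_all_ref 1)`.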

In addition to the new instructions TRY_TABLE and THROW_REF, this adds four pseudo instructions: CATCH, CATCH_REF, CATCH_ALL, and CATCH_ALL_REF. They simulate the block return values of the catch, catch_ref, catch_all, and catch_all_ref clauses in try_table respectively, because we don't support block return values except in one case (fixEndsAtEndOfFunction in CFGStackify). They are omitted when the instructions are lowered to MCInst at the end.

LateEHPrepare now has one more stage that converts CATCH/CATCH_ALL to CATCH_REF/CATCH_ALL_REF when there is a RETHROW that rethrows the caught exception. The pass also converts RETHROWs into THROW_REFs. Note that RETHROW is still used as an interim pseudo instruction until LateEHPrepare converts it to THROW_REF.
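
The conversion rule can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: `Op`, `EHPad`, and `convertRethrows` are hypothetical names, and a whole EH pad is modeled as one flat instruction list, whereas the real pass has to match each RETHROW with its corresponding EH pad across basic blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical opcode model; Other stands for any unrelated instruction.
enum class Op { Catch, CatchRef, CatchAll, CatchAllRef, Rethrow, ThrowRef, Other };

// Hypothetical model of an EH pad: its catch pseudo plus the handler body.
struct EHPad {
  std::vector<Op> Insts;
};

// If the pad rethrows its exception, switch the catch to its *_REF form
// (so an exnref is available) and turn each RETHROW into a THROW_REF.
// Returns true if anything changed.
bool convertRethrows(EHPad &Pad) {
  bool HasRethrow = std::find(Pad.Insts.begin(), Pad.Insts.end(),
                              Op::Rethrow) != Pad.Insts.end();
  if (!HasRethrow)
    return false; // plain CATCH/CATCH_ALL is enough
  for (Op &O : Pad.Insts) {
    switch (O) {
    case Op::Catch:
      O = Op::CatchRef;
      break;
    case Op::CatchAll:
      O = Op::CatchAllRef;
      break;
    case Op::Rethrow:
      O = Op::ThrowRef;
      break;
    default:
      break;
    }
  }
  return true;
}
```

A pad without any RETHROW is left untouched, which matches the statement that the *_REF forms are only needed when the exception is rethrown.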

CFGStackify has a new placeTryTableMarker function, which places try_table/end_try_table markers with the necessary catch clause, as well as block/end_block markers for the destination of that catch clause.
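
The nesting that results from this placement can be sketched as plain text generation. The helper below is hypothetical (`wrapInTryTable` is not an LLVM function) and omits block signatures; it only shows the shape described in the CFGStackify comments: an outer block whose end is the catch destination, with the try_table nested directly inside it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wraps Body in the marker layout:
//   block
//     try_table <CatchClause>
//       <Body...>
//     end_try_table
//   end_block        <- the catch clause branches here
std::vector<std::string> wrapInTryTable(const std::vector<std::string> &Body,
                                        const std::string &CatchClause) {
  std::vector<std::string> Lines;
  Lines.push_back("block");
  Lines.push_back("  try_table " + CatchClause);
  for (const std::string &L : Body)
    Lines.push_back("    " + L);
  Lines.push_back("  end_try_table");
  Lines.push_back("end_block");
  return Lines;
}
```

Since only one catch clause is generated per try_table at this point, a single enclosing block is sufficient as its destination.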

In MCInstLower, we now need to support one more case for the multivalue block signature: the (i32, exnref) return type of a catch_ref's destination block.
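
As a rough illustration of where the (i32, exnref) signature comes from, the following standalone sketch computes the result types of a catch destination block from the catch kind and the tag's parameter types. `CatchKind`, `ValType`, and `catchDestResults` are hypothetical names; the rule itself follows the EH proposal (a catch delivers the tag's values, and the *_ref forms additionally deliver an exnref), and the single-i32 tag payload is the assumption stated in the patch's comments.

```cpp
#include <cassert>
#include <vector>

enum class CatchKind { Catch, CatchRef, CatchAll, CatchAllRef };
enum class ValType { I32, Exnref };

// Result types of the destination block for a catch clause.
// TagParams is ignored for the catch-all forms, which carry no tag.
std::vector<ValType> catchDestResults(CatchKind K,
                                      const std::vector<ValType> &TagParams) {
  std::vector<ValType> Results;
  if (K == CatchKind::Catch || K == CatchKind::CatchRef)
    Results = TagParams; // values thrown with the tag
  if (K == CatchKind::CatchRef || K == CatchKind::CatchAllRef)
    Results.push_back(ValType::Exnref); // reference to the exception itself
  return Results;
}
```

With a tag that throws a single i32, only `CatchKind::CatchRef` yields two results, which is why it is the one case that needs a multivalue signature.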

InstPrinter has a new routine to print the catch_list type, which is used to print try_table instructions.

The source of the new test, exception.ll, is the same as that of exception-legacy.ll, with the FileCheck expectations changed. One difference is that the commands in this file pass -wasm-enable-exnref to test the new format and do not pass -wasm-disable-explicit-locals -wasm-keep-registers, because the new custom InstPrinter routine that prints catch_list only works for the stack-based instructions (_S), and -wasm-keep-registers cannot be used with them.

As in exception-legacy.ll, the FileCheck lines for the new tests do not contain the whole program; they mostly contain only the control flow instructions for readability.


llvmbot commented Sep 9, 2024

@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-llvm-binary-utilities

Author: Heejin Ahn (aheejin)


Patch is 56.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107917.diff

14 Files Affected:

  • (modified) llvm/include/llvm/BinaryFormat/Wasm.h (+8)
  • (modified) llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp (+9-1)
  • (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp (+41)
  • (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.h (+1)
  • (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h (+30)
  • (modified) llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTypeUtilities.h (+9-5)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp (+11)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (+297-25)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp (+4-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrControl.td (+34-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyLateEHPrepare.cpp (+73-3)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp (+17-2)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp (+2)
  • (added) llvm/test/CodeGen/WebAssembly/exception.ll (+470)
diff --git a/llvm/include/llvm/BinaryFormat/Wasm.h b/llvm/include/llvm/BinaryFormat/Wasm.h
index acf89885af6fdb..9b21d6d65c2a8e 100644
--- a/llvm/include/llvm/BinaryFormat/Wasm.h
+++ b/llvm/include/llvm/BinaryFormat/Wasm.h
@@ -144,6 +144,14 @@ enum : unsigned {
   WASM_OPCODE_I32_RMW_CMPXCHG = 0x48,
 };
 
+// Sub-opcodes for catch clauses in a try_table instruction
+enum : unsigned {
+  WASM_OPCODE_CATCH = 0x00,
+  WASM_OPCODE_CATCH_REF = 0x01,
+  WASM_OPCODE_CATCH_ALL = 0x02,
+  WASM_OPCODE_CATCH_ALL_REF = 0x03,
+};
+
 enum : unsigned {
   WASM_LIMITS_FLAG_NONE = 0x0,
   WASM_LIMITS_FLAG_HAS_MAX = 0x1,
diff --git a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
index 24a9ad67cfe042..5299e6ea06f0bd 100644
--- a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
+++ b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
@@ -45,7 +45,7 @@ namespace {
 /// WebAssemblyOperand - Instances of this class represent the operands in a
 /// parsed Wasm machine instruction.
 struct WebAssemblyOperand : public MCParsedAsmOperand {
-  enum KindTy { Token, Integer, Float, Symbol, BrList } Kind;
+  enum KindTy { Token, Integer, Float, Symbol, BrList, CatchList } Kind;
 
   SMLoc StartLoc, EndLoc;
 
@@ -99,6 +99,7 @@ struct WebAssemblyOperand : public MCParsedAsmOperand {
   bool isMem() const override { return false; }
   bool isReg() const override { return false; }
   bool isBrList() const { return Kind == BrList; }
+  bool isCatchList() const { return Kind == CatchList; }
 
   MCRegister getReg() const override {
     llvm_unreachable("Assembly inspects a register operand");
@@ -151,6 +152,10 @@ struct WebAssemblyOperand : public MCParsedAsmOperand {
       Inst.addOperand(MCOperand::createImm(Br));
   }
 
+  void addCatchListOperands(MCInst &Inst, unsigned N) const {
+    // TODO
+  }
+
   void print(raw_ostream &OS) const override {
     switch (Kind) {
     case Token:
@@ -168,6 +173,9 @@ struct WebAssemblyOperand : public MCParsedAsmOperand {
     case BrList:
       OS << "BrList:" << BrL.List.size();
       break;
+    case CatchList:
+      // TODO
+      break;
     }
   }
 };
diff --git a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp
index b85ed1d93593bd..903dbcd21ea967 100644
--- a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp
+++ b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.cpp
@@ -367,3 +367,44 @@ void WebAssemblyInstPrinter::printWebAssemblySignatureOperand(const MCInst *MI,
     }
   }
 }
+
+void WebAssemblyInstPrinter::printCatchList(const MCInst *MI, unsigned OpNo,
+                                            raw_ostream &O) {
+  unsigned OpIdx = OpNo;
+  const MCOperand &Op = MI->getOperand(OpIdx++);
+  unsigned NumCatches = Op.getImm();
+
+  auto PrintTagOp = [&](const MCOperand &Op) {
+    const MCSymbolRefExpr *TagExpr = nullptr;
+    const MCSymbolWasm *TagSym = nullptr;
+    assert(Op.isExpr());
+    TagExpr = dyn_cast<MCSymbolRefExpr>(Op.getExpr());
+    TagSym = cast<MCSymbolWasm>(&TagExpr->getSymbol());
+    O << TagSym->getName() << " ";
+  };
+
+  for (unsigned I = 0; I < NumCatches; I++) {
+    const MCOperand &Op = MI->getOperand(OpIdx++);
+    O << "(";
+    switch (Op.getImm()) {
+    case wasm::WASM_OPCODE_CATCH:
+      O << "catch ";
+      PrintTagOp(MI->getOperand(OpIdx++));
+      break;
+    case wasm::WASM_OPCODE_CATCH_REF:
+      O << "catch_ref ";
+      PrintTagOp(MI->getOperand(OpIdx++));
+      break;
+    case wasm::WASM_OPCODE_CATCH_ALL:
+      O << "catch_all ";
+      break;
+    case wasm::WASM_OPCODE_CATCH_ALL_REF:
+      O << "catch_all_ref ";
+      break;
+    }
+    O << MI->getOperand(OpIdx++).getImm(); // destination
+    O << ")";
+    if (I < NumCatches - 1)
+      O << " ";
+  }
+}
diff --git a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.h b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.h
index 8fd54d16409059..b499926ab82965 100644
--- a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.h
+++ b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyInstPrinter.h
@@ -47,6 +47,7 @@ class WebAssemblyInstPrinter final : public MCInstPrinter {
                                       raw_ostream &O);
   void printWebAssemblySignatureOperand(const MCInst *MI, unsigned OpNo,
                                         raw_ostream &O);
+  void printCatchList(const MCInst *MI, unsigned OpNo, raw_ostream &O);
 
   // Autogenerated by tblgen.
   std::pair<const char *, uint64_t> getMnemonic(const MCInst *MI) override;
diff --git a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
index 00f15e1db5e13a..e3a60fa4812d8f 100644
--- a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
+++ b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTargetDesc.h
@@ -87,6 +87,8 @@ enum OperandType {
   OPERAND_BRLIST,
   /// 32-bit unsigned table number.
   OPERAND_TABLE,
+  /// A list of catch clauses for try_table.
+  OPERAND_CATCH_LIST,
 };
 } // end namespace WebAssembly
 
@@ -119,6 +121,10 @@ enum TOF {
   // address relative the __table_base wasm global.
   // Only applicable to function symbols.
   MO_TABLE_BASE_REL,
+
+  // On a block signature operand this indicates that this is a destination
+  // block of a (catch_ref) clause in try_table.
+  MO_CATCH_BLOCK_SIG,
 };
 
 } // end namespace WebAssemblyII
@@ -462,6 +468,22 @@ inline bool isMarker(unsigned Opc) {
   case WebAssembly::TRY_S:
   case WebAssembly::END_TRY:
   case WebAssembly::END_TRY_S:
+  case WebAssembly::TRY_TABLE:
+  case WebAssembly::TRY_TABLE_S:
+  case WebAssembly::END_TRY_TABLE:
+  case WebAssembly::END_TRY_TABLE_S:
+    return true;
+  default:
+    return false;
+  }
+}
+
+inline bool isTry(unsigned Opc) {
+  switch (Opc) {
+  case WebAssembly::TRY:
+  case WebAssembly::TRY_S:
+  case WebAssembly::TRY_TABLE:
+  case WebAssembly::TRY_TABLE_S:
     return true;
   default:
     return false;
@@ -474,6 +496,14 @@ inline bool isCatch(unsigned Opc) {
   case WebAssembly::CATCH_LEGACY_S:
   case WebAssembly::CATCH_ALL_LEGACY:
   case WebAssembly::CATCH_ALL_LEGACY_S:
+  case WebAssembly::CATCH:
+  case WebAssembly::CATCH_S:
+  case WebAssembly::CATCH_REF:
+  case WebAssembly::CATCH_REF_S:
+  case WebAssembly::CATCH_ALL:
+  case WebAssembly::CATCH_ALL_S:
+  case WebAssembly::CATCH_ALL_REF:
+  case WebAssembly::CATCH_ALL_REF_S:
     return true;
   default:
     return false;
diff --git a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTypeUtilities.h b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTypeUtilities.h
index 063ee4dba9068e..4aca092e0e4c44 100644
--- a/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTypeUtilities.h
+++ b/llvm/lib/Target/WebAssembly/MCTargetDesc/WebAssemblyMCTypeUtilities.h
@@ -33,11 +33,15 @@ enum class BlockType : unsigned {
   Externref = unsigned(wasm::ValType::EXTERNREF),
   Funcref = unsigned(wasm::ValType::FUNCREF),
   Exnref = unsigned(wasm::ValType::EXNREF),
-  // Multivalue blocks (and other non-void blocks) are only emitted when the
-  // blocks will never be exited and are at the ends of functions (see
-  // WebAssemblyCFGStackify::fixEndsAtEndOfFunction). They also are never made
-  // to pop values off the stack, so the exact multivalue signature can always
-  // be inferred from the return type of the parent function in MCInstLower.
+  // Multivalue blocks are emitted in two cases:
+  // 1. When the blocks will never be exited and are at the ends of functions
+  //    (see WebAssemblyCFGStackify::fixEndsAtEndOfFunction). In this case the
+  //    exact multivalue signature can always be inferred from the return type
+  //    of the parent function.
+  // 2. (catch_ref ...) clause in try_table instruction. Currently all tags we
+  //    support (cpp_exception and c_longjmp) throws a single i32, so the
+  //    multivalue signature for this case will be (i32, exnref).
+  // The real multivalue siganture will be added in MCInstLower.
   Multivalue = 0xffff,
 };
 
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
index 6dd6145ed00573..14c0eaac17daaa 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
@@ -683,6 +683,17 @@ void WebAssemblyAsmPrinter::emitInstruction(const MachineInstr *MI) {
     // This is a compiler barrier that prevents instruction reordering during
     // backend compilation, and should not be emitted.
     break;
+  case WebAssembly::CATCH:
+  case WebAssembly::CATCH_S:
+  case WebAssembly::CATCH_REF:
+  case WebAssembly::CATCH_REF_S:
+  case WebAssembly::CATCH_ALL:
+  case WebAssembly::CATCH_ALL_S:
+  case WebAssembly::CATCH_ALL_REF:
+  case WebAssembly::CATCH_ALL_REF_S:
+    // These are pseudo instructions to represent catch clauses in try_table
+    // instruction to simulate block return values.
+    break;
   default: {
     WebAssemblyMCInstLower MCInstLowering(OutContext, *this);
     MCInst TmpInst;
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
index 3cccc57e629fd7..a5f73fabca3542 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp
@@ -9,9 +9,9 @@
 /// \file
 /// This file implements a CFG stacking pass.
 ///
-/// This pass inserts BLOCK, LOOP, and TRY markers to mark the start of scopes,
-/// since scope boundaries serve as the labels for WebAssembly's control
-/// transfers.
+/// This pass inserts BLOCK, LOOP, TRY, and TRY_TABLE markers to mark the start
+/// of scopes, since scope boundaries serve as the labels for WebAssembly's
+/// control transfers.
 ///
 /// This is sufficient to convert arbitrary CFGs into a form that works on
 /// WebAssembly, provided that all loops are single-entry.
@@ -21,6 +21,7 @@
 ///
 //===----------------------------------------------------------------------===//
 
+#include "MCTargetDesc/WebAssemblyMCTargetDesc.h"
 #include "Utils/WebAssemblyTypeUtilities.h"
 #include "WebAssembly.h"
 #include "WebAssemblyExceptionInfo.h"
@@ -29,6 +30,7 @@
 #include "WebAssemblySubtarget.h"
 #include "WebAssemblyUtilities.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/BinaryFormat/Wasm.h"
 #include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
@@ -74,6 +76,7 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
   void placeBlockMarker(MachineBasicBlock &MBB);
   void placeLoopMarker(MachineBasicBlock &MBB);
   void placeTryMarker(MachineBasicBlock &MBB);
+  void placeTryTableMarker(MachineBasicBlock &MBB);
 
   // Exception handling related functions
   bool fixCallUnwindMismatches(MachineFunction &MF);
@@ -97,11 +100,11 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
   void fixEndsAtEndOfFunction(MachineFunction &MF);
   void cleanupFunctionData(MachineFunction &MF);
 
-  // For each BLOCK|LOOP|TRY, the corresponding END_(BLOCK|LOOP|TRY) or DELEGATE
-  // (in case of TRY).
+  // For each BLOCK|LOOP|TRY|TRY_TABLE, the corresponding
+  // END_(BLOCK|LOOP|TRY|TRY_TABLE) or DELEGATE (in case of TRY).
   DenseMap<const MachineInstr *, MachineInstr *> BeginToEnd;
-  // For each END_(BLOCK|LOOP|TRY) or DELEGATE, the corresponding
-  // BLOCK|LOOP|TRY.
+  // For each END_(BLOCK|LOOP|TRY|TRY_TABLE) or DELEGATE, the corresponding
+  // BLOCK|LOOP|TRY|TRY_TABLE.
   DenseMap<const MachineInstr *, MachineInstr *> EndToBegin;
   // <TRY marker, EH pad> map
   DenseMap<const MachineInstr *, MachineBasicBlock *> TryToEHPad;
@@ -150,9 +153,10 @@ class WebAssemblyCFGStackify final : public MachineFunctionPass {
 } // end anonymous namespace
 
 char WebAssemblyCFGStackify::ID = 0;
-INITIALIZE_PASS(WebAssemblyCFGStackify, DEBUG_TYPE,
-                "Insert BLOCK/LOOP/TRY markers for WebAssembly scopes", false,
-                false)
+INITIALIZE_PASS(
+    WebAssemblyCFGStackify, DEBUG_TYPE,
+    "Insert BLOCK/LOOP/TRY/TRY_TABLE markers for WebAssembly scopes", false,
+    false)
 
 FunctionPass *llvm::createWebAssemblyCFGStackify() {
   return new WebAssemblyCFGStackify();
@@ -314,12 +318,13 @@ void WebAssemblyCFGStackify::placeBlockMarker(MachineBasicBlock &MBB) {
 #endif
     }
 
-    // If there is a previously placed BLOCK/TRY marker and its corresponding
-    // END marker is before the current BLOCK's END marker, that should be
-    // placed after this BLOCK. Otherwise it should be placed before this BLOCK
-    // marker.
+    // If there is a previously placed BLOCK/TRY/TRY_TABLE marker and its
+    // corresponding END marker is before the current BLOCK's END marker, that
+    // should be placed after this BLOCK. Otherwise it should be placed before
+    // this BLOCK marker.
     if (MI.getOpcode() == WebAssembly::BLOCK ||
-        MI.getOpcode() == WebAssembly::TRY) {
+        MI.getOpcode() == WebAssembly::TRY ||
+        MI.getOpcode() == WebAssembly::TRY_TABLE) {
       if (BeginToEnd[&MI]->getParent()->getNumber() <= MBB.getNumber())
         AfterSet.insert(&MI);
 #ifndef NDEBUG
@@ -329,10 +334,11 @@ void WebAssemblyCFGStackify::placeBlockMarker(MachineBasicBlock &MBB) {
     }
 
 #ifndef NDEBUG
-    // All END_(BLOCK|LOOP|TRY) markers should be before the BLOCK.
+    // All END_(BLOCK|LOOP|TRY|TRY_TABLE) markers should be before the BLOCK.
     if (MI.getOpcode() == WebAssembly::END_BLOCK ||
         MI.getOpcode() == WebAssembly::END_LOOP ||
-        MI.getOpcode() == WebAssembly::END_TRY)
+        MI.getOpcode() == WebAssembly::END_TRY ||
+        MI.getOpcode() == WebAssembly::END_TRY_TABLE)
       BeforeSet.insert(&MI);
 #endif
 
@@ -374,6 +380,11 @@ void WebAssemblyCFGStackify::placeBlockMarker(MachineBasicBlock &MBB) {
     // loop is above this block's header, the END_LOOP should be placed after
     // the END_BLOCK, because the loop contains this block. Otherwise the
     // END_LOOP should be placed before the END_BLOCK. The same for END_TRY.
+    //
+    // Note that while there can be existing END_TRYs, there can't be
+    // END_TRY_TABLEs; END_TRYs are placed when its corresponding EH pad is
+    // processed, so they are placed below MBB (EH pad) in placeTryMarker. But
+    // END_TRY_TABLE is placed like a END_BLOCK, so they can't be here already.
     if (MI.getOpcode() == WebAssembly::END_LOOP ||
         MI.getOpcode() == WebAssembly::END_TRY) {
       if (EndToBegin[&MI]->getParent()->getNumber() >= Header->getNumber())
@@ -657,7 +668,251 @@ void WebAssemblyCFGStackify::placeTryMarker(MachineBasicBlock &MBB) {
     updateScopeTops(Header, End);
 }
 
+void WebAssemblyCFGStackify::placeTryTableMarker(MachineBasicBlock &MBB) {
+  assert(MBB.isEHPad());
+  MachineFunction &MF = *MBB.getParent();
+  auto &MDT = getAnalysis<MachineDominatorTreeWrapperPass>().getDomTree();
+  const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
+  const auto &MLI = getAnalysis<MachineLoopInfoWrapperPass>().getLI();
+  const auto &WEI = getAnalysis<WebAssemblyExceptionInfo>();
+  SortRegionInfo SRI(MLI, WEI);
+  const auto &MFI = *MF.getInfo<WebAssemblyFunctionInfo>();
+
+  // Compute the nearest common dominator of all unwind predecessors
+  MachineBasicBlock *Header = nullptr;
+  int MBBNumber = MBB.getNumber();
+  for (auto *Pred : MBB.predecessors()) {
+    if (Pred->getNumber() < MBBNumber) {
+      Header = Header ? MDT.findNearestCommonDominator(Header, Pred) : Pred;
+      assert(!explicitlyBranchesTo(Pred, &MBB) &&
+             "Explicit branch to an EH pad!");
+    }
+  }
+  if (!Header)
+    return;
+
+  assert(&MBB != &MF.front() && "Header blocks shouldn't have predecessors");
+  MachineBasicBlock *LayoutPred = MBB.getPrevNode();
+
+  // If the nearest common dominator is inside a more deeply nested context,
+  // walk out to the nearest scope which isn't more deeply nested.
+  for (MachineFunction::iterator I(LayoutPred), E(Header); I != E; --I) {
+    if (MachineBasicBlock *ScopeTop = ScopeTops[I->getNumber()]) {
+      if (ScopeTop->getNumber() > Header->getNumber()) {
+        // Skip over an intervening scope.
+        I = std::next(ScopeTop->getIterator());
+      } else {
+        // We found a scope level at an appropriate depth.
+        Header = ScopeTop;
+        break;
+      }
+    }
+  }
+
+  // Decide where in Header to put the TRY_TABLE.
+
+  // Instructions that should go before the TRY_TABLE.
+  SmallPtrSet<const MachineInstr *, 4> BeforeSet;
+  // Instructions that should go after the TRY_TABLE.
+  SmallPtrSet<const MachineInstr *, 4> AfterSet;
+  for (const auto &MI : *Header) {
+    // If there is a previously placed LOOP marker and the bottom block of the
+    // loop is above MBB, it should be after the TRY_TABLE, because the loop is
+    // nested in this TRY_TABLE. Otherwise it should be before the TRY_TABLE.
+    if (MI.getOpcode() == WebAssembly::LOOP) {
+      auto *LoopBottom = BeginToEnd[&MI]->getParent()->getPrevNode();
+      if (MBB.getNumber() > LoopBottom->getNumber())
+        AfterSet.insert(&MI);
+#ifndef NDEBUG
+      else
+        BeforeSet.insert(&MI);
+#endif
+    }
+
+    // All previously inserted BLOCK/TRY_TABLE markers should be after the
+    // TRY_TABLE because they are all nested blocks/try_tables.
+    if (MI.getOpcode() == WebAssembly::BLOCK ||
+        MI.getOpcode() == WebAssembly::TRY_TABLE)
+      AfterSet.insert(&MI);
+
+#ifndef NDEBUG
+    // All END_(BLOCK/LOOP/TRY_TABLE) markers should be before the TRY_TABLE.
+    if (MI.getOpcode() == WebAssembly::END_BLOCK ||
+        MI.getOpcode() == WebAssembly::END_LOOP ||
+        MI.getOpcode() == WebAssembly::END_TRY_TABLE)
+      BeforeSet.insert(&MI);
+#endif
+
+    // Terminators should go after the TRY_TABLE.
+    if (MI.isTerminator())
+      AfterSet.insert(&MI);
+  }
+
+  // If Header unwinds to MBB (= Header contains 'invoke'), the try_table block
+  // should contain the call within it. So the call should go after the
+  // TRY_TABLE. The exception is when the header's terminator is a rethrow
+  // instruction, in which case that instruction, not a call instruction before
+  // it, is gonna throw.
+  MachineInstr *ThrowingCall = nullptr;
+  if (MBB.isPredecessor(Header)) {
+    auto TermPos = Header->getFirstTerminator();
+    if (TermPos == Header->end() ||
+        TermPos->getOpcode() != WebAssembly::RETHROW) {
+      for (auto &MI : reverse(*Header)) {
+        if (MI.isCall()) {
+          AfterSet.insert(&MI);
+          ThrowingCall = &MI;
+          // Possibly throwing calls are usually wrapped by EH_LABEL
+          // instructions. We don't want to split them and the call.
+          if (MI.getIterator() != Header->begin() &&
+              std::prev(MI.getIterator())->isEHLabel()) {
+            AfterSet.insert(&*std::prev(MI.getIterator()));
+            ThrowingCall = &*std::prev(MI.getIterator());
+          }
+          break;
+        }
+      }
+    }
+  }
+
+  // Local expression tree should go after the TRY_TABLE.
+  // For BLOCK placement, we start the search from the previous instruction of a
+  // BB's terminator, but in TRY_TABLE's case, we should start from the previous
+  // instruction of a call that can throw, or a EH_LABEL that precedes the call,
+  // because the return values of the call's previous instructions can be
+  // stackified and consumed by the throwing call.
+  auto SearchStartPt = ThrowingCall ? MachineBasicBlock::iterator(ThrowingCall)
+                                    : Header->getFirstTerminator();
+  for (auto I = SearchStartPt, E = Header->begin(); I != E; --I) {
+    if (std::prev(I)->isDebugInstr() || std::prev(I)->isPosition())
+      continue;
+    if (WebAssembly::isChild(*std::prev(I), MFI))
+      AfterSet.insert(&*std::prev(I));
+    else
+      break;
+  }
+
+  // Add the TRY_TABLE and a BLOCK for the catch destination. We currently
+  // generate only one CATCH clause for a TRY_TABLE, so we need one BLOCK for
+  // its destination.
+  //
+  // Header:
+  //   block
+  //     try_table (catch ... $MBB)
+  //       ...
+  //
+  // MBB:
+  //     end_try_table
+  //   end_block                ...
[truncated]

@@ -147,6 +178,8 @@ let isTerminator = 1, hasSideEffects = 1, isBarrier = 1, hasCtrlDep = 1,
// usage gets low enough.

// Rethrowing an exception: rethrow
// The new exnref proposal also uses this instruction as an interim pseudo
Member

Is this just because we want to be able to support both the old and new proposals for now, or is there a more fundamental reason why it's better to use rethrow early in the pipeline and then convert it?

Member Author

No, it also happens to be more convenient. Until we match the instructions (the llvm.wasm.rethrow intrinsic or cleanupret) that later become throw_ref with the corresponding EH pad and its catch_*** instruction, which happens in LateEHPrepare, we need a way to express "rethrow". If we really want to avoid the use of the pseudo RETHROW, we have to do this matching in isel, which I think will be very inconvenient.

Member Author

@aheejin, Sep 10, 2024

If we want to be strict, we could also rename the old rethrow to RETHROW_LEGACY or something similar and add a new pseudo RETHROW, but I didn't feel it was necessary given that this is not very confusing. (In CATCH/CATCH_ALL's case it was confusing because in CFGStackify or MCInstLower I had to check WebAssembly::WasmEnableExnref many times to determine whether the current instruction was the legacy CATCH or the new pseudo CATCH.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see; before LateEHPrepare, there is no exnref that comes out of the catch, so it doesn't make sense to have a throw_ref either. So instead we have rethrow, which has semantics that are meaningful on their own in this phase of compilation even though they don't correspond to what we need at the end.
And yeah, given that rethrow is a meaningful concept independent of the legacy spec, I don't think we need to try to keep it labeled as legacy permanently.

Catch->eraseFromParent();

for (auto *Rethrow : Rethrows) {
  auto InsertPos = Rethrow->getIterator()++;
Member

The use of post-increment on the getter seems a little bit confusing to me (since it seems like it's being used on an rvalue rather than on a variable that is still live after the statement). Is this equivalent to getIterator()->getNextNode() like you used above?

Member Author

getNextNode does not work here: RETHROW is a terminator, so it is very likely that there is no next node. But I can't just use Rethrow->getParent()->end() either, because there can be other non-functional instructions like DBG_VALUE after it.

Switched to std::next(Rethrow->getIterator()), given that this seems safer and also frequently used in the codebase.
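
To illustrate the difference discussed in this thread, here is a minimal standalone sketch that uses std::list in place of LLVM's instruction list. The helper names (getIteratorTo, viaPostIncrement, viaStdNext) are hypothetical and not part of LLVM; the only assumption is that the getter returns an iterator by value, so the caller receives a temporary. In standard C++, post-increment yields the position before the increment, whereas std::next yields the following position, which may be end().

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for a getter such as getIterator(): it returns an
// iterator by value, so the caller receives a temporary.
std::list<int>::iterator getIteratorTo(std::list<int> &L, int Value) {
  for (auto I = L.begin(), E = L.end(); I != E; ++I)
    if (*I == Value)
      return I;
  return L.end();
}

// Post-increment on the temporary: the result is a copy of the original
// position. The increment is applied to the temporary, which is discarded.
std::list<int>::iterator viaPostIncrement(std::list<int> &L, int Value) {
  return getIteratorTo(L, Value)++;
}

// std::next returns the following position. It is well defined even when
// the following position is end(), i.e. when there is no next node.
std::list<int>::iterator viaStdNext(std::list<int> &L, int Value) {
  auto I = getIteratorTo(L, Value);
  assert(I != L.end() && "Value must be present in the list");
  return std::next(I);
}
```

For a list {1, 2, 3}, viaPostIncrement(L, 2) still points at 2, while viaStdNext(L, 2) points at 3 and viaStdNext(L, 3) equals L.end().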

// Local expression tree should go after the TRY_TABLE.
// For BLOCK placement, we start the search from the previous instruction of a
// BB's terminator, but in TRY_TABLE's case, we should start from the previous
// instruction of a call that can throw, or a EH_LABEL that precedes the call,
Member

I thought that throwing calls were also terminators. Is this not the case?

Member Author

The isTerminator property checked by MachineBasicBlock::getFirstTerminator

MachineBasicBlock::iterator MachineBasicBlock::getFirstTerminator() {
  iterator B = begin(), E = end(), I = E;
  while (I != B && ((--I)->isTerminator() || I->isDebugInstr()))
    ; /*noop */
  while (I != E && !I->isTerminator())
    ++I;
  return I;
}

is this:

bit isTerminator = false; // Is this part of the terminator for a basic block?

And this is set only on branches, returns, and return calls. So calls don't have that property set.
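
The effect can be reproduced with a small standalone model. This is only a sketch: MockInstr and firstTerminatorIndex are hypothetical names, indices replace iterators, and the opcode strings are labels for illustration only. It follows the two loops of getFirstTerminator quoted above and shows that a block containing only calls, with no branch or return, has no terminator.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for MachineInstr: only the two
// properties that the scan inspects.
struct MockInstr {
  const char *Name;
  bool IsTerminator;
  bool IsDebugInstr;
};

// Index-based model of the two loops in getFirstTerminator.
// Returns Block.size() when the block contains no terminator.
std::size_t firstTerminatorIndex(const std::vector<MockInstr> &Block) {
  const std::size_t B = 0, E = Block.size();
  std::size_t I = E;
  // Walk backward while the instruction is a terminator or a debug
  // instruction; stop at the first one that is neither.
  while (I != B) {
    --I;
    if (!Block[I].IsTerminator && !Block[I].IsDebugInstr)
      break;
  }
  // Walk forward to the first actual terminator.
  while (I != E && !Block[I].IsTerminator)
    ++I;
  return I;
}
```

For a block {CALL, BR_IF, DBG_VALUE, BR} the function returns 1 (the BR_IF). For a block {CALL, CALL} it returns 2, the size of the block, because a call does not have the terminator property.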

This is basically copied from the existing placeTryBlock:

// Local expression tree should go after the TRY.
// For BLOCK placement, we start the search from the previous instruction of a
// BB's terminator, but in TRY's case, we should start from the previous
// instruction of a call that can throw, or a EH_LABEL that precedes the call,
// because the return values of the call's previous instructions can be
// stackified and consumed by the throwing call.
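
The backward search this comment describes can be sketched with a simplified model. MockHeaderInstr, its two flags, and collectAfterSet are hypothetical; in particular, the IsChild flag stands in for WebAssembly::isChild, whose actual definition is not shown here. The scan starts at the throwing call (or its EH_LABEL) and walks toward the beginning of the block, collecting the instructions that feed the call so that they stay after the TRY or TRY_TABLE marker.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for an instruction in the header block.
struct MockHeaderInstr {
  bool IsDebugOrPosition; // models isDebugInstr() || isPosition()
  bool IsChild;           // models WebAssembly::isChild(...) (assumption)
};

// Scans backward from SearchStart (the index of the throwing call or its
// EH_LABEL) and returns the indices that would be placed in the after-set.
// Debug and position instructions are skipped; the scan stops at the first
// instruction that is not part of the local expression tree.
std::vector<std::size_t>
collectAfterSet(const std::vector<MockHeaderInstr> &Header,
                std::size_t SearchStart) {
  std::vector<std::size_t> AfterSet;
  for (std::size_t I = SearchStart; I != 0; --I) {
    const MockHeaderInstr &Prev = Header[I - 1];
    if (Prev.IsDebugOrPosition)
      continue;
    if (Prev.IsChild)
      AfterSet.push_back(I - 1);
    else
      break;
  }
  return AfterSet;
}
```

With a header of five instructions where indices 1 and 3 are children, index 2 is a debug instruction, and the throwing call sits at index 4, collectAfterSet(Header, 4) collects indices 3 and 1 and stops at index 0.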

Member

Ah right, I guess at this point in the pipeline we don't have a distinction between call and invoke the way LLVM IR does.

@aheejin aheejin merged commit 6bbf7f0 into llvm:main Sep 11, 2024
8 checks passed
@aheejin aheejin deleted the new_eh_basic branch September 11, 2024 04:32