Add support for flag output operand "=@cc" for SystemZ. #125970

anoopkg6 · 2025-02-06T00:22:51Z

Add support for the flag output operand "=@cc" on SystemZ, and optimize conditional branches for the 14 possible combinations of the CC mask.
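For readers unfamiliar with the CC mask, here is a minimal sketch of how a predicate over the 2-bit condition code folds into a single 4-bit branch mask. It assumes the usual SystemZ convention that CC 0, 1, 2, 3 correspond to mask bits 8, 4, 2, 1. The helper names (`ccBit`, `ccMaskFor`, `countConditionalMasks`) are hypothetical and not part of this patch. Reading "14" as the 16 subsets of {0, 1, 2, 3} minus "never" and "always" is one plausible interpretation, not something the description states.

```cpp
#include <cassert>

// Hypothetical helper: mask bit for one CC value, assuming the SystemZ
// branch-mask convention (CC 0 -> 8, CC 1 -> 4, CC 2 -> 2, CC 3 -> 1).
inline unsigned ccBit(unsigned CC) {
  assert(CC < 4 && "SystemZ CC is a 2-bit value");
  return 8u >> CC;
}

// Hypothetical helper: fold an arbitrary predicate over CC into one mask by
// evaluating it for each of the four possible CC values.
template <typename Pred> unsigned ccMaskFor(Pred P) {
  unsigned Mask = 0;
  for (unsigned CC = 0; CC < 4; ++CC)
    if (P(CC))
      Mask |= ccBit(CC);
  return Mask;
}

// Counts the masks that need a genuinely conditional branch: all 16 subsets
// except the empty mask (never taken) and the full mask (always taken).
inline unsigned countConditionalMasks() {
  unsigned N = 0;
  for (unsigned Mask = 0; Mask < 16; ++Mask)
    if (Mask != 0 && Mask != 15)
      ++N;
  return N;
}
```

Under these assumptions, a source-level condition such as `cc == 0 || cc == 2 || cc == 3` corresponds to one mask (8 | 2 | 1), which is the kind of cumulative mask the patch aims to branch on directly instead of emitting a chain of compares.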


llvmbot · 2025-02-06T00:23:26Z

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-clang

Author: None (anoopkg6)

Changes

Add support for the flag output operand "=@cc" on SystemZ, and optimize conditional branches for the 14 possible combinations of the CC mask.

Patch is 616.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125970.diff

21 Files Affected:

(modified) clang/lib/Basic/Targets/SystemZ.cpp (+11)
(modified) clang/lib/Basic/Targets/SystemZ.h (+5)
(modified) clang/lib/CodeGen/CGStmt.cpp (+8-2)
(added) clang/test/CodeGen/inline-asm-systemz-flag-output.c (+149)
(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+3)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+61-9)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+4)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+598-2)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (+14)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand.ll (+500)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_eq_noteq.ll (+939)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_not.ll (+779)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed.ll (+2427)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_eq_noteq.ll (+5248)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_not.ll (+2543)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor.ll (+1047)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_eq_noteq.ll (+854)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_not.ll (+806)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor.ll (+784)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_eq_noteq.ll (+1083)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_not.ll (+778)

diff --git a/clang/lib/Basic/Targets/SystemZ.cpp b/clang/lib/Basic/Targets/SystemZ.cpp
index 06f08db2eadd475..49f88b45220d0c4 100644
--- a/clang/lib/Basic/Targets/SystemZ.cpp
+++ b/clang/lib/Basic/Targets/SystemZ.cpp
@@ -90,6 +90,14 @@ bool SystemZTargetInfo::validateAsmConstraint(
   case 'T': // Likewise, plus an index
     Info.setAllowsMemory();
     return true;
+  case '@':
+    // CC condition changes.
+    if (strlen(Name) >= 3 && *(Name + 1) == 'c' && *(Name + 2) == 'c') {
+      Name += 2;
+      Info.setAllowsRegister();
+      return true;
+    }
+    return false;
   }
 }
 
@@ -150,6 +158,9 @@ unsigned SystemZTargetInfo::getMinGlobalAlign(uint64_t Size,
 
 void SystemZTargetInfo::getTargetDefines(const LangOptions &Opts,
                                          MacroBuilder &Builder) const {
+  // Inline assembly supports SystemZ flag outputs.
+  Builder.defineMacro("__GCC_ASM_FLAG_OUTPUTS__");
+
   Builder.defineMacro("__s390__");
   Builder.defineMacro("__s390x__");
   Builder.defineMacro("__zarch__");
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index ef9a07033a6e4ff..a6909ababdec001 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -118,6 +118,11 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                              TargetInfo::ConstraintInfo &info) const override;
 
   std::string convertConstraint(const char *&Constraint) const override {
+    if (strncmp(Constraint, "@cc", 3) == 0) {
+      std::string Converted = "{" + std::string(Constraint, 3) + "}";
+      Constraint += 3;
+      return Converted;
+    }
     switch (Constraint[0]) {
     case 'p': // Keep 'p' constraint.
       return std::string("p");
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 41dc91c578c800a..27f7bb652895839 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -2563,9 +2563,15 @@ EmitAsmStores(CodeGenFunction &CGF, const AsmStmt &S,
     if ((i < ResultRegIsFlagReg.size()) && ResultRegIsFlagReg[i]) {
       // Target must guarantee the Value `Tmp` here is lowered to a boolean
       // value.
-      llvm::Constant *Two = llvm::ConstantInt::get(Tmp->getType(), 2);
+      unsigned CCUpperBound = 2;
+      if (CGF.getTarget().getTriple().getArch() == llvm::Triple::systemz) {
+        // On this target CC value can be in range [0, 3].
+        CCUpperBound = 4;
+      }
+      llvm::Constant *CCUpperBoundConst =
+          llvm::ConstantInt::get(Tmp->getType(), CCUpperBound);
       llvm::Value *IsBooleanValue =
-          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, Two);
+          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, CCUpperBoundConst);
       llvm::Function *FnAssume = CGM.getIntrinsic(llvm::Intrinsic::assume);
       Builder.CreateCall(FnAssume, IsBooleanValue);
     }
diff --git a/clang/test/CodeGen/inline-asm-systemz-flag-output.c b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
new file mode 100644
index 000000000000000..ab90e031df1f2b8
--- /dev/null
+++ b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
@@ -0,0 +1,149 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple s390x-linux -emit-llvm -o - %s | FileCheck %s
+// CHECK-LABEL: define dso_local signext i32 @foo_012(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2:[0-9]+]], !srcloc [[META2:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 2
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_012(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 2 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_013(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META3:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_013(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_023(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META4:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_023(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 2 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_123(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META5:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 1
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_123(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 1 || cc == 2 || cc == 3 ? 42 : 0;
+}
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index e0b638201a04740..cb136fe2f446b43 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5071,6 +5071,9 @@ class TargetLowering : public TargetLoweringBase {
                                             std::vector<SDValue> &Ops,
                                             SelectionDAG &DAG) const;
 
+  // Lower switch statement for flag output operand with SRL/IPM Sequence.
+  virtual bool canLowerSRL_IPM_Switch(SDValue Cond) const;
+
   // Lower custom output constraints. If invalid, return SDValue().
   virtual SDValue LowerAsmOutputForConstraint(SDValue &Chain, SDValue &Glue,
                                               const SDLoc &DL,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 3b046aa25f54440..a32787bc882f175 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -2831,8 +2831,37 @@ void SelectionDAGBuilder::visitBr(const BranchInst &I) {
       Opcode = Instruction::And;
     else if (match(BOp, m_LogicalOr(m_Value(BOp0), m_Value(BOp1))))
       Opcode = Instruction::Or;
-
-    if (Opcode &&
+    auto &TLI = DAG.getTargetLoweringInfo();
+    bool BrSrlIPM = FuncInfo.MF->getTarget().getTargetTriple().getArch() ==
+                    Triple::ArchType::systemz;
+    // For the flag output operand SRL/IPM sequence, we want to avoid
+    // creating switch cases, as they create basic blocks and inhibit
+    // the DAGCombiner optimization for flag output operands.
+    const auto checkSRLIPM = [&TLI](const SDValue &Op) {
+      if (!Op.getNumOperands())
+        return false;
+      SDValue OpVal = Op.getOperand(0);
+      SDNode *N = OpVal.getNode();
+      if (N && N->getOpcode() == ISD::SRL)
+        return TLI.canLowerSRL_IPM_Switch(OpVal);
+      else if (N && OpVal.getNumOperands() &&
+               (N->getOpcode() == ISD::AND || N->getOpcode() == ISD::OR)) {
+        SDValue OpVal1 = OpVal.getOperand(0);
+        SDNode *N1 = OpVal1.getNode();
+        if (N1 && N1->getOpcode() == ISD::SRL)
+          return TLI.canLowerSRL_IPM_Switch(OpVal1);
+      }
+      return false;
+    };
+    if (BrSrlIPM) {
+      if (NodeMap.count(BOp0) && NodeMap[BOp0].getNode()) {
+        BrSrlIPM &= checkSRLIPM(getValue(BOp0));
+        if (NodeMap.count(BOp1) && NodeMap[BOp1].getNode())
+          BrSrlIPM &= checkSRLIPM(getValue(BOp1));
+      } else
+        BrSrlIPM = false;
+    }
+    if (Opcode && !BrSrlIPM &&
         !(match(BOp0, m_ExtractElt(m_Value(Vec), m_Value())) &&
           match(BOp1, m_ExtractElt(m_Specific(Vec), m_Value()))) &&
         !shouldKeepJumpConditionsTogether(
@@ -12043,18 +12072,41 @@ void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond,
       const APInt &SmallValue = Small.Low->getValue();
       const APInt &BigValue = Big.Low->getValue();
 
+      // The switch-case optimizing transformation inhibits DAGCombiner on
+      // SystemZ for flag output operands. DAGCombiner computes a cumulative
+      // CCMask for the flag output operand SRL/IPM sequence, so we want to
+      // avoid creating switch cases here: they create basic blocks, which
+      // inhibit that optimization.
+      // Examples are (CC == 0) || (CC == 2) || (CC == 3), or
+      // (CC == 0) || (CC == 1) ^ (CC == 3); there could potentially be
+      // more cases like this.
+      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+      bool IsSrlIPM = false;
+      if (NodeMap.count(Cond) && NodeMap[Cond].getNode())
+        IsSrlIPM = CurMF->getTarget().getTargetTriple().getArch() ==
+                       Triple::ArchType::systemz &&
+                   TLI.canLowerSRL_IPM_Switch(getValue(Cond));
       // Check that there is only one bit different.
       APInt CommonBit = BigValue ^ SmallValue;
-      if (CommonBit.isPowerOf2()) {
+      if (CommonBit.isPowerOf2() || IsSrlIPM) {
         SDValue CondLHS = getValue(Cond);
         EVT VT = CondLHS.getValueType();
         SDLoc DL = getCurSDLoc();
-
-        SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
-                                 DAG.getConstant(CommonBit, DL, VT));
-        SDValue Cond = DAG.getSetCC(
-            DL, MVT::i1, Or, DAG.getConstant(BigValue | SmallValue, DL, VT),
-            ISD::SETEQ);
+        SDValue Cond;
+
+        if (CommonBit.isPowerOf2()) {
+          SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
+                                   DAG.getConstant(CommonBit, DL, VT));
+          Cond = DAG.getSetCC(DL, MVT::i1, Or,
+                              DAG.getConstant(BigValue | SmallValue, DL, VT),
+                              ISD::SETEQ);
+        } else if (IsSrlIPM && BigValue == 3 && SmallValue == 0) {
+          SDValue SetCC =
+              DAG.getSetCC(DL, MVT::i32, CondLHS,
+                           DAG.getConstant(SmallValue, DL, VT), ISD::SETEQ);
+          Cond = DAG.getSetCC(DL, MVT::i32, SetCC,
+                              DAG.getConstant(BigValue, DL, VT), ISD::SETEQ);
+        }
 
         // Update successor info.
         // Both Small and Big will jump to Small.BB, so we sum up the
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 8287565336b54d1..3d48adac509cb9e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -5563,6 +5563,10 @@ const char *TargetLowering::LowerXConstraint(EVT ConstraintVT) const {
   return nullptr;
 }
 
+bool TargetLowering::canLowerSRL_IPM_Switch(SDValue Cond) const {
+  return false;
+}
+
 SDValue TargetLowering::LowerAsmOutputForConstraint(
     SDValue &Chain, SDValue &Glue, const SDLoc &DL,
     const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 3999b54de81b657..259da48a3b22321 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -1207,6 +1207,9 @@ SystemZTargetLowering::getConstraintType(StringRef Constraint) const {
     default:
       break;
     }
+  } else if (Constraint.size() == 5 && Constraint.starts_with("{")) {
+    if (StringRef("{@cc}").compare(Constraint) == 0)
+      return C_Other;
   }
   return TargetLowering::getConstraintType(Constraint);
 }
@@ -1389,6 +1392,10 @@ SystemZTargetLowering::getRegForInlineAsmConstraint(
       return parseRegisterNumber(Constraint, &SystemZ::VR128BitRegClass,
                                  SystemZMC::VR128Regs, 32);
     }
+    if (Constraint[1] == '@') {
+      if (StringRef("{@cc}").compare(Constraint) == 0)
+        return std::make_pair(0u, &SystemZ::GR32BitRegClass);
+    }
   }
   return TargetLowering::getRegForInlineAsmConstraint(TRI, Constraint, VT);
 }
@@ -1421,6 +1428,35 @@ Register SystemZTargetLowering::getExceptionSelectorRegister(
   return Subtarget.isTargetXPLINK64() ? SystemZ::R2D : SystemZ::R7D;
 }
 
+// Lower @cc targets via setcc.
+SDValue SystemZTargetLowering::LowerAsmOutputForConstraint(
+    SDValue &Chain, SDValue &Glue, const SDLoc &DL,
+    const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
+  if (StringRef("{@cc}").compare(OpInfo.ConstraintCode) != 0)
+    return SDValue();
+
+  // Check that return type is valid.
+  if (OpInfo.ConstraintVT.isVector() || !OpInfo.ConstraintVT.isInteger() ||
+      OpInfo.ConstraintVT.getSizeInBits() < 8)
+    report_fatal_error("Glue output operand is of invalid type");
+
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  MRI.addLiveIn(SystemZ::CC);
+
+  if (Glue.getNode()) {
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32, Glue);
+    Chain = Glue.getValue(1);
+  } else
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32);
+
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, DL, MVT::i32, Glue);
+  SDValue CC = DAG.getNode(ISD::SRL, DL, MVT::i32, IPM,
+                           DAG.getConstant(SystemZ::IPM_CC, DL, MVT::i32));
+
+  return CC;
+}
+
 void SystemZTargetLowering::LowerAsmOperandForConstraint(
     SDValue Op, StringRef Constraint, std::vector<SDValue> &Ops,
     SelectionDAG &DAG) const {
@@ -2485,6 +2521,21 @@ static unsigned CCMaskForCondCode(ISD::CondCode CC) {
 #undef CONV
 }
 
+static unsigned CCMaskForSystemZCCVal(unsigned CC) {
+  switch (CC) {
+  default:
+    llvm_unreachable("invalid integer condition!");
+  case 0:
+    return SystemZ::CCMASK_CMP_EQ;
+  case 1:
+    return SystemZ::CCMASK_CMP_LT;
+  case 2:
+    return SystemZ::CCMASK_CMP_GT;
+  case 3:
+    return SystemZ::CCMASK_CMP_UO;
+  }
+}
+
 // If C can be converted to a comparison against zero, ...
[truncated]
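The `LowerAsmOutputForConstraint` hunk above reads CC through an IPM node followed by a logical right shift by `SystemZ::IPM_CC`. The following is a rough standalone model of why that shift yields a value in [0, 3], which is what the `icmp ult ..., 4` assumption in the clang test relies on. It assumes `IPM_CC` is 28 and that IPM places CC in bits 29:28 of the 32-bit register, the program mask in bits 27:24, zeros in bits 31:30, and leaves the low 24 bits unchanged. `modelIPM` and `extractCC` are hypothetical names for illustration, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors SystemZ::IPM_CC, the bit position of CC in the 32-bit
// result of IPM.
const unsigned kIpmCcShift = 28;

// Hypothetical model of IPM on a 32-bit register: CC goes to bits 29:28, the
// 4-bit program mask to bits 27:24, bits 31:30 become zero, and the low
// 24 bits of the old register value are preserved.
inline uint32_t modelIPM(unsigned CC, unsigned ProgramMask, uint32_t OldReg) {
  assert(CC < 4 && ProgramMask < 16);
  return (uint32_t(CC) << kIpmCcShift) | (uint32_t(ProgramMask) << 24) |
         (OldReg & 0x00FFFFFFu);
}

// Models the SRL step: a logical shift right by 28 discards the program mask
// and the preserved low bits, leaving only the 2-bit CC value.
inline unsigned extractCC(uint32_t Ipm) { return Ipm >> kIpmCcShift; }
```

Because bits 31:30 are zero in this model, the shifted result can never exceed 3, regardless of the program mask or the stale low bits.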

llvmbot · 2025-02-06T00:23:26Z

@llvm/pr-subscribers-clang-codegen

Author: None (anoopkg6)

Changes

Add support for the flag output operand "=@cc" on SystemZ, and optimize conditional branches for the 14 possible combinations of the CC mask.

Patch is 616.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125970.diff

21 Files Affected:

(modified) clang/lib/Basic/Targets/SystemZ.cpp (+11)
(modified) clang/lib/Basic/Targets/SystemZ.h (+5)
(modified) clang/lib/CodeGen/CGStmt.cpp (+8-2)
(added) clang/test/CodeGen/inline-asm-systemz-flag-output.c (+149)
(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+3)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+61-9)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+4)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+598-2)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (+14)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand.ll (+500)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_eq_noteq.ll (+939)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_not.ll (+779)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed.ll (+2427)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_eq_noteq.ll (+5248)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_not.ll (+2543)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor.ll (+1047)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_eq_noteq.ll (+854)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_not.ll (+806)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor.ll (+784)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_eq_noteq.ll (+1083)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_not.ll (+778)

diff --git a/clang/lib/Basic/Targets/SystemZ.cpp b/clang/lib/Basic/Targets/SystemZ.cpp
index 06f08db2eadd475..49f88b45220d0c4 100644
--- a/clang/lib/Basic/Targets/SystemZ.cpp
+++ b/clang/lib/Basic/Targets/SystemZ.cpp
@@ -90,6 +90,14 @@ bool SystemZTargetInfo::validateAsmConstraint(
   case 'T': // Likewise, plus an index
     Info.setAllowsMemory();
     return true;
+  case '@':
+    // CC condition changes.
+    if (strlen(Name) >= 3 && *(Name + 1) == 'c' && *(Name + 2) == 'c') {
+      Name += 2;
+      Info.setAllowsRegister();
+      return true;
+    }
+    return false;
   }
 }
 
@@ -150,6 +158,9 @@ unsigned SystemZTargetInfo::getMinGlobalAlign(uint64_t Size,
 
 void SystemZTargetInfo::getTargetDefines(const LangOptions &Opts,
                                          MacroBuilder &Builder) const {
+  // Inline assembly supports SystemZ flag outputs.
+  Builder.defineMacro("__GCC_ASM_FLAG_OUTPUTS__");
+
   Builder.defineMacro("__s390__");
   Builder.defineMacro("__s390x__");
   Builder.defineMacro("__zarch__");
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index ef9a07033a6e4ff..a6909ababdec001 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -118,6 +118,11 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                              TargetInfo::ConstraintInfo &info) const override;
 
   std::string convertConstraint(const char *&Constraint) const override {
+    if (strncmp(Constraint, "@cc", 3) == 0) {
+      std::string Converted = "{" + std::string(Constraint, 3) + "}";
+      Constraint += 3;
+      return Converted;
+    }
     switch (Constraint[0]) {
     case 'p': // Keep 'p' constraint.
       return std::string("p");
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 41dc91c578c800a..27f7bb652895839 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -2563,9 +2563,15 @@ EmitAsmStores(CodeGenFunction &CGF, const AsmStmt &S,
     if ((i < ResultRegIsFlagReg.size()) && ResultRegIsFlagReg[i]) {
       // Target must guarantee the Value `Tmp` here is lowered to a boolean
       // value.
-      llvm::Constant *Two = llvm::ConstantInt::get(Tmp->getType(), 2);
+      unsigned CCUpperBound = 2;
+      if (CGF.getTarget().getTriple().getArch() == llvm::Triple::systemz) {
+        // On this target CC value can be in range [0, 3].
+        CCUpperBound = 4;
+      }
+      llvm::Constant *CCUpperBoundConst =
+          llvm::ConstantInt::get(Tmp->getType(), CCUpperBound);
       llvm::Value *IsBooleanValue =
-          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, Two);
+          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, CCUpperBoundConst);
       llvm::Function *FnAssume = CGM.getIntrinsic(llvm::Intrinsic::assume);
       Builder.CreateCall(FnAssume, IsBooleanValue);
     }
diff --git a/clang/test/CodeGen/inline-asm-systemz-flag-output.c b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
new file mode 100644
index 000000000000000..ab90e031df1f2b8
--- /dev/null
+++ b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
@@ -0,0 +1,149 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple s390x-linux -emit-llvm -o - %s | FileCheck %s
+// CHECK-LABEL: define dso_local signext i32 @foo_012(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2:[0-9]+]], !srcloc [[META2:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 2
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_012(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 2 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_013(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META3:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_013(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_023(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META4:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_023(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 2 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_123(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META5:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 1
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_123(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 1 || cc == 2 || cc == 3 ? 42 : 0;
+}
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index e0b638201a04740..cb136fe2f446b43 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5071,6 +5071,9 @@ class TargetLowering : public TargetLoweringBase {
                                             std::vector<SDValue> &Ops,
                                             SelectionDAG &DAG) const;
 
+  // Lower switch statement for flag output operand with SRL/IPM Sequence.
+  virtual bool canLowerSRL_IPM_Switch(SDValue Cond) const;
+
   // Lower custom output constraints. If invalid, return SDValue().
   virtual SDValue LowerAsmOutputForConstraint(SDValue &Chain, SDValue &Glue,
                                               const SDLoc &DL,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 3b046aa25f54440..a32787bc882f175 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -2831,8 +2831,37 @@ void SelectionDAGBuilder::visitBr(const BranchInst &I) {
       Opcode = Instruction::And;
     else if (match(BOp, m_LogicalOr(m_Value(BOp0), m_Value(BOp1))))
       Opcode = Instruction::Or;
-
-    if (Opcode &&
+    auto &TLI = DAG.getTargetLoweringInfo();
+    bool BrSrlIPM = FuncInfo.MF->getTarget().getTargetTriple().getArch() ==
+                    Triple::ArchType::systemz;
+    // For the SRL/IPM sequence of a flag output operand, we want to avoid
+    // creating switch cases, as they create basic blocks and inhibit
+    // optimization in DAGCombiner for flag output operands.
+    const auto checkSRLIPM = [&TLI](const SDValue &Op) {
+      if (!Op.getNumOperands())
+        return false;
+      SDValue OpVal = Op.getOperand(0);
+      SDNode *N = OpVal.getNode();
+      if (N && N->getOpcode() == ISD::SRL)
+        return TLI.canLowerSRL_IPM_Switch(OpVal);
+      else if (N && OpVal.getNumOperands() &&
+               (N->getOpcode() == ISD::AND || N->getOpcode() == ISD::OR)) {
+        SDValue OpVal1 = OpVal.getOperand(0);
+        SDNode *N1 = OpVal1.getNode();
+        if (N1 && N1->getOpcode() == ISD::SRL)
+          return TLI.canLowerSRL_IPM_Switch(OpVal1);
+      }
+      return false;
+    };
+    if (BrSrlIPM) {
+      if (NodeMap.count(BOp0) && NodeMap[BOp0].getNode()) {
+        BrSrlIPM &= checkSRLIPM(getValue(BOp0));
+        if (NodeMap.count(BOp1) && NodeMap[BOp1].getNode())
+          BrSrlIPM &= checkSRLIPM(getValue(BOp1));
+      } else
+        BrSrlIPM = false;
+    }
+    if (Opcode && !BrSrlIPM &&
         !(match(BOp0, m_ExtractElt(m_Value(Vec), m_Value())) &&
           match(BOp1, m_ExtractElt(m_Specific(Vec), m_Value()))) &&
         !shouldKeepJumpConditionsTogether(
@@ -12043,18 +12072,41 @@ void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond,
       const APInt &SmallValue = Small.Low->getValue();
       const APInt &BigValue = Big.Low->getValue();
 
+      // The switch-case optimizing transformation inhibits DAGCombiner on
+      // SystemZ for flag output operands. DAGCombiner computes a cumulative
+      // CCMask for the flag output operand's SRL/IPM sequence, so we want to
+      // avoid creating switch cases here: they create basic blocks, which
+      // blocks that optimization. This affects conditions such as
+      // (CC == 0) || (CC == 2) || (CC == 3), or
+      // (CC == 0) || (CC == 1) ^ (CC == 3); there could potentially be
+      // more cases like this.
+      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+      bool IsSrlIPM = false;
+      if (NodeMap.count(Cond) && NodeMap[Cond].getNode())
+        IsSrlIPM = CurMF->getTarget().getTargetTriple().getArch() ==
+                       Triple::ArchType::systemz &&
+                   TLI.canLowerSRL_IPM_Switch(getValue(Cond));
       // Check that there is only one bit different.
       APInt CommonBit = BigValue ^ SmallValue;
-      if (CommonBit.isPowerOf2()) {
+      if (CommonBit.isPowerOf2() || IsSrlIPM) {
         SDValue CondLHS = getValue(Cond);
         EVT VT = CondLHS.getValueType();
         SDLoc DL = getCurSDLoc();
-
-        SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
-                                 DAG.getConstant(CommonBit, DL, VT));
-        SDValue Cond = DAG.getSetCC(
-            DL, MVT::i1, Or, DAG.getConstant(BigValue | SmallValue, DL, VT),
-            ISD::SETEQ);
+        SDValue Cond;
+
+        if (CommonBit.isPowerOf2()) {
+          SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
+                                   DAG.getConstant(CommonBit, DL, VT));
+          Cond = DAG.getSetCC(DL, MVT::i1, Or,
+                              DAG.getConstant(BigValue | SmallValue, DL, VT),
+                              ISD::SETEQ);
+        } else if (IsSrlIPM && BigValue == 3 && SmallValue == 0) {
+          SDValue SetCC =
+              DAG.getSetCC(DL, MVT::i32, CondLHS,
+                           DAG.getConstant(SmallValue, DL, VT), ISD::SETEQ);
+          Cond = DAG.getSetCC(DL, MVT::i32, SetCC,
+                              DAG.getConstant(BigValue, DL, VT), ISD::SETEQ);
+        }
 
         // Update successor info.
         // Both Small and Big will jump to Small.BB, so we sum up the
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 8287565336b54d1..3d48adac509cb9e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -5563,6 +5563,10 @@ const char *TargetLowering::LowerXConstraint(EVT ConstraintVT) const {
   return nullptr;
 }
 
+bool TargetLowering::canLowerSRL_IPM_Switch(SDValue Cond) const {
+  return false;
+}
+
 SDValue TargetLowering::LowerAsmOutputForConstraint(
     SDValue &Chain, SDValue &Glue, const SDLoc &DL,
     const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 3999b54de81b657..259da48a3b22321 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -1207,6 +1207,9 @@ SystemZTargetLowering::getConstraintType(StringRef Constraint) const {
     default:
       break;
     }
+  } else if (Constraint.size() == 5 && Constraint.starts_with("{")) {
+    if (StringRef("{@cc}").compare(Constraint) == 0)
+      return C_Other;
   }
   return TargetLowering::getConstraintType(Constraint);
 }
@@ -1389,6 +1392,10 @@ SystemZTargetLowering::getRegForInlineAsmConstraint(
       return parseRegisterNumber(Constraint, &SystemZ::VR128BitRegClass,
                                  SystemZMC::VR128Regs, 32);
     }
+    if (Constraint[1] == '@') {
+      if (StringRef("{@cc}").compare(Constraint) == 0)
+        return std::make_pair(0u, &SystemZ::GR32BitRegClass);
+    }
   }
   return TargetLowering::getRegForInlineAsmConstraint(TRI, Constraint, VT);
 }
@@ -1421,6 +1428,35 @@ Register SystemZTargetLowering::getExceptionSelectorRegister(
   return Subtarget.isTargetXPLINK64() ? SystemZ::R2D : SystemZ::R7D;
 }
 
+// Lower @cc flag outputs by reading CC with IPM and shifting it down (SRL).
+SDValue SystemZTargetLowering::LowerAsmOutputForConstraint(
+    SDValue &Chain, SDValue &Glue, const SDLoc &DL,
+    const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
+  if (StringRef("{@cc}").compare(OpInfo.ConstraintCode) != 0)
+    return SDValue();
+
+  // Check that return type is valid.
+  if (OpInfo.ConstraintVT.isVector() || !OpInfo.ConstraintVT.isInteger() ||
+      OpInfo.ConstraintVT.getSizeInBits() < 8)
+    report_fatal_error("Glue output operand is of invalid type");
+
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  MRI.addLiveIn(SystemZ::CC);
+
+  if (Glue.getNode()) {
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32, Glue);
+    Chain = Glue.getValue(1);
+  } else
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32);
+
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, DL, MVT::i32, Glue);
+  SDValue CC = DAG.getNode(ISD::SRL, DL, MVT::i32, IPM,
+                           DAG.getConstant(SystemZ::IPM_CC, DL, MVT::i32));
+
+  return CC;
+}
+
 void SystemZTargetLowering::LowerAsmOperandForConstraint(
     SDValue Op, StringRef Constraint, std::vector<SDValue> &Ops,
     SelectionDAG &DAG) const {
@@ -2485,6 +2521,21 @@ static unsigned CCMaskForCondCode(ISD::CondCode CC) {
 #undef CONV
 }
 
+static unsigned CCMaskForSystemZCCVal(unsigned CC) {
+  switch (CC) {
+  default:
+    llvm_unreachable("invalid integer condition!");
+  case 0:
+    return SystemZ::CCMASK_CMP_EQ;
+  case 1:
+    return SystemZ::CCMASK_CMP_LT;
+  case 2:
+    return SystemZ::CCMASK_CMP_GT;
+  case 3:
+    return SystemZ::CCMASK_CMP_UO;
+  }
+}
+
 // If C can be converted to a comparison against zero, ...
[truncated]
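
For readers following the truncated diff, here is a minimal standalone sketch of the arithmetic behind the `IPM`/`SRL` lowering and `CCMaskForSystemZCCVal`. It is not code from the patch. The constants are assumptions modeled on the SystemZ backend (CC in bits 28-29 of the IPM result, and a 4-bit branch mask with CC 0 in the most significant position), and every name below (`ccFromIpm`, `maskForCC`, `maskForAnyOf`, `branchTaken`, `countNonTrivialMasks`) is hypothetical.

```cpp
#include <cstdint>
#include <initializer_list>

// Assumed values, modeled on the SystemZ backend's constants.
constexpr unsigned kIpmCcShift = 28; // stands in for SystemZ::IPM_CC
constexpr unsigned kMaskCC0 = 8;     // stands in for SystemZ::CCMASK_CMP_EQ
constexpr unsigned kAllCCs = 15;     // every CC value selected

// Recover CC (0..3) from a modeled IPM result, as the SRL node does.
// The two bits above the CC field are assumed to be zero.
inline unsigned ccFromIpm(std::uint32_t ipm) { return ipm >> kIpmCcShift; }

// Mask bit for a single CC value: 0 -> 8, 1 -> 4, 2 -> 2, 3 -> 1.
inline unsigned maskForCC(unsigned cc) { return kMaskCC0 >> cc; }

// Fold `cc == a || cc == b || ...` into one mask by OR-ing the bits.
inline unsigned maskForAnyOf(std::initializer_list<unsigned> ccs) {
  unsigned mask = 0;
  for (unsigned cc : ccs)
    mask |= maskForCC(cc);
  return mask;
}

// A conditional branch is taken when the bit for the current CC is set.
inline bool branchTaken(unsigned mask, unsigned cc) {
  return (mask & maskForCC(cc)) != 0;
}

// Count masks that are neither "never" (0) nor "always" (15).
inline unsigned countNonTrivialMasks() {
  unsigned count = 0;
  for (unsigned mask = 1; mask < kAllCCs; ++mask)
    ++count;
  return count;
}
```

Under these assumptions, `cc == 0 || cc == 1 || cc == 3` from `foo_013` folds to the single mask `0b1101`, which is the kind of cumulative mask the DAGCombiner comments refer to. Counting the non-trivial masks is one plausible reading of the "14 possible combinations" in the description (2^4 minus the empty and full masks).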

llvmbot · 2025-02-06T00:23:27Z

@llvm/pr-subscribers-llvm-selectiondag

Author: None (anoopkg6)

Changes

Add support for flag output operand "=@cc" for SystemZ and optimizing conditional branch for 14 possible combinations of CC mask.

Patch is 616.60 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125970.diff

21 Files Affected:

(modified) clang/lib/Basic/Targets/SystemZ.cpp (+11)
(modified) clang/lib/Basic/Targets/SystemZ.h (+5)
(modified) clang/lib/CodeGen/CGStmt.cpp (+8-2)
(added) clang/test/CodeGen/inline-asm-systemz-flag-output.c (+149)
(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+3)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+61-9)
(modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+4)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+598-2)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (+14)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand.ll (+500)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_eq_noteq.ll (+939)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccand_not.ll (+779)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed.ll (+2427)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_eq_noteq.ll (+5248)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccmixed_not.ll (+2543)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor.ll (+1047)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_eq_noteq.ll (+854)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccor_not.ll (+806)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor.ll (+784)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_eq_noteq.ll (+1083)
(added) llvm/test/CodeGen/SystemZ/flag_output_operand_ccxor_not.ll (+778)
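
As a rough illustration of how user code might consume the new operand, the sketch below guards the `"=@cc"` inline asm behind `__GCC_ASM_FLAG_OUTPUTS__` (which this patch defines for SystemZ) and falls back to a software model elsewhere. The asm statement mirrors the one in the added clang test; the helper names `Add42Result`, `add42WithCC`, and `selectOnCC` are hypothetical, and the fallback encodes an assumption about how a signed add sets CC (0 = zero, 1 = negative, 2 = positive, 3 = overflow).

```cpp
#include <climits>

struct Add42Result {
  int value; // x + 42, wrapped on overflow
  int cc;    // condition code, expected to be in [0, 3]
};

inline Add42Result add42WithCC(int x) {
#if defined(__s390x__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
  // Same statement as in inline-asm-systemz-flag-output.c.
  int cc;
  asm volatile("ahi %[x],42\n" : [x] "+d"(x), "=@cc"(cc));
  return {x, cc};
#else
  // Software model (assumption): overflow reports CC 3, otherwise the
  // sign of the result selects CC 0, 1, or 2.
  if (x > INT_MAX - 42)
    return {static_cast<int>(static_cast<unsigned>(x) + 42u), 3};
  int sum = x + 42;
  return {sum, sum == 0 ? 0 : (sum < 0 ? 1 : 2)};
#endif
}

// Mirrors foo_013 from the test: select on a set of CC values.
inline int selectOnCC(int x) {
  int cc = add42WithCC(x).cc;
  return (cc == 0 || cc == 1 || cc == 3) ? 42 : 0;
}
```

Checking the macro keeps the source buildable with compilers or targets that lack flag output support, at the cost of maintaining a fallback path.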

diff --git a/clang/lib/Basic/Targets/SystemZ.cpp b/clang/lib/Basic/Targets/SystemZ.cpp
index 06f08db2eadd475..49f88b45220d0c4 100644
--- a/clang/lib/Basic/Targets/SystemZ.cpp
+++ b/clang/lib/Basic/Targets/SystemZ.cpp
@@ -90,6 +90,14 @@ bool SystemZTargetInfo::validateAsmConstraint(
   case 'T': // Likewise, plus an index
     Info.setAllowsMemory();
     return true;
+  case '@':
+    // CC condition changes.
+    if (strlen(Name) >= 3 && *(Name + 1) == 'c' && *(Name + 2) == 'c') {
+      Name += 2;
+      Info.setAllowsRegister();
+      return true;
+    }
+    return false;
   }
 }
 
@@ -150,6 +158,9 @@ unsigned SystemZTargetInfo::getMinGlobalAlign(uint64_t Size,
 
 void SystemZTargetInfo::getTargetDefines(const LangOptions &Opts,
                                          MacroBuilder &Builder) const {
+  // Inline assembly supports SystemZ flag outputs.
+  Builder.defineMacro("__GCC_ASM_FLAG_OUTPUTS__");
+
   Builder.defineMacro("__s390__");
   Builder.defineMacro("__s390x__");
   Builder.defineMacro("__zarch__");
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index ef9a07033a6e4ff..a6909ababdec001 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -118,6 +118,11 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                              TargetInfo::ConstraintInfo &info) const override;
 
   std::string convertConstraint(const char *&Constraint) const override {
+    if (strncmp(Constraint, "@cc", 3) == 0) {
+      std::string Converted = "{" + std::string(Constraint, 3) + "}";
+      Constraint += 3;
+      return Converted;
+    }
     switch (Constraint[0]) {
     case 'p': // Keep 'p' constraint.
       return std::string("p");
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 41dc91c578c800a..27f7bb652895839 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -2563,9 +2563,15 @@ EmitAsmStores(CodeGenFunction &CGF, const AsmStmt &S,
     if ((i < ResultRegIsFlagReg.size()) && ResultRegIsFlagReg[i]) {
       // Target must guarantee the Value `Tmp` here is lowered to a boolean
       // value.
-      llvm::Constant *Two = llvm::ConstantInt::get(Tmp->getType(), 2);
+      unsigned CCUpperBound = 2;
+      if (CGF.getTarget().getTriple().getArch() == llvm::Triple::systemz) {
+        // On this target CC value can be in range [0, 3].
+        CCUpperBound = 4;
+      }
+      llvm::Constant *CCUpperBoundConst =
+          llvm::ConstantInt::get(Tmp->getType(), CCUpperBound);
       llvm::Value *IsBooleanValue =
-          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, Two);
+          Builder.CreateCmp(llvm::CmpInst::ICMP_ULT, Tmp, CCUpperBoundConst);
       llvm::Function *FnAssume = CGM.getIntrinsic(llvm::Intrinsic::assume);
       Builder.CreateCall(FnAssume, IsBooleanValue);
     }
diff --git a/clang/test/CodeGen/inline-asm-systemz-flag-output.c b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
new file mode 100644
index 000000000000000..ab90e031df1f2b8
--- /dev/null
+++ b/clang/test/CodeGen/inline-asm-systemz-flag-output.c
@@ -0,0 +1,149 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple s390x-linux -emit-llvm -o - %s | FileCheck %s
+// CHECK-LABEL: define dso_local signext i32 @foo_012(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2:[0-9]+]], !srcloc [[META2:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 2
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_012(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 2 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_013(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META3:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 1
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_013(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 1 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_023(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META4:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 0
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_023(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 0 || cc == 2 || cc == 3 ? 42 : 0;
+}
+
+// CHECK-LABEL: define dso_local signext i32 @foo_123(
+// CHECK-SAME: i32 noundef signext [[X:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT:  [[ENTRY:.*]]:
+// CHECK-NEXT:    [[X_ADDR:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    [[CC:%.*]] = alloca i32, align 4
+// CHECK-NEXT:    store i32 [[X]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP1:%.*]] = call { i32, i32 } asm sideeffect "ahi $0,42\0A", "=d,={@cc},0"(i32 [[TMP0]]) #[[ATTR2]], !srcloc [[META5:![0-9]+]]
+// CHECK-NEXT:    [[ASMRESULT:%.*]] = extractvalue { i32, i32 } [[TMP1]], 0
+// CHECK-NEXT:    [[ASMRESULT1:%.*]] = extractvalue { i32, i32 } [[TMP1]], 1
+// CHECK-NEXT:    store i32 [[ASMRESULT]], ptr [[X_ADDR]], align 4
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp ult i32 [[ASMRESULT1]], 4
+// CHECK-NEXT:    call void @llvm.assume(i1 [[TMP2]])
+// CHECK-NEXT:    store i32 [[ASMRESULT1]], ptr [[CC]], align 4
+// CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[TMP3]], 1
+// CHECK-NEXT:    br i1 [[CMP]], label %[[LOR_END:.*]], label %[[LOR_LHS_FALSE:.*]]
+// CHECK:       [[LOR_LHS_FALSE]]:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[TMP4]], 2
+// CHECK-NEXT:    br i1 [[CMP2]], label %[[LOR_END]], label %[[LOR_RHS:.*]]
+// CHECK:       [[LOR_RHS]]:
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CC]], align 4
+// CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 3
+// CHECK-NEXT:    br label %[[LOR_END]]
+// CHECK:       [[LOR_END]]:
+// CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ true, %[[LOR_LHS_FALSE]] ], [ true, %[[ENTRY]] ], [ [[CMP3]], %[[LOR_RHS]] ]
+// CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+// CHECK-NEXT:    [[COND:%.*]] = select i1 [[TMP6]], i32 42, i32 0
+// CHECK-NEXT:    ret i32 [[COND]]
+//
+int foo_123(int x) {
+  int cc;
+  asm volatile ("ahi %[x],42\n" : [x] "+d"(x), "=@cc" (cc));
+  return cc == 1 || cc == 2 || cc == 3 ? 42 : 0;
+}
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index e0b638201a04740..cb136fe2f446b43 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5071,6 +5071,9 @@ class TargetLowering : public TargetLoweringBase {
                                             std::vector<SDValue> &Ops,
                                             SelectionDAG &DAG) const;
 
+  // Lower switch statement for flag output operand with SRL/IPM Sequence.
+  virtual bool canLowerSRL_IPM_Switch(SDValue Cond) const;
+
   // Lower custom output constraints. If invalid, return SDValue().
   virtual SDValue LowerAsmOutputForConstraint(SDValue &Chain, SDValue &Glue,
                                               const SDLoc &DL,
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 3b046aa25f54440..a32787bc882f175 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -2831,8 +2831,37 @@ void SelectionDAGBuilder::visitBr(const BranchInst &I) {
       Opcode = Instruction::And;
     else if (match(BOp, m_LogicalOr(m_Value(BOp0), m_Value(BOp1))))
       Opcode = Instruction::Or;
-
-    if (Opcode &&
+    auto &TLI = DAG.getTargetLoweringInfo();
+    bool BrSrlIPM = FuncInfo.MF->getTarget().getTargetTriple().getArch() ==
+                    Triple::ArchType::systemz;
+    // For the SRL/IPM sequence of a flag output operand, we want to avoid
+    // creating switch cases, as they create basic blocks and inhibit
+    // optimization in DAGCombiner for flag output operands.
+    const auto checkSRLIPM = [&TLI](const SDValue &Op) {
+      if (!Op.getNumOperands())
+        return false;
+      SDValue OpVal = Op.getOperand(0);
+      SDNode *N = OpVal.getNode();
+      if (N && N->getOpcode() == ISD::SRL)
+        return TLI.canLowerSRL_IPM_Switch(OpVal);
+      else if (N && OpVal.getNumOperands() &&
+               (N->getOpcode() == ISD::AND || N->getOpcode() == ISD::OR)) {
+        SDValue OpVal1 = OpVal.getOperand(0);
+        SDNode *N1 = OpVal1.getNode();
+        if (N1 && N1->getOpcode() == ISD::SRL)
+          return TLI.canLowerSRL_IPM_Switch(OpVal1);
+      }
+      return false;
+    };
+    if (BrSrlIPM) {
+      if (NodeMap.count(BOp0) && NodeMap[BOp0].getNode()) {
+        BrSrlIPM &= checkSRLIPM(getValue(BOp0));
+        if (NodeMap.count(BOp1) && NodeMap[BOp1].getNode())
+          BrSrlIPM &= checkSRLIPM(getValue(BOp1));
+      } else
+        BrSrlIPM = false;
+    }
+    if (Opcode && !BrSrlIPM &&
         !(match(BOp0, m_ExtractElt(m_Value(Vec), m_Value())) &&
           match(BOp1, m_ExtractElt(m_Specific(Vec), m_Value()))) &&
         !shouldKeepJumpConditionsTogether(
@@ -12043,18 +12072,41 @@ void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond,
       const APInt &SmallValue = Small.Low->getValue();
       const APInt &BigValue = Big.Low->getValue();
 
+      // The switch-case optimizing transformation inhibits DAGCombiner on
+      // SystemZ for flag output operands. DAGCombiner computes a cumulative
+      // CCMask for the flag output operand's SRL/IPM sequence, so we want to
+      // avoid creating switch cases here: they create basic blocks, which
+      // blocks that optimization. This affects conditions such as
+      // (CC == 0) || (CC == 2) || (CC == 3), or
+      // (CC == 0) || (CC == 1) ^ (CC == 3); there could potentially be
+      // more cases like this.
+      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+      bool IsSrlIPM = false;
+      if (NodeMap.count(Cond) && NodeMap[Cond].getNode())
+        IsSrlIPM = CurMF->getTarget().getTargetTriple().getArch() ==
+                       Triple::ArchType::systemz &&
+                   TLI.canLowerSRL_IPM_Switch(getValue(Cond));
       // Check that there is only one bit different.
       APInt CommonBit = BigValue ^ SmallValue;
-      if (CommonBit.isPowerOf2()) {
+      if (CommonBit.isPowerOf2() || IsSrlIPM) {
         SDValue CondLHS = getValue(Cond);
         EVT VT = CondLHS.getValueType();
         SDLoc DL = getCurSDLoc();
-
-        SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
-                                 DAG.getConstant(CommonBit, DL, VT));
-        SDValue Cond = DAG.getSetCC(
-            DL, MVT::i1, Or, DAG.getConstant(BigValue | SmallValue, DL, VT),
-            ISD::SETEQ);
+        SDValue Cond;
+
+        if (CommonBit.isPowerOf2()) {
+          SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS,
+                                   DAG.getConstant(CommonBit, DL, VT));
+          Cond = DAG.getSetCC(DL, MVT::i1, Or,
+                              DAG.getConstant(BigValue | SmallValue, DL, VT),
+                              ISD::SETEQ);
+        } else if (IsSrlIPM && BigValue == 3 && SmallValue == 0) {
+          SDValue SetCC =
+              DAG.getSetCC(DL, MVT::i32, CondLHS,
+                           DAG.getConstant(SmallValue, DL, VT), ISD::SETEQ);
+          Cond = DAG.getSetCC(DL, MVT::i32, SetCC,
+                              DAG.getConstant(BigValue, DL, VT), ISD::SETEQ);
+        }
 
         // Update successor info.
         // Both Small and Big will jump to Small.BB, so we sum up the
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 8287565336b54d1..3d48adac509cb9e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -5563,6 +5563,10 @@ const char *TargetLowering::LowerXConstraint(EVT ConstraintVT) const {
   return nullptr;
 }
 
+bool TargetLowering::canLowerSRL_IPM_Switch(SDValue Cond) const {
+  return false;
+}
+
 SDValue TargetLowering::LowerAsmOutputForConstraint(
     SDValue &Chain, SDValue &Glue, const SDLoc &DL,
     const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 3999b54de81b657..259da48a3b22321 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -1207,6 +1207,9 @@ SystemZTargetLowering::getConstraintType(StringRef Constraint) const {
     default:
       break;
     }
+  } else if (Constraint.size() == 5 && Constraint.starts_with("{")) {
+    if (StringRef("{@cc}").compare(Constraint) == 0)
+      return C_Other;
   }
   return TargetLowering::getConstraintType(Constraint);
 }
@@ -1389,6 +1392,10 @@ SystemZTargetLowering::getRegForInlineAsmConstraint(
       return parseRegisterNumber(Constraint, &SystemZ::VR128BitRegClass,
                                  SystemZMC::VR128Regs, 32);
     }
+    if (Constraint[1] == '@') {
+      if (StringRef("{@cc}").compare(Constraint) == 0)
+        return std::make_pair(0u, &SystemZ::GR32BitRegClass);
+    }
   }
   return TargetLowering::getRegForInlineAsmConstraint(TRI, Constraint, VT);
 }
@@ -1421,6 +1428,35 @@ Register SystemZTargetLowering::getExceptionSelectorRegister(
   return Subtarget.isTargetXPLINK64() ? SystemZ::R2D : SystemZ::R7D;
 }
 
+// Lower @cc targets via setcc.
+SDValue SystemZTargetLowering::LowerAsmOutputForConstraint(
+    SDValue &Chain, SDValue &Glue, const SDLoc &DL,
+    const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const {
+  if (StringRef("{@cc}").compare(OpInfo.ConstraintCode) != 0)
+    return SDValue();
+
+  // Check that return type is valid.
+  if (OpInfo.ConstraintVT.isVector() || !OpInfo.ConstraintVT.isInteger() ||
+      OpInfo.ConstraintVT.getSizeInBits() < 8)
+    report_fatal_error("Glue output operand is of invalid type");
+
+  MachineFunction &MF = DAG.getMachineFunction();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  MRI.addLiveIn(SystemZ::CC);
+
+  if (Glue.getNode()) {
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32, Glue);
+    Chain = Glue.getValue(1);
+  } else
+    Glue = DAG.getCopyFromReg(Chain, DL, SystemZ::CC, MVT::i32);
+
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, DL, MVT::i32, Glue);
+  SDValue CC = DAG.getNode(ISD::SRL, DL, MVT::i32, IPM,
+                           DAG.getConstant(SystemZ::IPM_CC, DL, MVT::i32));
+
+  return CC;
+}
+
 void SystemZTargetLowering::LowerAsmOperandForConstraint(
     SDValue Op, StringRef Constraint, std::vector<SDValue> &Ops,
     SelectionDAG &DAG) const {
@@ -2485,6 +2521,21 @@ static unsigned CCMaskForCondCode(ISD::CondCode CC) {
 #undef CONV
 }
 
+static unsigned CCMaskForSystemZCCVal(unsigned CC) {
+  switch (CC) {
+  default:
+    llvm_unreachable("invalid integer condition!");
+  case 0:
+    return SystemZ::CCMASK_CMP_EQ;
+  case 1:
+    return SystemZ::CCMASK_CMP_LT;
+  case 2:
+    return SystemZ::CCMASK_CMP_GT;
+  case 3:
+    return SystemZ::CCMASK_CMP_UO;
+  }
+}
+
 // If C can be converted to a comparison against zero, ...
[truncated]

arsenm · 2025-02-06T02:47:56Z

clang/lib/Basic/Targets/SystemZ.cpp

@@ -90,6 +90,14 @@ bool SystemZTargetInfo::validateAsmConstraint(
  case 'T': // Likewise, plus an index
    Info.setAllowsMemory();
    return true;
+  case '@':
+    // CC condition changes.
+    if (strlen(Name) >= 3 && *(Name + 1) == 'c' && *(Name + 2) == 'c') {


Avoid strlen, use StringRef
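
A minimal sketch of what this suggestion could look like. To keep the snippet compilable without LLVM headers, `std::string_view` stands in for `llvm::StringRef`, and the helper name `isFlagOutputConstraint` is hypothetical, not something from the patch.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper; std::string_view stands in for llvm::StringRef.
// Returns true when the constraint text begins with "@cc", with no strlen
// call and no manual pointer arithmetic.
bool isFlagOutputConstraint(const char *Name) {
  std::string_view S(Name);
  // substr clamps to the available length, so short strings are safe.
  return S.substr(0, 3) == "@cc";
}
```

With the real `llvm::StringRef`, the same check would presumably be a single `starts_with("@cc")` call, as the later revisions in this thread do.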

arsenm · 2025-02-06T02:48:17Z

clang/lib/CodeGen/CGStmt.cpp

@@ -2563,9 +2563,15 @@ EmitAsmStores(CodeGenFunction &CGF, const AsmStmt &S,
    if ((i < ResultRegIsFlagReg.size()) && ResultRegIsFlagReg[i]) {
      // Target must guarantee the Value `Tmp` here is lowered to a boolean
      // value.
-      llvm::Constant *Two = llvm::ConstantInt::get(Tmp->getType(), 2);
+      unsigned CCUpperBound = 2;
+      if (CGF.getTarget().getTriple().getArch() == llvm::Triple::systemz) {


Should not have random triple checks here

Well, SystemZ simply is different here - our "flags" value is in fact a 2-bit value, not a 1-bit value. Do you have any suggestions how this distinction could be abstracted in a cleaner way?

This isn't abstracted at all. I have a hard time believing emitting this assume is worthwhile. If we really need to keep it, I would guess ResultRegIsFlagReg should be false if it's not a boolean flag

Removing llvm.assume intrinsic will cause performance hit. With the CC range known, the intrinsic guides the optimizer to generate better code. I verified this for SystemZ.

ResultRegIsFlagReg[i] will be true only if the i-th OutputConstraint of AsmStmt S starts with "{@cc".

I have tried to abstract out the Triple check. It adds one virtual function call, though. Any suggestions for abstracting it in a cleaner way?

The "right" thing is probably to put the output range into the ConstraintInfo.

…string.

…ng in SystemZISelLowering.cpp.

…llvm

arsenm

Removing llvm.assume intrinsic will cause performance hit.

I think it's more likely including the assume is the hit

arsenm · 2025-02-12T07:23:44Z

clang/include/clang/Basic/TargetInfo.h

+
+  // CC is binary on most targets. SystemZ overrides it as CC interval is
+  // [0, 4).
+  virtual unsigned getFlagOutputCCUpperBound() const { return 2; }


Can this go into ConstraintInfo instead of a new hook?

uweigand · 2025-02-12T18:22:54Z

clang/lib/Basic/Targets/SystemZ.h

  bool validateAsmConstraint(const char *&Name,
                             TargetInfo::ConstraintInfo &info) const override;

  std::string convertConstraint(const char *&Constraint) const override {
+    if (llvm::StringRef(Constraint).starts_with("@cc")) {


Why are we trying to support different constraint strings here? On our platform, it should be enough to check for exact match against "@cc", right?

uweigand · 2025-02-12T18:26:57Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

-    if (Opcode &&
+    auto &TLI = DAG.getTargetLoweringInfo();
+    bool BrSrlIPM = FuncInfo.MF->getTarget().getTargetTriple().getArch() ==
+                    Triple::ArchType::systemz;


We really shouldn't have triple checks here in common SelectionDAG code. If absolutely needed, this should be abstracted behind appropriate target callbacks.

However, I'm wondering if this is indeed needed at all at this point. Can't we just let common code canonicalize the sequence of if statements into a switch statement, and then recognize the particular form of switch (with input coming from an IPM, and only two different switch targets) in platform-specific DAGCombine and directly translate it into a ccmask branch?
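
To make the "two-target switch on CC becomes a ccmask branch" idea concrete, here is a rough sketch of the mask arithmetic only. It assumes the usual SystemZ encoding in which CC value N corresponds to mask bit `1 << (3 - N)` (consistent with `CCMaskForSystemZCCVal` in the diff above); the helper names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// CC value N maps to mask bit (3 - N): CC 0 -> 8, 1 -> 4, 2 -> 2, 3 -> 1.
constexpr unsigned ccMaskBit(unsigned CC) { return 1u << (3 - CC); }

// Hypothetical helper: OR together the mask bits of every switch case value
// that jumps to the same successor block.
unsigned ccMaskForCases(std::initializer_list<unsigned> Cases) {
  unsigned Mask = 0;
  for (unsigned CC : Cases) {
    assert(CC < 4 && "CC is a 2-bit value");
    Mask |= ccMaskBit(CC);
  }
  return Mask;
}
```

For a switch whose cases 0 and 3 share one target, the branch mask would be `ccMaskForCases({0, 3})`, and the fall-through target gets the complement within the 4-bit mask.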

uweigand · 2025-02-12T18:43:18Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+
+  SDValue IPM = DAG.getNode(SystemZISD::IPM, DL, MVT::i32, Glue);
+  SDValue CC = DAG.getNode(ISD::SRL, DL, MVT::i32, IPM,
+                           DAG.getConstant(SystemZ::IPM_CC, DL, MVT::i32));


I guess we should use getCCResult here.
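
For readers unfamiliar with the IPM/SRL pair, the following standalone model shows the arithmetic involved. It assumes the 32-bit IPM result carries CC in bits 29..28 and the program mask in bits 27..24, with the top two bits zero; `modelIPM` and `ccFromIPM` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned IPM_CC = 28; // shift amount used with SRL in the diff above

// Model of the 32-bit IPM result (assumption: bits 31..30 are zero).
constexpr uint32_t modelIPM(unsigned CC, unsigned ProgramMask) {
  return ((CC & 3u) << IPM_CC) | ((ProgramMask & 0xFu) << 24);
}

// The SRL by IPM_CC discards the program mask and leaves CC in [0, 4).
constexpr unsigned ccFromIPM(uint32_t IPM) { return IPM >> IPM_CC; }
```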

uweigand · 2025-02-12T18:46:21Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

-  if (combineCCMask(CCReg, CCValidVal, CCMaskVal))
+  // combineCCIPMMask tries to combine srl/ipm sequence for flag output operand.
+  if (combineCCIPMMask(CCReg, CCValidVal, CCMaskVal) ||
+      combineCCMask(CCReg, CCValidVal, CCMaskVal))


Why do we need the separate routine here? combineCCMask already attempts to handle cases involving IPM - those that result from intrinsics that set CC. Note that in general, we should apply the exact same set of optimizations whether the CC value was generated by an intrinsic or by an inline asm.

uweigand · 2025-02-14T20:22:29Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

+      BrSrlIPM |= checkSRLIPM(getValue(BOp0));
+      if (NodeMap.count(BOp1) && NodeMap[BOp1].getNode())
+        BrSrlIPM &= checkSRLIPM(getValue(BOp1));
+    }


This is already better than a target check, but there's still a whole lot of implicitly target-specific code here. There really shouldn't be a generic callback canLowerSRL_IPM_Switch - that even explicitly refers to SystemZ instruction names! If there's target-specific behavior needed here, this should be better abstracted.

Note that I see there's already a target hook to guide whether or not this transformation should be performed: the getJumpConditionMergingParams callback that provides input to the shouldKeepJumpConditionsTogether. I think you should investigate whether we can create a SystemZ-specific implementation of that callback that has the desired effect of inhibiting this transformation in the cases we care about. That should then work without any common-code change to this function.

uweigand · 2025-02-14T20:27:20Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

+                           DAG.getConstant(SmallValue, DL, VT), ISD::SETEQ);
+          Cond = DAG.getSetCC(DL, MVT::i32, SetCC,
+                              DAG.getConstant(BigValue, DL, VT), ISD::SETEQ);
+        }


Again, this very SystemZ-specific optimization shouldn't really be here. Doesn't this just revert the decision to introduce a switch statement that was made above? Could this not be handled either by the visitBr above via the getJumpConditionMergingParams callback, or else fully in SystemZ DAGCombine code?

uweigand · 2025-02-14T20:29:28Z

clang/lib/Basic/Targets/SystemZ.h

@@ -119,6 +119,12 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                             TargetInfo::ConstraintInfo &info) const override;

  std::string convertConstraint(const char *&Constraint) const override {
+    if (llvm::StringRef(Constraint) == "@cc") {
+      auto Len = llvm::StringRef("@cc").size();


This is a compile-time constant. Again, in SystemZ.cpp that is hard-coded; why do we need this expression here?

uweigand · 2025-02-14T20:36:37Z

clang/lib/CodeGen/CGStmt.cpp

+      TargetInfo::ConstraintInfo Info(S.getOutputConstraint(i), Name);
+      bool IsValid = CGF.getTarget().validateOutputConstraint(Info);
+      (void)IsValid;
+      assert(IsValid && "Failed to parse flag output operand constraint");


All this parsing was done in the caller of this routine (EmitAsmStmt) already - we shouldn't do that again here. I think instead of passing the ResultRegIsFlagReg array down into this routine, the caller should already compute the appropriate bounds and pass an array of those bounds into this function.

This might even allow us to remove the hard-coded llvm::StringRef(OutputConstraint).starts_with("{@cc") test in EmitAsmStmt and defer to the target the decision which output operands may be assumed to fall within a certain range of values.
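
A sketch of the shape this could take, with all types and names hypothetical: the caller records one optional upper bound per output operand while it already has the parsed constraint in hand, and the store-emission routine only consults that array.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for what the caller knows after validating an
// output constraint.
struct OutputOperandSketch {
  bool IsFlagOutput;
  unsigned UpperBound; // only meaningful when IsFlagOutput is true
};

using Bounds = std::vector<std::optional<unsigned>>;

// Caller side: compute the bounds once, next to the existing parsing.
Bounds collectOutputBounds(const std::vector<OutputOperandSketch> &Outputs) {
  Bounds Result;
  Result.reserve(Outputs.size());
  for (const OutputOperandSketch &O : Outputs) {
    if (O.IsFlagOutput)
      Result.push_back(O.UpperBound);
    else
      Result.push_back(std::nullopt);
  }
  return Result;
}

// Callee side: a range assumption is emitted only when a bound is present.
bool needsRangeAssume(const Bounds &B, std::size_t I) {
  return I < B.size() && B[I].has_value();
}
```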

… bound for all backends supporting flag output operand (X86, AARCH64 and SystemZ).
- Remove all target-specific changes from SelectionDAGBuilder.cpp.
- Added getJumpConditionMergingParams for SystemZ for setting cost for merging srl/ipm/cc.
- TODO: Handle the cases where simplifyBranchOnICmpChain creates a switch table while folding a branch on an And'd or Or'd chain of icmp instructions.

uweigand · 2025-03-14T15:36:13Z

clang/include/clang/Basic/TargetInfo.h

@@ -1114,10 +1114,12 @@ class TargetInfo : public TransferrableTargetInfo,

    std::string ConstraintStr;  // constraint: "=rm"
    std::string Name;           // Operand name: [foo] with no []'s.
+    unsigned FlagOutputCCUpperBound;


I am wondering if we can re-use the existing range fields ImmRange here. Those are currently only used for input operands and require that only immediates within this range are used as input. For an output operand, this isn't useful - but instead we could take it to mean that the output is logically constrained to fall within that range, so we can add appropriate assertions.
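
As a rough illustration of that reinterpretation, the sketch below mimics a `Min`/`Max`/`isConstrained` range and reads it as "the output is known to lie in this range". The struct and method names are hypothetical, and treating `Max` as inclusive is an assumption about how such a range would be interpreted.

```cpp
#include <cassert>

// Hypothetical mock of a constraint record with an immediate-style range.
struct ConstraintInfoSketch {
  struct Range {
    int Min = 0;
    int Max = 0;
    bool isConstrained = false;
  } ImmRange;

  // For an output operand: record that the result is known to be in
  // [Min, Max] (inclusive, by assumption).
  void setOutputRange(int Min, int Max) {
    ImmRange.Min = Min;
    ImmRange.Max = Max;
    ImmRange.isConstrained = true;
  }

  // With no recorded range, nothing may be assumed, so any value passes.
  bool valueInRange(int Value) const {
    return !ImmRange.isConstrained ||
           (Value >= ImmRange.Min && Value <= ImmRange.Max);
  }
};
```

A SystemZ flag output would then record the range 0 to 3, while a boolean flag on another target would record 0 to 1.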

uweigand · 2025-03-14T15:36:55Z

clang/include/clang/Basic/TargetInfo.h

@@ -1188,6 +1190,14 @@ class TargetInfo : public TransferrableTargetInfo,
      TiedOperand = N;
      // Don't copy Name or constraint string.
    }
+
+    // CC range can be set by target. SystemZ sets it to 4. It is 2 by default.


Comment is wrong now (2 is no longer default). Also, it's probably not necessary to specifically call out SystemZ here.

uweigand · 2025-03-14T15:37:11Z

clang/include/clang/Basic/TargetInfo.h

@@ -1228,6 +1238,7 @@ class TargetInfo : public TransferrableTargetInfo,
                             std::string &/*SuggestedModifier*/) const {
    return true;
  }
+


This shouldn't be here.

uweigand · 2025-03-14T15:37:39Z

clang/lib/CodeGen/CGStmt.cpp

@@ -2601,7 +2601,7 @@ EmitAsmStores(CodeGenFunction &CGF, const AsmStmt &S,
              const llvm::ArrayRef<LValue> ResultRegDests,
              const llvm::ArrayRef<QualType> ResultRegQualTys,
              const llvm::BitVector &ResultTypeRequiresCast,
-              const llvm::BitVector &ResultRegIsFlagReg) {
+              const std::vector<unsigned> &ResultRegIsFlagReg) {


The argument name should also be updated now.

uweigand · 2025-03-14T15:38:21Z

clang/lib/CodeGen/CGStmt.cpp

+      // optimizer to generate more optimized IR in most of the cases as
+      // observed for select_cc on SystemZ unit tests for flag output operands.
+      // For some cases for br_cc, generated IR was weird. e.g. switch table
+      // for simple simple comparison terms for br_cc.


You don't need to explain that wrong code will result from an incorrect assertion.

uweigand · 2025-03-14T15:51:13Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

  return false;
 }

+std::optional<SDValue>


As an aside: SDValue has a built-in default value SDValue(), so I don't think the std::optional is needed here.
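
The point can be illustrated with a small stand-in type. `ValueHandle` below is a hypothetical mock that only imitates the relevant property of `SDValue` (a default-constructed handle has a null node), so a "nothing to do" result needs no `std::optional` wrapper.

```cpp
#include <cassert>

struct Node {
  int Opcode;
};

// Hypothetical mock: default construction yields the "empty" state.
class ValueHandle {
  Node *N = nullptr;

public:
  ValueHandle() = default;
  explicit ValueHandle(Node *N) : N(N) {}
  Node *getNode() const { return N; }
  explicit operator bool() const { return N != nullptr; }
};

// A combine-style routine: an empty handle means "no change".
ValueHandle tryCombine(Node *N) {
  if (N->Opcode != 1)
    return ValueHandle(); // bail out without std::optional
  return ValueHandle(N);
}
```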

uweigand · 2025-03-14T15:53:12Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

  return false;
 }

+std::optional<SDValue>
+SystemZTargetLowering::combineSELECT_CC_CCIPMMask(SDNode *N,
+                                                  DAGCombinerInfo &DCI) const {


I don't understand why we need yet another function here, which is called only for SELECT_CCMASK and not BR_CCMASK. Shouldn't all these optimizations apply equally to both cases? Why cannot this be integrated into combineCCMask?

uweigand · 2025-03-14T15:53:35Z

llvm/lib/Target/SystemZ/SystemZISelLowering.h

@@ -756,7 +768,11 @@ class SystemZTargetLowering : public TargetLowering {
  SDValue combineINT_TO_FP(SDNode *N, DAGCombinerInfo &DCI) const;
  SDValue combineBSWAP(SDNode *N, DAGCombinerInfo &DCI) const;
  SDValue combineBR_CCMASK(SDNode *N, DAGCombinerInfo &DCI) const;
+  std::optional<SDValue> combineBR_CCJoinIPMMask(SDNode *N,
+                                                 DAGCombinerInfo &DCI) const;


This doesn't exist?

uweigand · 2025-03-14T15:54:27Z

clang/lib/Basic/Targets/SystemZ.cpp

+    if (StringRef(Name) == "@cc") {
+      Name += 2;
+      Info.setAllowsRegister();
+      Info.setFlagOutputCCUpperBound(4);


Here we should have the platform-specific comment explaining why this is 4.

uweigand · 2025-03-14T15:55:43Z

clang/lib/Basic/Targets/SystemZ.h

+    if (llvm::StringRef(Constraint) == "@cc") {
+      Constraint += 2;
+      return std::string("{@cc}");
+    }


I think we also should have a case '@': in the switch statement below and move this check there.

uweigand

Not a full review, just some initial comments on combineCCMask. I think it would be good to have more comments explaining the specific transformations you're attempting to implement, with an argument why they are correct for all inputs.

uweigand · 2025-04-25T14:05:36Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

-  // Optimize the case where CompareLHS is a SELECT_CCMASK.
-  if (CompareLHS->getOpcode() == SystemZISD::SELECT_CCMASK) {
-    // Verify that we have an appropriate mask for a EQ or NE comparison.
+  // Optimize (TM (IPM (CC)))


Adding a case to optimize (TM (IPM)) in addition to (ICMP (IPM)) does make sense in general. However, you need to take care that the optimization is correct for all possible inputs to TM, not just the ones that come up in the "good case" you're looking at. That doesn't appear to be the case here.

uweigand · 2025-04-25T14:06:40Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

    bool Invert = false;
-    if (CCMask == SystemZ::CCMASK_CMP_NE)
+    if (CCMask == SystemZ::CCMASK_TM_SOME_1)
      Invert = !Invert;


There are four possible CCMask values for TM. It doesn't look like all four are handled correctly. (You can of course bail out if there's some mask value you don't support, but you shouldn't make any silent assumptions.)

uweigand · 2025-04-25T14:07:21Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

      Invert = !Invert;
-    else if (CCMask != SystemZ::CCMASK_CMP_EQ)
+    auto *N = CCNode->getOperand(0).getNode();
+    auto Shift = dyn_cast<ConstantSDNode>(CCNode->getOperand(1));


Calling the operand of TM "Shift" is confusing. There's no shift happening here; TM basically performs an "and" operation.

uweigand · 2025-04-25T14:08:21Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+    if (N->getOpcode() == SystemZISD::IPM) {
+      auto ShiftVal = Shift->getZExtValue();
+      if (ShiftVal == (1 << SystemZ::IPM_CC))
+        CCMask = SystemZ::CCMASK_CMP_GE;


Well, what if the second TM operand is anything else? You'll still do the optimization here, which is likely to be incorrect.
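
One way to be correct for every TM operand and every TM condition is to enumerate the four CC values rather than special-case one mask. The sketch below is only a model of that idea under stated assumptions: CC sits in bits 29..28 of the IPM result, CC value N maps to mask bit `1 << (3 - N)`, and the four conditions correspond to the all-zeros, some-ones, all-ones and some-zeros TM masks. All names are hypothetical, and it bails out whenever the TM operand touches bits outside the CC field.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr unsigned IPM_CC = 28;

enum class TMCond { All0, Some1, All1, Some0 };

constexpr unsigned ccMaskBit(unsigned CC) { return 1u << (3 - CC); }

// Returns the mask of source CC values for which TM(IPM, TMMask) satisfies
// Cond, or nullopt when the result would depend on non-CC bits.
std::optional<unsigned> ccMaskForTMOfIPM(uint32_t TMMask, TMCond Cond) {
  const uint32_t CCField = 3u << IPM_CC;
  if (TMMask == 0 || (TMMask & ~CCField) != 0)
    return std::nullopt; // unsupported operand: do not guess

  unsigned Result = 0;
  for (unsigned CC = 0; CC < 4; ++CC) {
    uint32_t Selected = (CC << IPM_CC) & TMMask;
    bool AllZero = Selected == 0;
    bool AllOne = Selected == TMMask;
    bool Holds = false;
    switch (Cond) {
    case TMCond::All0:  Holds = AllZero;  break;
    case TMCond::Some1: Holds = !AllZero; break;
    case TMCond::All1:  Holds = AllOne;   break;
    case TMCond::Some0: Holds = !AllOne;  break;
    }
    if (Holds)
      Result |= ccMaskBit(CC);
  }
  return Result;
}
```

For the operand `1 << IPM_CC` and the all-zeros condition this yields CC 0 or CC 2, which matches the value the patch hard-codes for that one case, while any other operand is either computed the same way or rejected.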

uweigand · 2025-04-25T14:09:07Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+      // OP1. (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
+      // OP2. (SRL (IPM (CC))).
+      if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK /*&&
+          isSRL_IPM_CCSequence(XOROp2)*/) {


I don't even fully understand what optimization you're attempting here - but this code completely ignores Op2, which is obviously incorrect.

uweigand · 2025-04-25T14:13:09Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+    return false;
+  }
+  // (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))
+  if (CCNode->getOpcode() == SystemZISD::SELECT_CCMASK) {


The same comment as above - SELECT_CCMASK (while at least a Z specific opcode) does not itself set the condition code (it uses it, of course), and so it cannot be an input to combineCCMask either.

uweigand · 2025-04-25T14:13:29Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+  }
+
+  // Both oerands of XOR are (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
+  if (CCNode->getOpcode() == ISD::XOR) {


And once again an ISD::XOR cannot be an input to combineCCMask.

uweigand · 2025-04-25T14:21:27Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+                             : CmpNode2;
+      int CmpCCValid = CCValid, SelectCCValid = CCValid;
+      int CmpCCMask = CCMask, SelectCCMask = CCMask;
+      bool IsOp1 = combineCCMask(CmpOp, CmpCCValid, CmpCCMask);


This calls combineCCMask on some random operation that does not set a condition code - is this why you end up in some of those cases above? That doesn't make sense. What is the actual optimization this code path is supposed to achieve?

uweigand · 2025-04-25T14:28:54Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+    if (CCMask == SystemZ::CCMASK_CMP_NE)
+      Invert = !Invert;
+    SDValue NewCCReg = CCNode->getOperand(0);
+    if (combineCCMask(NewCCReg, CCValid, CCMask)) {


Again a recursive call with an opcode that does not set CC.

uweigand · 2025-04-25T14:29:07Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+      CCValid = SystemZ::CCMASK_ANY;
+      return true;
+    } else if (CCMask == SystemZ::CCMASK_CMP_NE ||
+               CCMask != SystemZ::CCMASK_CMP_EQ) {


This condition looks incorrect.
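
One possible reading of this remark is that the first disjunct is redundant: since the NE and EQ masks differ, `CCMask == NE || CCMask != EQ` accepts exactly the same masks as `CCMask != EQ`. The sketch below checks that over all 4-bit masks, assuming EQ is the CC 0 bit (8) and NE is the CC 1 and CC 2 bits (6); whether this is the problem the reviewer had in mind is a guess.

```cpp
#include <cassert>

constexpr unsigned CCMASK_CMP_EQ = 8; // CC 0 (assumed value)
constexpr unsigned CCMASK_CMP_NE = 6; // CC 1 or CC 2 (assumed value)

constexpr bool asWritten(unsigned M) {
  return M == CCMASK_CMP_NE || M != CCMASK_CMP_EQ;
}

constexpr bool simplified(unsigned M) { return M != CCMASK_CMP_EQ; }

// True when both forms agree on every 4-bit CC mask.
bool conditionsAgree() {
  for (unsigned M = 0; M < 16; ++M)
    if (asWritten(M) != simplified(M))
      return false;
  return true;
}
```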

anoopkg6 · 2025-04-28T18:54:04Z

[like] Anoop Kumar reacted to your message:


In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

+        int CCValidVal = CCValid1->getZExtValue();
+        int CCMaskVal = CCMask1->getZExtValue();
+        if (combineCCMask(XORReg, CCValidVal, CCMaskVal)) {
+          // CC == 0 || CC == 2 for bit 28 Test Under Mask.
+          CCMask = SystemZ::CCMASK_CMP_GE;
+          CCMask ^= CCMaskVal;
+          if (Invert)
+            CCMask ^= CCValid;
+          CCReg = XORReg;
+          return true;
+        }
+      }
+    }
+  }
+  // Optimize (AND (SRL (IPM (CC)))).
+  if (CCNode->getOpcode() == ISD::AND) {

Here it starts to look very confusing. The operand of the combineCCMask routine has to be some Z instruction that sets the condition code - the whole routine is about analysing condition codes set by prior instructions! A plain ISD::AND does not set any Z condition code at all, it simply has a regular (integer) output value - how would it ever be a possible input to combineCCMask? What does this even mean?


anoopkg6 · 2025-04-28T23:35:36Z

Hi Ulrich, I have taken one example from code review feedback. I will incorporate more changes to code after I understand. I have commented on xor in this example code changes. This example is for code review comment on line 8879. 88979 +if (LHS->getOpcode() == ISD::XOR) {

----------------------------------------------------------------------------- test.c ----------------------------------------------------------------------------- static __attribute__((always_inline)) inline int __atomic_dec_and_test_023(int *ptr) { int cc; asm volatile( " alsi %[ptr],-1\n" : ***@***.***" (cc), [ptr] "+QS" (*ptr) : : "memory"); return (cc == 0) ^ (cc == 2) ^ (cc == 3); } int a; long fu_023(void) { if (__atomic_dec_and_test_023(&a)) return 5; return 8; ---------------------------------------------------------------------------------- Initial input DAG to combineCCMask --------------------------------------------------------------------------------- Function: fu_023 SelectionDAG has 38 nodes: t0: ch,glue = EntryToken t8: ch,glue = inlineasm t0, TargetExternalSymbol:i64' alsi $1,-1 ', MDNode:ch<0x1b4dec58>, TargetConstant:i64<25>, TargetConstant:i32<458762>, Register:i32 %0, TargetConstant:i32<524302>, t51, TargetConstant:i32<524302>, t51 t10: i32,ch,glue = CopyFromReg t8, Register:i32 $cc, t8:1 t11: i32 = SystemZISD::IPM t10 t13: i32 = srl t11, Constant:i32<28> t47: i32 = SystemZISD::ICMP t13, Constant:i32<3>, TargetConstant:i32<0> t49: i32 = SystemZISD::SELECT_CCMASK Constant:i32<1>, Constant:i32<0>, TargetConstant:i32<14>, TargetConstant:i32<6>, t47 t40: i32 = SystemZISD::ICMP t13, Constant:i32<0>, TargetConstant:i32<0> t44: i32 = SystemZISD::SELECT_CCMASK Constant:i32<1>, Constant:i32<0>, TargetConstant:i32<14>, TargetConstant:i32<8>, t40 t45: i32 = SystemZISD::ICMP t13, Constant:i32<2>, TargetConstant:i32<0> t46: i32 = SystemZISD::SELECT_CCMASK Constant:i32<1>, Constant:i32<0>, TargetConstant:i32<14>, TargetConstant:i32<8>, t45 t32: i32 = xor t44, t46 t34: i32 = xor t49, t32 t52: i32 = SystemZISD::ICMP t34, Constant:i32<0>, TargetConstant:i32<0> t53: i64 = SystemZISD::SELECT_CCMASK Constant:i64<8>, Constant:i64<5>, TargetConstant:i32<14>, TargetConstant:i32<6>, t52 t28: ch,glue = CopyToReg t10:1, Register:i64 $r2d, t53 t51: i64 = 
SystemZISD::PCREL_WRAPPER TargetGlobalAddress:i64<ptr @A> 0 t29: ch = SystemZISD::RET_GLUE t28, Register:i64 $r2d, t28:1

---------------------------------------------------------------------------------------------------------------------------- Above is the Initial Input DAG. t32: i32 = xor t44, t46 t34: i32 = xor t49, t32 t32 xor has t49 and t32 operands. It calls combineCCMask on t49 in recursion to combine SystemZISD::SELECT_CCMASK with t10. (SELECT_CCMASK (ICMP (SRL (IPM t10)))), effectively looking sub-expression (ICMP (SRL (IPM t10)), checking for isSRL_IPM_CCSequence(). t49 fourth operand - t47 is replaced with t10 after combining the sequence. Recursion Depth = 1 for combineCCMask for combining SELECT_CCMASK t49 to t10. CCMask for t49 = 0x1 CombineCCMask is called on t32, which is xor again whose both operands, t44 and t46, are SystemZISD::SELECT_CCMASK. CombineCCMask is called on xor which combines t44 (SELECT_CCMASK (ICMP (SRL (IPM t10)))) sequence with t0. Similarly, for t46. t44 and t46 are replaced with t10 after combining. Recursion Depth = 2, one for calling for t32 xor, which in turns calls CombineCCMask for t44 to combine it with t10 for (SELECT_CCMASK (ICMP (SRL (IPM t10) ))) and comes back and again calls CombineCCMask for t46 to combine it with t10 for (SELECT_CCMASK (ICMP (SRL (IPM t10)))). Effectively checking isSRL_IPM_CCSequence. CCMask for t44 = 0x1000 and t46 = 0x0010. t53 has fourth operands t52, which is combined with t34, which is already been combined with t10. Whole DAG after combining results in one SELECT_CCMASK combined with t10 with cumulative computed CCMask = 0x1011. Recursion depth is mostly 0 and 1 in a few cases. Recursion depth is 2 only in very special cases like this. -----------------------------------------------------------------------------------------------------------------------------

----------------------------------------
Final output after combineCCMask: t53's CC operand, t52, is combined with t10.
----------------------------------------
***** in combineSELECT_CCMASK
t53: i64 = <<Unknown Node #513>> Constant:i64<8>, Constant:i64<5>, TargetConstant:i32<14>, TargetConstant:i32<6>, t52
--------Anoop combineSELECT_CC_CCIPMMASK
Function: fu_023
SelectionDAG has 20 nodes:
  t0: ch,glue = EntryToken
  t8: ch,glue = inlineasm t0, TargetExternalSymbol:i64' alsi $1,-1 ', MDNode:ch<0x1b4dec58>, TargetConstant:i64<25>, TargetConstant:i32<458762>, Register:i32 %0, TargetConstant:i32<524302>, t51, TargetConstant:i32<524302>, t51
  t10: i32,ch,glue = CopyFromReg t8, Register:i32 $cc, t8:1
  t56: i64 = SystemZISD::SELECT_CCMASK Constant:i64<8>, Constant:i64<5>, TargetConstant:i32<15>, TargetConstant:i32<4>, t10 // All 3 select_cc combined
  t28: ch,glue = CopyToReg t10:1, Register:i64 $r2d, t56
  t51: i64 = SystemZISD::PCREL_WRAPPER TargetGlobalAddress:i64<ptr @A> 0
  t29: ch = SystemZISD::RET_GLUE t28, Register:i64 $r2d, t28:1

----------------------------------------
Assembly test.s
----------------------------------------
fu_023:                                 # @fu_023
# %bb.0:                                # %entry
        larl    %r1, a
        #APP
        alsi    0(%r1), -1
        #NO_APP
        lghi    %r2, 8
        blr     %r14
.LBB0_1:                                # %entry
        lghi    %r2, 5
        br      %r14
.Lfunc_end0:
        .size   fu_023, .Lfunc_end0-fu_023
                                        # -- End function
        .type   ***@***.***             # @A
        .section        .***@***.***
        .globl  a
----------------------------------------

I am attaching the git diff for SystemZISelLowering.cpp and the relevant portion of the ISD::XOR code to illustrate the example above.

This is not for review, as I am still adding comments and some explicit return false statements for DAG patterns that will not be combined. In the good cases, though, combineCCMask combines the pattern, updates CCReg (to t10 in this example), updates the computed CCMask by reference, and returns true to the caller, where the select_cc DAG node is replaced with one that uses the updated CCReg (t10) and CCMask. After combining with t10, the srl/ipm sequence gets optimized away.

Thanks,
Anoop

________________________________
From: Ulrich Weigand
Sent: Friday, April 25, 2025 9:30 AM
To: llvm/llvm-project
Cc: Anoop Kumar; Author
Subject: [EXTERNAL] Re: [llvm/llvm-project] Add support for flag output operand "=@cc" for SystemZ. (PR #125970)

@uweigand commented on this pull request.

Not a full review, just some initial comments on combineCCMask. I think it would be good to have more comments explaining the specific transformations you're attempting to implement, with an argument why they are correct for all inputs.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

     return false;
-  // Optimize the case where CompareLHS is a SELECT_CCMASK.
-  if (CompareLHS->getOpcode() == SystemZISD::SELECT_CCMASK) {
-    // Verify that we have an appropriate mask for a EQ or NE comparison.
+  // Optimize (TM (IPM (CC)))

Adding a case to optimize (TM (IPM)) in addition to (ICMP (IPM)) does make sense in general. However, you need to take care that the optimization is correct for all possible inputs to TM, not just the ones that come up in the "good case" you're looking at. That doesn't appear to be the case here.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

   bool Invert = false;
-  if (CCMask == SystemZ::CCMASK_CMP_NE)
+  if (CCMask == SystemZ::CCMASK_TM_SOME_1)
     Invert = !Invert;

There are four possible CCMask values for TM. It doesn't look like all four are handled correctly. (You can of course bail out if there's some mask value you don't support, but you shouldn't make any silent assumptions.)
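One way to make such a fold correct for every input, rather than for selected cases, is to derive the new mask from a truth table over the four CC values and to bail out when the operands fall outside what the table can model. The sketch below is a hypothetical illustration, not the patch's code. It assumes that IPM places CC in bits 28 and 29 of a 32-bit value, that Test Under Mask on a register yields CC 0 for "all selected bits zero", CC 3 for "all one", and CC 1 or 2 for mixed results with the leftmost selected bit zero or one, and that CC masks use bit 3 for CC 0 down to bit 0 for CC 3.

```cpp
#include <cstdint>
#include <optional>

// Assumption: IPM places the condition code in bits 28-29.
constexpr unsigned IPMCCShift = 28;

// Condition code that a register Test Under Mask is assumed to produce.
constexpr unsigned tmResultCC(uint32_t Value, uint32_t Mask) {
  uint32_t Selected = Value & Mask;
  if (Selected == 0)
    return 0; // all selected bits zero
  if (Selected == Mask)
    return 3; // all selected bits one
  uint32_t Top = Mask;
  while (Top & (Top - 1)) // clear low set bits until only the highest remains
    Top &= Top - 1;
  return (Value & Top) ? 2 : 1; // mixed: leftmost selected bit one / zero
}

// Rewrite "TM (IPM CC), Mask" tested under TMCCMask into a mask on the
// original CC. Returns std::nullopt when Mask is empty or touches bits
// outside the CC field, because the remaining IPM bits are not known here.
std::optional<unsigned> foldTMOfIPM(uint32_t Mask, unsigned TMCCMask) {
  const uint32_t CCField = 3u << IPMCCShift;
  if (Mask == 0 || (Mask & ~CCField))
    return std::nullopt;
  unsigned NewMask = 0;
  for (unsigned CC = 0; CC < 4; ++CC) {
    unsigned TMCC = tmResultCC(CC << IPMCCShift, Mask);
    if ((TMCCMask >> (3 - TMCC)) & 1u) // TM outcome accepted by the TM mask?
      NewMask |= 1u << (3 - CC);       // then the original CC is accepted
  }
  return NewMask;
}
```

Under these assumptions, `foldTMOfIPM(1u << 28, 0b1000)` gives 0b1010 (CC 0 or CC 2), and any TM mask value, not only the two "good case" ones, produces a mask derived from the same table.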

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

     Invert = !Invert;
-  else if (CCMask != SystemZ::CCMASK_CMP_EQ)
+  auto *N = CCNode->getOperand(0).getNode();
+  auto Shift = dyn_cast<ConstantSDNode>(CCNode->getOperand(1));

Calling the operand of TM "Shift" is confusing. There's no shift happening here; TM basically performs an "and" operation.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

     return false;
-
-  // Verify that the ICMP compares against one of select values.
-  auto *TrueVal = dyn_cast<ConstantSDNode>(CompareLHS->getOperand(0));
-  if (!TrueVal)
+  if (N->getOpcode() == SystemZISD::IPM) {
+    auto ShiftVal = Shift->getZExtValue();
+    if (ShiftVal == (1 << SystemZ::IPM_CC))
+      CCMask = SystemZ::CCMASK_CMP_GE;

Well, what if the second TM operand is anything else? You'll still do the optimization here, which is likely to be incorrect.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

+      CCMask = SystemZ::CCMASK_CMP_GE;
+    if (Invert)
+      CCMask ^= CCValid;
+    // Return the updated CCReg link.
+    CCReg = N->getOperand(0);
+    return true;
+  } else if (N->getOpcode() == ISD::XOR) {
+    // Optimize (TM (XOR (OP1 OP2))).
+    auto *XOROp1 = N->getOperand(0).getNode();
+    auto *XOROp2 = N->getOperand(1).getNode();
+    if (!XOROp1 || !XOROp2)
+      return false;
+    // OP1. (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
+    // OP2. (SRL (IPM (CC))).
+    if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK /*&&
+        isSRL_IPM_CCSequence(XOROp2)*/) {

I don't even fully understand what optimization you're attempting here - but this code completely ignores Op2, which is obviously incorrect.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

+        int CCValidVal = CCValid1->getZExtValue();
+        int CCMaskVal = CCMask1->getZExtValue();
+        if (combineCCMask(XORReg, CCValidVal, CCMaskVal)) {
+          // CC == 0 || CC == 2 for bit 28 Test Under Mask.
+          CCMask = SystemZ::CCMASK_CMP_GE;
+          CCMask ^= CCMaskVal;
+          if (Invert)
+            CCMask ^= CCValid;
+          CCReg = XORReg;
+          return true;
+        }
+      }
+    }
+  }
+  // Optimize (AND (SRL (IPM (CC)))).
+  if (CCNode->getOpcode() == ISD::AND) {

Here it starts to look very confusing. The operand of the combineCCMask routine has to be some Z instruction that sets the condition code - the whole routine is about analysing condition codes set by prior instructions! A plain ISD::AND does not set any Z condition code at all, it simply has a regular (integer) output value - how would it ever be a possible input to combineCCMask? What does this even mean?

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

     return false;
-  if (CompareRHS->getAPIntValue() == FalseVal->getAPIntValue())
-    Invert = !Invert;
-  else if (CompareRHS->getAPIntValue() != TrueVal->getAPIntValue())
+    // Bit 28 false (CC == 0) || (CC == 2).
+    // Caller can invert it depending on CCmask there.
+    if (ANDConst->getZExtValue() == 1) {
+      CCMask = SystemZ::CCMASK_0 | SystemZ::CCMASK_2;
+      CCValid = SystemZ::CCMASK_ANY;
+      return true;
+    }
+    return false;
+  }
+  // (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))
+  if (CCNode->getOpcode() == SystemZISD::SELECT_CCMASK) {

The same comment as above - SELECT_CCMASK (while at least a Z specific opcode) does not itself set the condition code (it uses it, of course), and so it cannot be an input to combineCCMask either.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

-  auto *NewCCMask = dyn_cast<ConstantSDNode>(CompareLHS->getOperand(3));
-  if (!NewCCValid || !NewCCMask)
+    int CCValidVal = CCValidNode->getZExtValue();
+    int CCMaskVal = CCMaskNode->getZExtValue();
+    SDValue CCRegOp = CCNode->getOperand(4);
+    if (combineCCMask(CCRegOp, CCValidVal, CCMaskVal)) {
+      CCMask = CCMaskVal;
+      CCValid = SystemZ::CCMASK_ANY;
+      CCReg = CCRegOp;
+      return true;
+    }
+    return false;
+  }
+
+  // Both oerands of XOR are (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
+  if (CCNode->getOpcode() == ISD::XOR) {

And once again an ISD::XOR cannot be an input to combineCCMask.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

+  if (!RHS) {
+    SDValue CmpOp1 = CCNode->getOperand(0);
+    SDValue CmpOp2 = CCNode->getOperand(1);
+    auto *CmpNode1 = CmpOp1.getNode(), *CmpNode2 = CmpOp2.getNode();
+    if (!CmpNode1 || !CmpNode2)
+      return false;
+    if (CmpNode1->getOpcode() == SystemZISD::SELECT_CCMASK ||
+        CmpNode2->getOpcode() == SystemZISD::SELECT_CCMASK) {
+      SDValue CmpOp =
+          CmpNode1->getOpcode() == SystemZISD::SELECT_CCMASK ? CmpOp2 : CmpOp1;
+      SDNode *SelectCC = CmpNode1->getOpcode() == SystemZISD::SELECT_CCMASK
+                             ? CmpNode1
+                             : CmpNode2;
+      int CmpCCValid = CCValid, SelectCCValid = CCValid;
+      int CmpCCMask = CCMask, SelectCCMask = CCMask;
+      bool IsOp1 = combineCCMask(CmpOp, CmpCCValid, CmpCCMask);

This calls combineCCMask on some random operation that does not set a condition code - is this why you end up in some of those cases above? That doesn't make sense. What is the actual optimization this code path is supposed to achieve?

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

   }
+  int CmpVal = RHS->getZExtValue();
+  // (BR_CC (ICMP (SELECT_CCMASK (CC))))
+  if (LHS->getOpcode() == SystemZISD::SELECT_CCMASK) {
+    int CCVal = RHS->getZExtValue();
+    int Mask = CCMaskForICmpEQCCVal(CCVal);
+    bool Invert = false;
+    if (CCMask == SystemZ::CCMASK_CMP_NE)
+      Invert = !Invert;
+    SDValue NewCCReg = CCNode->getOperand(0);
+    if (combineCCMask(NewCCReg, CCValid, CCMask)) {

Again a recursive call with an opcode that does not set CC.

________________________________
In llvm/lib/Target/SystemZ/SystemZISelLowering.cpp:

+  if (LHS->getOpcode() == SystemZISD::SELECT_CCMASK) {
+    int CCVal = RHS->getZExtValue();
+    int Mask = CCMaskForICmpEQCCVal(CCVal);
+    bool Invert = false;
+    if (CCMask == SystemZ::CCMASK_CMP_NE)
+      Invert = !Invert;
+    SDValue NewCCReg = CCNode->getOperand(0);
+    if (combineCCMask(NewCCReg, CCValid, CCMask)) {
+      CCMask |= Mask;
+      if (Invert)
+        CCMask ^= SystemZ::CCMASK_ANY;
+      CCReg = NewCCReg;
+      CCValid = SystemZ::CCMASK_ANY;
+      return true;
+    } else if (CCMask == SystemZ::CCMASK_CMP_NE ||
+               CCMask != SystemZ::CCMASK_CMP_EQ) {

This condition looks incorrect.

@@ -8797,14 +8797,21 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     bool Invert = false;
     if (CCMask == SystemZ::CCMASK_TM_SOME_1)
       Invert = !Invert;
+    else if (CCMask != SystemZ::CCMASK_TM_ALL_0)
+      return false;
     auto *N = CCNode->getOperand(0).getNode();
-    auto Shift = dyn_cast<ConstantSDNode>(CCNode->getOperand(1));
-    if (!N || !Shift)
+    auto *TMOp1Const = dyn_cast<ConstantSDNode>(CCNode->getOperand(1));
+    auto *TMOp2Const = dyn_cast<ConstantSDNode>(CCNode->getOperand(2));
+    if (!N || !TMOp1Const || !TMOp2Const || TMOp2Const->getZExtValue() != 0)
       return false;
+    auto TMConstVal = TMOp1Const->getZExtValue();
     if (N->getOpcode() == SystemZISD::IPM) {
-      auto ShiftVal = Shift->getZExtValue();
-      if (ShiftVal == (1 << SystemZ::IPM_CC))
+      if (TMConstVal == (1 << SystemZ::IPM_CC))
         CCMask = SystemZ::CCMASK_CMP_GE;
+      else if (TMConstVal == (1 << (SystemZ::IPM_CC + 1)))
+        CCMask = SystemZ::CCMASK_CMP_LE;
+      else
+        return false;
       if (Invert)
         CCMask ^= CCValid;
       // Return the updated CCReg link.
@@ -8818,8 +8825,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
         return false;
       // OP1. (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
       // OP2. (SRL (IPM (CC))).
-      if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK /*&&
-          isSRL_IPM_CCSequence(XOROp2)*/) {
+      if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK) {
         auto *CCValid1 = dyn_cast<ConstantSDNode>(XOROp1->getOperand(2));
         auto *CCMask1 = dyn_cast<ConstantSDNode>(XOROp1->getOperand(3));
         SDValue XORReg = XOROp1->getOperand(4);
@@ -8827,8 +8833,12 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
           return false;
         int CCValidVal = CCValid1->getZExtValue();
         int CCMaskVal = CCMask1->getZExtValue();
-        if (combineCCMask(XORReg, CCValidVal, CCMaskVal)) {
+        // (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
+        if (combineCCMask(XORReg, CCValidVal, CCMaskVal) &&
+            isSRL_IPM_CCSequence(XOROp2)) {
           // CC == 0 || CC == 2 for bit 28 Test Under Mask.
+          if (TMConstVal != 1)
+            return false;
           CCMask = SystemZ::CCMASK_CMP_GE;
           CCMask ^= CCMaskVal;
           if (Invert)
@@ -8840,6 +8850,14 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     }
   }
   // Optimize (AND (SRL (IPM (CC)))).
+  // One use case it is being called from combineSELECT_CC_CCIPMMASK where
+  // subtree select_cc operand has already been computed and other operand
+  // and-exp has to evaluated. combineSELECT_CC_CCIPMMASK calls combineCCMask
+  // for and-exp. This is also one of very few cases where ICMP has both
+  // operands non-const. Below has ICMP code where already-computed-select_cc
+  // and and-exp are compared.
+  // (BR_CCMASK (ICMP (already-combined_computed-select_cc_mask and-exp)))
+  // and-exp - (AND (SRL (IPM (CC)))).
   if (CCNode->getOpcode() == ISD::AND) {
     auto *N = CCNode->getOperand(0).getNode();
     if (!isSRL_IPM_CCSequence(N))
@@ -8848,9 +8866,9 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     if (!ANDConst)
       return false;
     // Bit 28 false (CC == 0) || (CC == 2).
-    // Caller can invert it depending on CCmask there.
+    // Caller can invert it depending on CCMask there.
     if (ANDConst->getZExtValue() == 1) {
-      CCMask = SystemZ::CCMASK_0 | SystemZ::CCMASK_2;
+      CCMask = SystemZ::CCMASK_CMP_GE;
       CCValid = SystemZ::CCMASK_ANY;
       return true;
     }
@@ -8866,6 +8884,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     int CCValidVal = CCValidNode->getZExtValue();
     int CCMaskVal = CCMaskNode->getZExtValue();
     SDValue CCRegOp = CCNode->getOperand(4);
+    // (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
     if (combineCCMask(CCRegOp, CCValidVal, CCMaskVal)) {
       CCMask = CCMaskVal;
       CCValid = SystemZ::CCMASK_ANY;
@@ -8899,7 +8918,9 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
       int CCMaskVal2 = CCMask2->getZExtValue();
       SDValue CCReg1 = XOROp1->getOperand(4);
       SDValue CCReg2 = XOROp2->getOperand(4);
+      // (ICMP (SRL (IPM (CC)))).
       if (!combineCCMask(CCReg1, CCValidVal1, CCMaskVal1) ||
+          // (ICMP (SRL (IPM (CC)))).
           !combineCCMask(CCReg2, CCValidVal2, CCMaskVal2))
         return false;
       CCMask = CCMaskVal1 ^ CCMaskVal2;
@@ -8919,8 +8940,10 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
   if (!LHS || LHS->getOpcode() == ISD::Constant)
     return false;

-  // (BR_CC (ICMP (Op1 Op2))), Op1 Op2 will have (SRL (IPM (CC))) sequence.
-  // SystemZ::ICMP second operand is not constant.
+  // (BR_CC (ICMP (Op1 Op2))), SystemZ::ICMP has both operands Op1 and Op2
+  // non-const. One use case:
+  // (BR_CCMASK (ICMP (SELECT_CCMASK (ICMP (SRL (IPM CC))))
+  // already-combined_computed-select_cc_mask)))
   if (!RHS) {
     SDValue CmpOp1 = CCNode->getOperand(0);
     SDValue CmpOp2 = CCNode->getOperand(1);
@@ -8936,7 +8959,10 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
                              : CmpNode2;
       int CmpCCValid = CCValid, SelectCCValid = CCValid;
       int CmpCCMask = CCMask, SelectCCMask = CCMask;
+      // combine (SELECT_CCMASK (ICMP (SRL (IPM CC))))
       bool IsOp1 = combineCCMask(CmpOp, CmpCCValid, CmpCCMask);
+      // subtree SELECT_CCMASK is already combined with CC, has CCMASK already
+      // been computed. Just ceck ISOp1 and IsOp2 refer to same CC.
       bool IsOp2 = isSameCCIPMOp(CmpOp, SelectCC, SelectCCValid, SelectCCMask);
       if (IsOp1 && IsOp2) {
         CCMask = CmpCCMask ^ SelectCCMask;
@@ -8948,7 +8974,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     return false;
   }
   int CmpVal = RHS->getZExtValue();
-  // (BR_CC (ICMP (SELECT_CCMASK (CC))))
+  // (BR_CC (ICMP (SELECT_CCMASK (ICMP (SRL (IPM CC))))))
   if (LHS->getOpcode() == SystemZISD::SELECT_CCMASK) {
     int CCVal = RHS->getZExtValue();
     int Mask = CCMaskForICmpEQCCVal(CCVal);
@@ -8956,6 +8982,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     if (CCMask == SystemZ::CCMASK_CMP_NE)
       Invert = !Invert;
     SDValue NewCCReg = CCNode->getOperand(0);
+    // (SELECT_CCMASK (ICMP (SRL (IPM CC))))
     if (combineCCMask(NewCCReg, CCValid, CCMask)) {
       CCMask |= Mask;
       if (Invert)
@@ -8964,8 +8991,8 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
       CCValid = SystemZ::CCMASK_ANY;
       return true;
     } else if (CCMask == SystemZ::CCMASK_CMP_NE ||
-               CCMask != SystemZ::CCMASK_CMP_EQ) {
-      // Original combineCCMask.
+               CCMask == SystemZ::CCMASK_CMP_EQ) {
+      // Original combineCCMask code before flag output operand.
       // Verify that the ICMP compares against one of select values.
       auto *TrueVal = dyn_cast<ConstantSDNode>(LHS->getOperand(0));
       if (!TrueVal)
@@ -8994,7 +9021,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     }
     return false;
   }
-  // (BR_CC (ICMP OR ((SRL (IPM (CC))) (SELECT_CCMASK (CC)))))
+  // (BR_CC (ICMP (OR (Op1 Op2)))).
   if (LHS->getOpcode() == ISD::OR) {
     bool Invert = false;
     if (CCMask == SystemZ::CCMASK_CMP_NE)
@@ -9009,6 +9036,8 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
       if (!IsOp1 && !IsOp2) {
         return false;
       }
+      // Both Op1 and Op2 are non-const.
+      // Op1 and Op2 can be any of the pattern combined in combineCCMask.
       if (IsOp1 && IsOp2) {
         NewCCMask = NewCCMask1 | NewCCMask2;
         bool IsEqualCmpVal = NewCCMask == CmpVal;
@@ -9021,9 +9050,11 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
         return true;
       }
     } else if (isa<ConstantSDNode>(OrOp2)) {
+      // Op1 is const. Op2 is (SRL (IPM (CC)).
       if (isSRL_IPM_CCSequence(OrOp1.getNode())) {
         auto *OrConst = dyn_cast<ConstantSDNode>(OrOp2);
         int OrConstVal = OrConst->getZExtValue();
+        // %2 = or disjoint i32 %0, -4.
         if (!OrConst || (OrConstVal & 0x3))
           return false;
         // setullt unsigned(-2), mask = 0x1100
@@ -9037,7 +9068,7 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     }
     return false;
   }
-  // (BR_CC (ICMP AND ((SRL (IPM (CC))) (SELECT_CCMASK (CC)))))
+  // (BR_CC (ICMP AND (Op1 Op2)
   if (LHS->getOpcode() == ISD::AND) {
     bool Invert = false;
     if (CCMask == SystemZ::CCMASK_CMP_NE)
@@ -9047,10 +9078,18 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     int NewCCMask1 = CCMask, NewCCMask2 = CCMask, NewCCMask;
     int CCValid1 = CCValid, CCValid2 = CCValid;
     if (!isa<ConstantSDNode>(AndOp1) && !isa<ConstantSDNode>(AndOp2)) {
+      // (SRL (IPM (CC))).
       bool IsOp1 = combineCCMask(AndOp1, CCValid1, NewCCMask1);
+      // (SELECT_CCMASK (ICMP (SRL (IPM CC)))).
       bool IsOp2 = combineCCMask(AndOp2, CCValid2, NewCCMask2);
+      // Both Op1 and Op2 are const.
       if (!IsOp1 && !IsOp2)
         return false;
+      // Op1 and Op2 can be any of the pattern combined in combineCCMask.
+      // e.g. %2 = or i1 %cmp, %cmp2, %2 = or i1 %xor8, %cmp4, t28,
+      // i32 = or t27, t26 or (%2 = or i1 %or.cond, %cmp3,
+      // %cmp3 = icmp eq i32 %asmresult1,
+      // %or.cond = icmp samesign ult i32 %asmresult1, 2) sequence.
       if (IsOp1 && IsOp2) {
         NewCCMask = NewCCMask1 & NewCCMask2;
         bool IsEqualCmpVal = NewCCMask == CmpVal;
@@ -9062,6 +9101,9 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
         CCValid = SystemZ::CCMASK_ANY;
         return true;
       } else {
+        // Either Op1 or Op2 is (SRL (IPM (CC))) sequence.
+        // and other Op can be one of any of pattern to be combined
+        // mentioned in some examples above.
         if (IsOp1 && isSRL_IPM_CCSequence(AndOp2.getNode()))
           NewCCMask = NewCCMask1;
         else if (isSRL_IPM_CCSequence(AndOp2.getNode()) && IsOp2)
@@ -9133,7 +9175,11 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     bool Invert = false;
     if (CCMask == SystemZ::CCMASK_CMP_NE)
       Invert = !Invert;
-    // If both the operands are select_cc.
+    // If both the operands of XOR are
+    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
+    // It will get ccombined in recursion base case both operands are xor.
+    // t32: i32 = xor t44, t46 where t44 and t46 are select_cc as
+    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
     if (combineCCMask(XORReg, CCValid, CCMask)) {
       CCReg = XORReg;
       CCValid = SystemZ::CCMASK_ANY;
@@ -9141,6 +9187,9 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
     }
     // Handle the case when one of the operand is select_cc and other operand
     // could be xor again having both operands as select_cc.
+    // t32: i32 = xor t44, t46, where t44 and t46 are select_cc
+    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
+    // t34: i32 = xor t49, t32, where t49 is select_cc and t32 xor base case.
     auto *XOROp1 = LHS->getOperand(0).getNode();
     auto *XOROp2 = LHS->getOperand(1).getNode();
     if (!XOROp1 || !XOROp2)
@@ -9158,7 +9207,10 @@ static bool combineCCMask(SDValue &CCReg, int &CCValid, int &CCMask) {
       SDValue XORReg1 = XOROp->getOperand(4);
       SDValue XORReg2 = LHS->getOperand(1);
       int CCMaskVal1 = CCMaskVal, CCMaskVal2 = CCMaskVal;
+      // (ICMP (SRL (IPM (CC)))).
       if (combineCCMask(XORReg1, CCValidVal, CCMaskVal1) &&
+          // XOR base case t32: (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))
+          // (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
           combineCCMask(XORReg2, CCValidVal, CCMaskVal2)) {
         CCMask = CCMaskVal1 ^ CCMaskVal2;

  // Optimize the case where LHS is (ICMP (SRL (IPM))).
  if (isSRL_IPM_CCSequence(LHS)) {
    unsigned CCVal = RHS->getZExtValue();
    if (convertCCValToCCMask(CCVal)) {
      CCValid = SystemZ::CCMASK_ANY;
      return true;
    }
    return false;
  }

  // Both oerands of XOR are (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
  // t32: i32 = xor t44, t46, where t44 and t46 are select_cc
  if (CCNode->getOpcode() == ISD::XOR) {
    if (isa<ConstantSDNode>(CCNode->getOperand(0)) ||
        isa<ConstantSDNode>(CCNode->getOperand(1)))
      return false;
    auto *XOROp1 = CCNode->getOperand(0).getNode();
    auto *XOROp2 = CCNode->getOperand(1).getNode();
    if (!XOROp1 || !XOROp2)
      return false;
    // Both Operands are select_cc.
    if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK &&
        XOROp2->getOpcode() == SystemZISD::SELECT_CCMASK) {
      auto *CCValid1 = dyn_cast<ConstantSDNode>(XOROp1->getOperand(2));
      auto *CCMask1 = dyn_cast<ConstantSDNode>(XOROp1->getOperand(3));
      auto *CCValid2 = dyn_cast<ConstantSDNode>(XOROp2->getOperand(2));
      auto *CCMask2 = dyn_cast<ConstantSDNode>(XOROp2->getOperand(3));
      if (!CCValid1 || !CCMask1 || !CCValid2 || !CCMask2)
        return false;
      int CCValidVal1 = CCValid1->getZExtValue();
      int CCMaskVal1 = CCMask1->getZExtValue();
      int CCValidVal2 = CCValid2->getZExtValue();
      int CCMaskVal2 = CCMask2->getZExtValue();
      SDValue CCReg1 = XOROp1->getOperand(4);
      SDValue CCReg2 = XOROp2->getOperand(4);
      // (ICMP (SRL (IPM (CC)))).
      if (!combineCCMask(CCReg1, CCValidVal1, CCMaskVal1) ||
          // (ICMP (SRL (IPM (CC)))).
          !combineCCMask(CCReg2, CCValidVal2, CCMaskVal2))
        return false;
      CCMask = CCMaskVal1 ^ CCMaskVal2;
      CCReg = CCReg1;
      CCValid = SystemZ::CCMASK_ANY;
      return true;
    }
    return false;
  }

  // Optimize (ICMP (XOR (OP1 OP2))), OP1 or OP2 could be XOR again.
  // One or both of operands could be (SELECT_CCMASK (ICMP (SRL (IPM (CC))))).
  if (LHS->getOpcode() == ISD::XOR) {
    SDValue XORReg = CCReg->getOperand(0);
    bool Invert = false;
    if (CCMask == SystemZ::CCMASK_CMP_NE)
      Invert = !Invert;
    // If both the operands of XOR are
    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
    // It will get ccombined in recursion base case both operands are xor.
    // t32: i32 = xor t44, t46 where t44 and t46 are select_cc as
    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
    if (combineCCMask(XORReg, CCValid, CCMask)) {
      // will be combined in XOR code above.
      CCReg = XORReg;
      CCValid = SystemZ::CCMASK_ANY;
      return true;
    }
    // Handle the case when one of the operand is select_cc and other operand
    // could be xor again having both operands as select_cc.
    // t32: i32 = xor t44, t46, where t44 and t46 are select_cc
    // (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
    // t34: i32 = xor t49, t32, where t49 is select_cc and t32 xor base case.
    auto *XOROp1 = LHS->getOperand(0).getNode();
    auto *XOROp2 = LHS->getOperand(1).getNode();
    if (!XOROp1 || !XOROp2)
      return false;
    if (XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK ||
        XOROp2->getOpcode() == SystemZISD::SELECT_CCMASK) {
      auto *XOROp =
          XOROp1->getOpcode() == SystemZISD::SELECT_CCMASK ? XOROp1 : XOROp2;
      auto *CCMaskNode = dyn_cast<ConstantSDNode>(XOROp->getOperand(3));
      auto *CCValidNode = dyn_cast<ConstantSDNode>(XOROp->getOperand(2));
      if (!CCValidNode || !CCMaskNode)
        return false;
      int CCValidVal = CCValidNode->getZExtValue();
      int CCMaskVal = CCMaskNode->getZExtValue();
      SDValue XORReg1 = XOROp->getOperand(4);
      SDValue XORReg2 = LHS->getOperand(1);
      int CCMaskVal1 = CCMaskVal, CCMaskVal2 = CCMaskVal;
      // (ICMP (SRL (IPM (CC)))).
      if (combineCCMask(XORReg1, CCValidVal, CCMaskVal1) &&
          // XOR base case t32: (XOR (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))
          // (SELECT_CCMASK (ICMP (SRL (IPM (CC)))))).
          combineCCMask(XORReg2, CCValidVal, CCMaskVal2)) {
        CCMask = CCMaskVal1 ^ CCMaskVal2;
        CCReg = XORReg1;
        CCValid = SystemZ::CCMASK_ANY;
        return true;
      }
    }
  }

Add support for flag output operand "=@cc" for SystemZ and optimizing…

8a07b3d

… conditional branch for 14 possible combinations of CC mask.

llvmbot added the clang, backend:SystemZ, clang:frontend, clang:codegen, and llvm:SelectionDAG labels Feb 6, 2025

arsenm reviewed Feb 6, 2025


anoopkg6 and others added 6 commits February 7, 2025 01:29

Removed triple check in CGStmt.cpp and using StringRef in SystemZ.h.

d3f3c03

clang test causing pr build failure.

062e03a

Add Preprocessor test for flag output operand and some cleanup for c …

5aef564

…string.

Merge branch 'main' into asm_llvm

f10e46b

Fixed clang outputting extra byte '\7F' in clang test and fix a warni…

2ab7a75

…ng in SystemZISelLowering.cpp.

Merge branch 'asm_llvm' of github.com:anoopkg6/llvm-project into asm_…

132cf31

…llvm

arsenm reviewed Feb 12, 2025


uweigand reviewed Feb 12, 2025


Incorporated suggestions in review.

fccc70e

uweigand reviewed Feb 14, 2025


llvmbot added the backend:AArch64 label Feb 25, 2025

uweigand reviewed Mar 14, 2025


anoopkg6 and others added 2 commits April 22, 2025 00:03

Incorporated changes for code review feedback.

d787c8b

Merge branch 'main' into asm_llvm

82a26f7

uweigand reviewed Apr 25, 2025

