[SPIRV][RFC] Rework / extend support for memory scopes #106429

AlexVlx · 2024-08-28T18:21:12Z

This change adds support for correctly lowering the __scoped Clang builtins and the corresponding scoped LLVM instructions. These were previously unconditionally lowered to Device scope, which can be too conservative and possibly incorrect. Furthermore, the default / implicit scope is changed from Device (an OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not OpenCL specific and can ingest IR coming from other language front-ends. OpenCL defaulting to Device scope is now reflected in the front-end handling of atomic ops, which seems preferable.
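As a rough illustration of the lowering described above, the sketch below maps the syncscope names that appear in this PR's tests onto SPIR-V Scope values. The helper name scopeForSyncScopeName is hypothetical and the mapping is an assumption based on the description (the backend part of the diff is truncated here); treating every unrecognised name as CrossDevice is a simplification made for the sketch only.

```cpp
#include <cstdint>
#include <string_view>

// Numeric values of the SPIR-V Scope <id> enumerants used in this sketch.
enum class SpirvScope : std::uint32_t {
  CrossDevice = 0,
  Device = 1,
  Workgroup = 2,
  Subgroup = 3,
  Invocation = 4,
};

// Hypothetical helper (not the PR's actual function): translate an LLVM
// syncscope name into a SPIR-V scope.
constexpr SpirvScope scopeForSyncScopeName(std::string_view Name) {
  if (Name == "singlethread")
    return SpirvScope::Invocation;
  if (Name == "subgroup")
    return SpirvScope::Subgroup;
  if (Name == "workgroup")
    return SpirvScope::Workgroup;
  if (Name == "device")
    return SpirvScope::Device;
  // "" (LLVM's default, system-wide scope) and "all_svm_devices" both land
  // here. Assumption for the sketch: unknown names also fall back to the
  // widest scope.
  return SpirvScope::CrossDevice;
}
```

Under the behaviour the description calls the previous one, the last return would have been Device; that single line is where the proposed default differs.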

… passed to kernels / functions.

…cnspirv

llvmbot · 2024-08-28T18:21:55Z

@llvm/pr-subscribers-backend-spir-v

Author: Alex Voicu (AlexVlx)

Changes

This change adds support for correctly lowering the __scoped Clang builtins and the corresponding scoped LLVM instructions. These were previously unconditionally lowered to Device scope, which can be too conservative and possibly incorrect. Furthermore, the default / implicit scope is changed from Device (an OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not OpenCL specific and can ingest IR coming from other language front-ends. OpenCL defaulting to Device scope is now reflected in the front-end handling of atomic ops, which seems preferable.

Patch is 90.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106429.diff

23 Files Affected:

(modified) clang/lib/Basic/Targets/SPIR.h (+6)
(modified) clang/lib/CodeGen/CGAtomic.cpp (+10-1)
(modified) clang/lib/CodeGen/Targets/SPIR.cpp (+40)
(modified) clang/test/CodeGen/scoped-atomic-ops.c (+223-113)
(modified) clang/test/Sema/scoped-atomic-ops.c (+1)
(modified) llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp (+2-1)
(modified) llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp (+11-28)
(modified) llvm/lib/Target/SPIRV/SPIRVUtils.cpp (+18)
(modified) llvm/lib/Target/SPIRV/SPIRVUtils.h (+2)
(modified) llvm/test/CodeGen/SPIRV/AtomicCompareExchange.ll (+3-3)
(modified) llvm/test/CodeGen/SPIRV/atomicrmw.ll (+13-13)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_add/atomicrmw_faddfsub_double.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_add/atomicrmw_faddfsub_float.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_add/atomicrmw_faddfsub_half.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_min_max/atomicrmw_fminfmax_double.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_min_max/atomicrmw_fminfmax_float.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/extensions/SPV_EXT_shader_atomic_float_min_max/atomicrmw_fminfmax_half.ll (+4-3)
(modified) llvm/test/CodeGen/SPIRV/fence.ll (+5-5)
(modified) llvm/test/CodeGen/SPIRV/instructions/atomic-ptr.ll (+1-1)
(modified) llvm/test/CodeGen/SPIRV/instructions/atomic.ll (+17-16)
(modified) llvm/test/CodeGen/SPIRV/instructions/atomic_acqrel.ll (+2-2)
(modified) llvm/test/CodeGen/SPIRV/instructions/atomic_seq.ll (+2-2)
(added) llvm/test/CodeGen/SPIRV/scoped_atomicrmw.ll (+163)

diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 37cf9d7921bac5..8a26db7971cba6 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -335,6 +335,9 @@ class LLVM_LIBRARY_VISIBILITY SPIRV32TargetInfo : public BaseSPIRVTargetInfo {
     PointerWidth = PointerAlign = 32;
     SizeType = TargetInfo::UnsignedInt;
     PtrDiffType = IntPtrType = TargetInfo::SignedInt;
+    // SPIR-V has core support for atomic ops, and Int32 is always available;
+    // we take the maximum because it's possible the Host supports wider types.
+    MaxAtomicInlineWidth = std::max<unsigned char>(MaxAtomicInlineWidth, 32);
     resetDataLayout("e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-"
                     "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1");
   }
@@ -356,6 +359,9 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64TargetInfo : public BaseSPIRVTargetInfo {
     PointerWidth = PointerAlign = 64;
     SizeType = TargetInfo::UnsignedLong;
     PtrDiffType = IntPtrType = TargetInfo::SignedLong;
+    // SPIR-V has core support for atomic ops, and Int64 is always available;
+    // we take the maximum because it's possible the Host supports wider types.
+    MaxAtomicInlineWidth = std::max<unsigned char>(MaxAtomicInlineWidth, 64);
     resetDataLayout("e-i64:64-v16:16-v24:32-v32:32-v48:64-"
                     "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1");
   }
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index fbe9569e50ef63..ba6ee4c0be3b7f 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -766,8 +766,17 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
   // LLVM atomic instructions always have synch scope. If clang atomic
   // expression has no scope operand, use default LLVM synch scope.
   if (!ScopeModel) {
+    llvm::SyncScope::ID SS = CGF.getLLVMContext().getOrInsertSyncScopeID("");
+    if (CGF.getLangOpts().OpenCL)
+      // OpenCL approach is: "The functions that do not have memory_scope argument
+      // have the same semantics as the corresponding functions with the
+      // memory_scope argument set to memory_scope_device." See ref.: //
+      // https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions
+      SS = CGF.getTargetHooks().getLLVMSyncScopeID(CGF.getLangOpts(),
+                                                   SyncScope::OpenCLDevice,
+                                                   Order, CGF.getLLVMContext());
     EmitAtomicOp(CGF, Expr, Dest, Ptr, Val1, Val2, IsWeak, FailureOrder, Size,
-                 Order, CGF.CGM.getLLVMContext().getOrInsertSyncScopeID(""));
+                 Order, SS);
     return;
   }
 
diff --git a/clang/lib/CodeGen/Targets/SPIR.cpp b/clang/lib/CodeGen/Targets/SPIR.cpp
index cc52925e2e523f..a90741c0c0d324 100644
--- a/clang/lib/CodeGen/Targets/SPIR.cpp
+++ b/clang/lib/CodeGen/Targets/SPIR.cpp
@@ -58,6 +58,10 @@ class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo {
   SPIRVTargetCodeGenInfo(CodeGen::CodeGenTypes &CGT)
       : CommonSPIRTargetCodeGenInfo(std::make_unique<SPIRVABIInfo>(CGT)) {}
   void setCUDAKernelCallingConvention(const FunctionType *&FT) const override;
+  llvm::SyncScope::ID getLLVMSyncScopeID(const LangOptions &LangOpts,
+                                         SyncScope Scope,
+                                         llvm::AtomicOrdering Ordering,
+                                         llvm::LLVMContext &Ctx) const override;
 };
 } // End anonymous namespace.
 
@@ -188,6 +192,42 @@ void SPIRVTargetCodeGenInfo::setCUDAKernelCallingConvention(
   }
 }
 
+llvm::SyncScope::ID
+SPIRVTargetCodeGenInfo::getLLVMSyncScopeID(const LangOptions &,
+                                           SyncScope Scope,
+                                           llvm::AtomicOrdering,
+                                           llvm::LLVMContext &Ctx) const {
+  std::string Name;
+  switch (Scope) {
+  case SyncScope::HIPSingleThread:
+  case SyncScope::SingleScope:
+    Name = "singlethread";
+    break;
+  case SyncScope::HIPWavefront:
+  case SyncScope::OpenCLSubGroup:
+  case SyncScope::WavefrontScope:
+    Name = "subgroup";
+    break;
+  case SyncScope::HIPWorkgroup:
+  case SyncScope::OpenCLWorkGroup:
+  case SyncScope::WorkgroupScope:
+    Name = "workgroup";
+    break;
+  case SyncScope::HIPAgent:
+  case SyncScope::OpenCLDevice:
+  case SyncScope::DeviceScope:
+    Name = "device";
+    break;
+  case SyncScope::SystemScope:
+  case SyncScope::HIPSystem:
+  case SyncScope::OpenCLAllSVMDevices:
+    Name = "all_svm_devices";
+    break;
+  }
+
+  return Ctx.getOrInsertSyncScopeID(Name);
+}
+
 /// Construct a SPIR-V target extension type for the given OpenCL image type.
 static llvm::Type *getSPIRVImageType(llvm::LLVMContext &Ctx, StringRef BaseType,
                                      StringRef OpenCLName,
diff --git a/clang/test/CodeGen/scoped-atomic-ops.c b/clang/test/CodeGen/scoped-atomic-ops.c
index b0032046639b89..24f1613e8af4e8 100644
--- a/clang/test/CodeGen/scoped-atomic-ops.c
+++ b/clang/test/CodeGen/scoped-atomic-ops.c
@@ -1,12 +1,21 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
 // RUN: %clang_cc1 %s -emit-llvm -o - -triple=amdgcn-amd-amdhsa -ffreestanding \
-// RUN:   -fvisibility=hidden | FileCheck %s
+// RUN:   -fvisibility=hidden | FileCheck --check-prefix=AMDGCN %s
+// RUN: %clang_cc1 %s -emit-llvm -o - -triple=spirv64-unknown-unknown -ffreestanding \
+// RUN:   -fvisibility=hidden | FileCheck --check-prefix=SPIRV %s
 
-// CHECK-LABEL: define hidden i32 @fi1a(
-// CHECK:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup-one-as") monotonic, align 4
-// CHECK:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("wavefront-one-as") monotonic, align 4
-// CHECK:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-LABEL: define hidden i32 @fi1a(
+// AMDGCN:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup-one-as") monotonic, align 4
+// AMDGCN:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("wavefront-one-as") monotonic, align 4
+// AMDGCN:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread-one-as") monotonic, align 4
+// SPIRV: define hidden spir_func i32 @fi1a(
+// SPIRV:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:.+]] syncscope("workgroup") monotonic, align 4
+// SPIRV:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:.+]] syncscope("subgroup") monotonic, align 4
+// SPIRV:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:.+]] syncscope("singlethread") monotonic, align 4
 int fi1a(int *i) {
   int v;
   __scoped_atomic_load(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -17,13 +26,18 @@ int fi1a(int *i) {
   return v;
 }
 
-// CHECK-LABEL: define hidden i32 @fi1b(
-// CHECK:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// CHECK:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// CHECK:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
-//
+// AMDGCN-LABEL: define hidden i32 @fi1b(
+// AMDGCN:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
+// AMDGCN:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
+// AMDGCN:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// SPIRV-LABEL: define hidden spir_func i32 @fi1b(
+// SPIRV:    [[TMP0:%.*]] = load atomic i32, ptr [[PTR0:%.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP1:%.*]] = load atomic i32, ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP2:%.*]] = load atomic i32, ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
+// SPIRV:    [[TMP3:%.*]] = load atomic i32, ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
+// SPIRV:    [[TMP4:%.*]] = load atomic i32, ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
 int fi1b(int *i) {
   *i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
   *i = __scoped_atomic_load_n(i, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -33,13 +47,18 @@ int fi1b(int *i) {
   return *i;
 }
 
-// CHECK-LABEL: define hidden void @fi2a(
-// CHECK:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
-//
+// AMDGCN-LABEL: define hidden void @fi2a(
+// AMDGCN:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// SPIRV-LABEL: define hidden spir_func void @fi2a(
+// SPIRV:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
 void fi2a(int *i) {
   int v = 1;
   __scoped_atomic_store(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -49,12 +68,18 @@ void fi2a(int *i) {
   __scoped_atomic_store(i, &v, __ATOMIC_RELAXED, __MEMORY_SCOPE_SINGLE);
 }
 
-// CHECK-LABEL: define hidden void @fi2b(
-// CHECK:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
-// CHECK:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// AMDGCN-LABEL: define hidden void @fi2b(
+// AMDGCN:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("wavefront-one-as") monotonic, align 4
+// AMDGCN:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread-one-as") monotonic, align 4
+// SPIRV-LABEL: define hidden spir_func void @fi2b(
+// SPIRV:    store atomic i32 [[TMP0:%.+]], ptr [[PTR0:%.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP1:%.+]], ptr [[PTR1:%.+]] syncscope("device") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP2:%.+]], ptr [[PTR2:%.+]] syncscope("workgroup") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP3:%.+]], ptr [[PTR3:%.+]] syncscope("subgroup") monotonic, align 4
+// SPIRV:    store atomic i32 [[TMP4:%.+]], ptr [[PTR4:%.+]] syncscope("singlethread") monotonic, align 4
 void fi2b(int *i) {
   __scoped_atomic_store_n(i, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
   __scoped_atomic_store_n(i, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -63,15 +88,24 @@ void fi2b(int *i) {
   __scoped_atomic_store_n(i, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_SINGLE);
 }
 
-// CHECK-LABEL: define hidden void @fi3a(
-// CHECK:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("one-as") monotonic, align 4
-// CHECK:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN-LABEL: define hidden void @fi3a(
+// AMDGCN:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("one-as") monotonic, align 4
+// AMDGCN:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("one-as") monotonic, align 4
+// SPIRV-LABEL: define hidden spir_func void @fi3a(
+// SPIRV:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("all_svm_devices") monotonic, align 4
+// SPIRV:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("all_svm_devices") monotonic, align 4
 void fi3a(int *a, int *b, int *c, int *d, int *e, int *f, int *g, int *h) {
   *a = __scoped_atomic_fetch_add(a, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
   *b = __scoped_atomic_fetch_sub(b, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
@@ -83,15 +117,24 @@ void fi3a(int *a, int *b, int *c, int *d, int *e, int *f, int *g, int *h) {
   *h = __scoped_atomic_fetch_max(h, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_SYSTEM);
 }
 
-// CHECK-LABEL: define hidden void @fi3b(
-// CHECK:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("agent-one-as") monotonic, align 4
-// CHECK:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN-LABEL: define hidden void @fi3b(
+// AMDGCN:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("agent-one-as") monotonic, align 4
+// AMDGCN:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("agent-one-as") monotonic, align 4
+// SPIRV-LABEL: define hidden spir_func void @fi3b(
+// SPIRV:    [[TMP0:%.*]] = atomicrmw add ptr [[PTR0:%.+]], i32 [[VAL0:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP1:%.*]] = atomicrmw sub ptr [[PTR1:%.+]], i32 [[VAL1:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP2:%.*]] = atomicrmw and ptr [[PTR2:%.+]], i32 [[VAL2:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP3:%.*]] = atomicrmw or ptr [[PTR3:%.+]], i32 [[VAL3:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP4:%.*]] = atomicrmw xor ptr [[PTR4:%.+]], i32 [[VAL4:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP5:%.*]] = atomicrmw nand ptr [[PTR5:%.+]], i32 [[VAL5:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP6:%.*]] = atomicrmw min ptr [[PTR6:%.+]], i32 [[VAL6:.+]] syncscope("device") monotonic, align 4
+// SPIRV:    [[TMP7:%.*]] = atomicrmw max ptr [[PTR7:%.+]], i32 [[VAL7:.+]] syncscope("device") monotonic, align 4
 void fi3b(int *a, int *b, int *c, int *d, int *e, int *f, int *g, int *h) {
   *a = __scoped_atomic_fetch_add(a, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
   *b = __scoped_atomic_fetch_sub(b, 1, __ATOMIC_RELAXED, __MEMORY_SCOPE_DEVICE);
@@ -103,15 +146,24 @@ void fi3b(int *a, int *b, int *c, int *d, int *e, int *f, int *g, int *h) {
   *h = __scoped_atomic_fetch_max(h, 1, __ATOMIC_RELAXED, __MEMORY_S...
[truncated]

github-actions · 2024-08-28T18:25:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

…cnspirv

arsenm · 2024-08-29T04:44:21Z

clang/lib/Basic/Targets/SPIR.h

@@ -335,6 +335,9 @@ class LLVM_LIBRARY_VISIBILITY SPIRV32TargetInfo : public BaseSPIRVTargetInfo {
    PointerWidth = PointerAlign = 32;
    SizeType = TargetInfo::UnsignedInt;
    PtrDiffType = IntPtrType = TargetInfo::SignedInt;
+    // SPIR-V has core support for atomic ops, and Int32 is always available;
+    // we take the maximum because it's possible the Host supports wider types.
+    MaxAtomicInlineWidth = std::max<unsigned char>(MaxAtomicInlineWidth, 32);


Isn't there a 64-bit atomic extension? How are extensions supposed to work here?

I'm assuming that the SPIRV32 target exists for cases where the Int64 capability is never enabled, but it would probably be useful to have that assumption checked. For SPIR-V, the model for extensions / capabilities in LLVM seems to be push-based, i.e. an extension / capability gets enabled / checked iff a feature requiring it is encountered when translating (legacy) or lowering (the experimental BE). FWIW, my reading of the SPIR-V spec is that the Int64 capability is core.
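For readers wondering about the std::max<unsigned char> spelling in the hunk above, here is a minimal stand-alone sketch. The function name atLeast32 is hypothetical, and the parameter type assumes the field is an unsigned char, as the explicit template argument in the patch suggests.

```cpp
#include <algorithm>

// Hypothetical stand-in for the assignment in the patch: raise the width to
// at least 32 bits, but keep a wider value inherited from the host.
inline unsigned char atLeast32(unsigned char CurrentWidth) {
  // Plain std::max(CurrentWidth, 32) would not compile: the arguments have
  // types unsigned char and int, so template deduction is ambiguous. Naming
  // the type explicitly converts the literal instead.
  return std::max<unsigned char>(CurrentWidth, 32);
}
```

With this shape, a host value of 64 or 128 is preserved while an unset value of 0 becomes 32, which matches the "take the maximum" comment in the diff.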

clang/lib/CodeGen/CGAtomic.cpp

clang/lib/CodeGen/Targets/SPIR.cpp

arsenm · 2024-08-29T04:47:17Z

clang/lib/CodeGen/CGAtomic.cpp

@@ -766,8 +766,17 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
  // LLVM atomic instructions always have synch scope. If clang atomic
  // expression has no scope operand, use default LLVM synch scope.
  if (!ScopeModel) {
+    llvm::SyncScope::ID SS = CGF.getLLVMContext().getOrInsertSyncScopeID("");
+    if (CGF.getLangOpts().OpenCL)
+      // OpenCL approach is: "The functions that do not have memory_scope


Which flavor of atomic operations does this function correspond to?

This is the primary entry point for Atomic emission, so things like the Clang builtins (which do not carry scopes) would end up here.
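A simplified model of the branch being discussed, for a SPIR-V target only: ToyLangOpts and defaultSyncScopeName are hypothetical names, and the real code goes through the target hooks rather than returning a string directly.

```cpp
#include <string>

// Hypothetical stand-in for the one language option the branch looks at.
struct ToyLangOpts {
  bool OpenCL = false;
};

// Syncscope name chosen for an atomic expression that carries no scope
// operand (e.g. the unscoped Clang builtins), assuming a SPIR-V target.
inline std::string defaultSyncScopeName(const ToyLangOpts &LangOpts) {
  // OpenCL C specifies that unscoped atomics behave as if they were given
  // memory_scope_device.
  if (LangOpts.OpenCL)
    return "device";
  // Every other language keeps LLVM's default scope, spelled "".
  return "";
}
```

The point of the sketch is that the OpenCL-specific default lives in the front-end, so the backend no longer has to assume an OpenCL producer.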

…cnspirv

VyacheslavLevytskyy · 2024-09-06T08:47:06Z

Thank you for the PR! I'd like to better understand the motivation and justification for the SPIR-V BE-related changes, though. The goal would be to understand whether AllSvmDevices is indeed a better choice (for whom?) than Device as the default mem scope value in the SPIR-V BE.

Questions to the description of the PR.

"These were previously unconditionally lowered to Device scope, which can be too conservative and possibly incorrect."

The claim is not justified by any docs/specs. Why is Device scope incorrect as a default? In my opinion, it's AllSvmDevices that looks like the conservative choice, and it may lead to performance degradation in the general case when we change the default without notifying customers. Or, we may say that potential performance changes depend on vendor-specific behavior in this case.

"Furthermore, the default / implicit scope is changed from Device (an OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not OpenCL specific / can ingest IR coming from other language front-ends."

What I know without additional references to other docs/specs is that Device is the default per the OpenCL spec (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions). It would help if you could provide references where AllSvmDevices is the preferable choice, so that we can compare and figure out the best default for the Computational flavor of SPIR-V. For sure, the SPIR-V BE is not OpenCL (=Device) specific, and it's also not specific to any particular vendor or computational framework. I've seen usages of AllSvmDevices as a default in the code base (for example, Name = ""; at line 537 of llvm-project/clang/lib/CodeGen/Targets/AMDGPU.cpp in 319e8cd), but that does not seem to be enough to flip the default over.

"OpenCL defaulting to Device scope is now reflected in the front-end handling of atomic ops, which seems preferable."

The changes in the clang part look really good to me. However, when we add the changes in the SPIR-V part of the code base, things look less optimistic, because what this PR means by "the front-end handling of atomic ops" is upstream clang only, whereas the actual choice of front-ends is wider, and users coming to SPIR-V by other paths would get a sudden change of behavior in the worst case (e.g., MLIR input for the GenAI domain).

===

If it's acceptable to split this PR into two separate PRs (clang and SPIR-V), I'd gladly support the changes in the clang part; they make sense to me. At the moment, however, I have objections to the SPIR-V Backend changes as they are presented in the PR:

This PR looks like a breaking change that would flip the default mem scope for all environments except OpenCL, and it may have a negative impact on an unknown number of projects/customers. I'd guess that OpenCL would not notice the difference, because the path that goes via the upstream clang front-end redefines the default mem scope as Device. All other toolchains just get a breaking change in the form of the AllSvmDevices default. The clang-related changes do not help to smooth this, because the SPIR-V BE should remain agnostic towards front-ends, frameworks, etc.
A technical comment is that the proposed implementation in the SPIR-V part is less efficient than the existing one. It compares strings rather than integers and fills in scope names on each call to the getMemScope() function, whereas the existing implementation does it just once per machine function.
The terminology (the choice of syncscope names) is debatable. The closest thing in the specs that I see is https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_scope_id. I don't see any references to "singlethread" in the specs. The name "workitem" (spelled precisely as "work-item") is used at least in the official Khronos documents (see for example https://registry.khronos.org/SPIR-V/specs/1.0/SPIR-V-execution-and-memory-model.pdf). "all_svm_devices" is not mentioned in the specs at all (there is only the "CrossDevice" term).
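The efficiency point in the second item above can be sketched as follows: intern the scope names once, then compare small integer IDs on every query. ToyContext is a hypothetical stand-in for the context's sync-scope interning, and ScopeTable is an illustrative name; neither is code from the PR or from the existing backend.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

enum class SpirvScope : std::uint32_t {
  CrossDevice = 0, Device = 1, Workgroup = 2, Subgroup = 3, Invocation = 4,
};

// Hypothetical stand-in for a context that interns syncscope names as IDs.
class ToyContext {
  std::unordered_map<std::string, std::uint8_t> Ids;

public:
  std::uint8_t getOrInsertSyncScopeID(const std::string &Name) {
    auto It = Ids.find(Name);
    if (It != Ids.end())
      return It->second;
    const auto Id = static_cast<std::uint8_t>(Ids.size());
    Ids.emplace(Name, Id);
    return Id;
  }
};

// Resolves the names once at construction; each lookup afterwards is a
// handful of integer comparisons and never touches a string.
class ScopeTable {
  std::uint8_t SingleThread, SubGroup, WorkGroup, Device;

public:
  explicit ScopeTable(ToyContext &Ctx)
      : SingleThread(Ctx.getOrInsertSyncScopeID("singlethread")),
        SubGroup(Ctx.getOrInsertSyncScopeID("subgroup")),
        WorkGroup(Ctx.getOrInsertSyncScopeID("workgroup")),
        Device(Ctx.getOrInsertSyncScopeID("device")) {}

  SpirvScope lookup(std::uint8_t Id) const {
    if (Id == SingleThread) return SpirvScope::Invocation;
    if (Id == SubGroup) return SpirvScope::Subgroup;
    if (Id == WorkGroup) return SpirvScope::Workgroup;
    if (Id == Device) return SpirvScope::Device;
    return SpirvScope::CrossDevice; // assumed default for this sketch
  }
};
```

Building one ScopeTable per machine function (or per module) and reusing it would give the "once per machine function" cost profile the comment refers to, while still supporting the named scopes.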

===

For now, I'd rather see an eventual solution in the form of a further classification of the computational flavor of SPIR-V (not just Compute vs. Vulkan, but breaking the Compute part down further where this is required), rather than this sudden change of the default in favor of one particular incarnation of Compute targets. As a first approach, all SPIR-V-related changes may require just a short snippet of the kind "if TheTriple is XXX-specific then use CrossDevice instead of Device" and a minor rename of syncscope names ("subgroup", for example, indeed makes more sense than "sub_group"). This would probably require a description in the SPIRVUsage doc as well, to avoid confusion among customers. Anyway, I'd be glad to talk through a reasonable way forward to get a better solution than we have now, if needed.
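The "if TheTriple is XXX-specific" idea could look roughly like the sketch below. It is only an illustration: defaultScopeForVendor is a hypothetical helper, and "examplevendor" is a placeholder for whatever triple component would opt in, since the comment deliberately leaves that unspecified.

```cpp
#include <cstdint>
#include <string_view>

enum class SpirvScope : std::uint32_t { CrossDevice = 0, Device = 1 };

// Hypothetical helper: pick the default memory scope from the vendor
// component of the target triple. "examplevendor" is a made-up placeholder,
// not a real triple vendor.
constexpr SpirvScope defaultScopeForVendor(std::string_view Vendor) {
  return Vendor == "examplevendor" ? SpirvScope::CrossDevice
                                   : SpirvScope::Device;
}
```

With this shape, existing producers that use an unknown vendor keep the Device default, and only opted-in targets see CrossDevice.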

AlexVlx · 2024-09-09T13:04:06Z

Thank you for the PR! I'd like to better understand the motivation and justification for the SPIR-V BE-related changes, though. The goal would be to understand whether AllSvmDevices is indeed a better choice (for whom?) than Device as the default mem scope value in the SPIR-V BE.

Questions to the description of the PR.

"These were previously unconditionally lowered to Device scope, which can be too conservative and possibly incorrect."

The claim is not justified by any docs/specs. Why is Device scope incorrect as a default? In my opinion, it's AllSvmDevices that looks like the conservative choice, and it may lead to performance degradation in the general case when we change the default without notifying customers. Or, we may say that potential performance changes depend on vendor-specific behavior in this case.

"Furthermore, the default / implicit scope is changed from Device (an OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not OpenCL specific / can ingest IR coming from other language front-ends."

What I know without additional references to other docs/specs is that Device is the default per the OpenCL spec (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions). It would help if you could provide references where AllSvmDevices is the preferable choice, so that we can compare and figure out the best default for the Computational flavor of SPIR-V. For sure, the SPIR-V BE is not OpenCL (=Device) specific, and it's also not specific to any particular vendor or computational framework. I've seen usages of AllSvmDevices as a default in the code base (for example, Name = ""; at line 537 of llvm-project/clang/lib/CodeGen/Targets/AMDGPU.cpp in 319e8cd), but that does not seem to be enough to flip the default over.

"OpenCL defaulting to Device scope is now reflected in the front-end handling of atomic ops, which seems preferable."

Changes in clang part looks really good to me. However, when we add to it changes in SPIR-V part of the code base, things look less optimistic, because what this PR means by "the front-end handling of atomic ops" is the upstream clang only, whereas actual choices of a front-end are more versatile, and users coming to SPIR-V by other paths would get a sudden change of behavior in the worst case (e.g., MLIR input for the GenAI domain).

===

If it's acceptable to split this PR into two separate PR's (clang and SPIR-V), I'd gladly support changes in clang part, it makes sense for me. At the moment, however, I have objections against SPIR-V Backend changes as they are represented in the PR:

This PR looks like a breaking change that would flip over the default value of mem scope for all environments except for OpenCL and may have a potentially negative impact on an unknown number of projects/customers. I'd guess that OpenCL would not notice the difference, because path that goes via upstream clang front-end redefines default mem scope as Device. All other toolchains just get a breaking change in the form of the AllSvmDevices default. clang-related changes do not help to smooth this, because SPIRV BE should remain agnostic towards front-ends, frameworks, etc.

A technical comment is that the proposed implementation in SPIR-V part is less efficient that existing. It compares strings rather than integers and fills in scope names on each call to the getMemScope() function, whereas existing implementation does it just once per a machine function.

A terminology (the choice of syncscope names) is debatable. The closest thing in specs that I see is https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_scope_id. I don't see any references to "singlethread" in the specs. Name "workitem" (spelling precisely as "work-item") is used at least in the official Khronos documents (see for example https://registry.khronos.org/SPIR-V/specs/1.0/SPIR-V-execution-and-memory-model.pdf). "all_svm_devices" is not mentioned in the specs at all (there is only the "CrossDevice" term).

===

For now, I'd rather see an eventual solution in the form of further classification of the computational flavor of SPIR-V (not just Compute vs. Vulkan but breaking Compute part further where this is required) -- comparing to this sudden change of the default in favor of any incarnation of Compute targets. As the first approach, all SPIR-V-related changes may require just a short snippet of the kind "if TheTriple is XXX-specific then use CrossDevice instead of Device" and minor rename of syncscope names ("subgroup", for example, indeed makes more sense than "sub_group"). This would probably require a description in the SPIRVUsage doc as well to avoid confusion among customers. Anyway, I'd be glad to talk out a reasonable way forward to get a better solution than we have now, if needed.

Thank you for the thorough response, it's highly appreciated. Let me try to address some of the points being made:

"The claim is not justified by any docs/specs. Why Device scope is incorrect as a default? In my opinion, it's AllSvmDevices that looks like a conservative choice that may lead to performance degradation in general case when we change the default without notifying customers. Or, we may say that potential performance changes may depend on a vendor-specific behavior in this case"

Poor/confusing choice of words on my part, apologies. The idea, which might be more a matter of philosophy, is the following: BEs should be correct by default, and forfeiting general correctness for performance should be opt-in. In the specific case of scopes, as far as the BE is concerned, if an explicit scope is not provided, the "safest" scope (i.e. the one that subsumes / incorporates all others) should be chosen, to guarantee that the code just works. IMHO, whatever choice OpenCL (or any other language / higher-level source such as MLIR) makes regarding its defaults, it should be handled in the FE with all other linguistic concerns; it is also desirable to properly scope ops rather than rely on the default.

"What I know without additional references to other docs/specs is that Device is default by OpenCL spec (https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#atomic-functions). It would help if you can provide references where AllSvmDevices is a preferable choice, so that we are able to compare and figure out the best default for the Computational flavor of SPIR-V."

Consider all of the *CCL (NCCL, RCCL, CCCL) libs that do in-kernel cross device communication; at least one implementation uses unscoped Clang builtins, and this breaks down (in an opaque / subtle fashion) with the current default. Please note that we do have wording in LangRef about expected behaviour, which I would interpret as matching what is being proposed, albeit in a slightly roundabout way that might require tweaking:

Otherwise, an atomic operation that is not marked syncscope("singlethread") or syncscope("<target-scope>") synchronizes with and participates in the seq_cst total orderings of other operations that are not marked syncscope("singlethread") or syncscope("<target-scope>").
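
The split argued for here (the front end picks the language default, the back end falls back to the widest scope when it is told nothing) can be sketched in standalone C++. This is an illustration under stated assumptions, not the code in the PR: `SourceLang`, `frontEndDefaultScope`, `backEndScope` and `checkDefaults` are hypothetical names, the scope strings are assumed spellings, and only the `Scope` numbering is taken from the SPIR-V specification.

```cpp
#include <cassert>
#include <string>

// Numeric values follow the Scope <id> table in the SPIR-V specification.
enum class Scope { CrossDevice = 0, Device = 1, Workgroup = 2, Subgroup = 3, Invocation = 4 };

// Hypothetical source languages, for illustration only.
enum class SourceLang { OpenCL, Other };

// Front-end side: the language decides what an unscoped atomic means.
// OpenCL asks for device scope; anything else leaves the name empty, which
// stands for the system scope in this sketch.
std::string frontEndDefaultScope(SourceLang Lang) {
  return Lang == SourceLang::OpenCL ? "device" : "";
}

// Back-end side: honour the scope that was given, and fall back to the widest
// scope (the one that subsumes all others) for anything else.
Scope backEndScope(const std::string &SyncScopeName) {
  if (SyncScopeName == "singlethread") return Scope::Invocation;
  if (SyncScopeName == "subgroup") return Scope::Subgroup;
  if (SyncScopeName == "workgroup") return Scope::Workgroup;
  if (SyncScopeName == "device") return Scope::Device;
  return Scope::CrossDevice;
}

// Usage: the OpenCL default survives the round trip, while an unscoped op
// from any other language ends up with the widest scope.
void checkDefaults() {
  assert(backEndScope(frontEndDefaultScope(SourceLang::OpenCL)) == Scope::Device);
  assert(backEndScope(frontEndDefaultScope(SourceLang::Other)) == Scope::CrossDevice);
}
```

With this shape, a producer that wants a narrower scope has to say so explicitly, which is the opt-in behaviour described above.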

I am sympathetic to the concerns about this possibly triggering heartburn by way of performance degradation, but I'd suggest that we don't want this to be target / vendor specific, as these are target orthogonal. Perhaps a better solution is to add an option that controls this behaviour / the default, that the BE can consume and act accordingly? Thus, there can be a smooth, opt-in transition from the current solution and nobody has to take any pain. Overall, I do think that it would (have) be(en) beneficial to have SPIR-V Compute somewhat less OpenCL specific, as there are far more potential generators (including HLLs such as C/C++ which assume a flat memory model, for example), but that train might have left the platform; had that been the case, we'd probably not be having this issue.

The performance concerns around doing a string compare are acknowledged. I believe the current string-based design was put in place to give full freedom to targets, so relying strictly on integer values for scopes is legacy / less preferred. We are quite a few years removed from the design though, and it was before my time, so I might be misinterpreting; perhaps @arsenm & @yxsamliu can provide more comment here since they were involved at the time.
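
One way to reconcile string-named scopes with the cost concern is to resolve each name to an integer ID once and compare integers afterwards. The sketch below is standalone and hypothetical: `ScopeInterner` only imitates the idea of a context that interns scope names, and `ScopeMapper` and `checkMapper` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Numeric values follow the Scope <id> table in the SPIR-V specification.
enum class Scope { CrossDevice = 0, Device = 1, Workgroup = 2, Subgroup = 3, Invocation = 4 };

// Hypothetical stand-in for a context that interns scope names as small integers.
class ScopeInterner {
  std::unordered_map<std::string, std::uint8_t> Ids;

public:
  // Returns the existing ID for Name, or assigns the next free one.
  std::uint8_t getOrInsert(const std::string &Name) {
    auto It = Ids.find(Name);
    if (It != Ids.end())
      return It->second;
    const auto Id = static_cast<std::uint8_t>(Ids.size());
    Ids.emplace(Name, Id);
    return Id;
  }
};

// Pays for the string lookups once, at construction; map() only compares integers.
class ScopeMapper {
  std::uint8_t SingleThread, SubGroup, WorkGroup, Device;

public:
  explicit ScopeMapper(ScopeInterner &Interner)
      : SingleThread(Interner.getOrInsert("singlethread")),
        SubGroup(Interner.getOrInsert("subgroup")),
        WorkGroup(Interner.getOrInsert("workgroup")),
        Device(Interner.getOrInsert("device")) {}

  Scope map(std::uint8_t Id) const {
    if (Id == SingleThread) return Scope::Invocation;
    if (Id == SubGroup) return Scope::Subgroup;
    if (Id == WorkGroup) return Scope::Workgroup;
    if (Id == Device) return Scope::Device;
    return Scope::CrossDevice; // unknown or system scope: widest choice
  }
};

// Usage: names are interned once, queries afterwards are integer compares.
void checkMapper() {
  ScopeInterner Interner;
  ScopeMapper Mapper(Interner);
  assert(Mapper.map(Interner.getOrInsert("workgroup")) == Scope::Workgroup);
  assert(Mapper.map(Interner.getOrInsert("some_other_scope")) == Scope::CrossDevice);
}
```

The trade-off is that the mapper is tied to the interner it was built from, so it has to be created per context rather than shared globally.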

singlethread is one of the two LLVM-defined syncscopes, which are always available and which all targets must expose; Clang matches this, you always get SyncScope::SingleThread and SyncScope::System. Since there can be multiple parent languages / higher-level generators that want to feed IR into the SPIR-V BE, it seemed preferable to match LLVM convention here, rather than OpenCL convention, since it's broader. On the same topic, subgroup vs sub_group is merely about symmetry with workgroup / matching OCL/SPIR-V nomenclature as far as I could observe it.

Overall, hopefully we can discuss this more and try to arrive at a robust solution for everyone. Please let me know if you'd prefer we take this to the SPIR-V BE discussion group / someplace else, or if you'd like to keep this conversation going.

AlexVlx · 2024-09-24T14:02:47Z

Thank you ever so much for the review @VyacheslavLevytskyy! I will create a PR for the Translator as well, since there's some handling missing there; I will refer to it here for future readers. Final check: are you OK with the OpenCL changes @yxsamliu?

yxsamliu · 2024-09-24T16:20:26Z

LGTM

AlexVlx · 2024-09-25T02:42:38Z

I've created KhronosGroup/SPIRV-LLVM-Translator#2727 for the (potential) Translator work.

…ion' when building PR #106429 with gcc (#109924)

It appears that PR #106429 introduced an issue for builds with SPIRV Backend target when building with gcc, e.g.:

```
/llvm-project/llvm/lib/Target/SPIRV/SPIRVUtils.cpp:263:36: error: use of parameter from containing function
  263 |     llvm::SyncScope::ID SubGroup = Ctx.getOrInsertSyncScopeID("subgroup");
      |                                    ^~~
/llvm-project/llvm/lib/Target/SPIRV/SPIRVUtils.cpp:256:46: note: ‘llvm::LLVMContext& Ctx’ declared here
  256 | SPIRV::Scope::Scope getMemScope(LLVMContext &Ctx, SyncScope::ID Id) {
```

This PR fixes this by removing struct and using static const variables instead.
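
A reduced sketch of this kind of fix, assuming the rejected code had roughly the shape shown in the comment (a struct declared inside the function whose member initializers use the function's parameter). `Context`, `subgroupScopeId` and `checkStaticInit` are hypothetical stand-ins written for this illustration; they are not the LLVM classes or the actual patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a context that hands out IDs for scope names.
struct Context {
  std::map<std::string, unsigned> Ids;

  unsigned getOrInsertSyncScopeID(const std::string &Name) {
    auto It = Ids.find(Name);
    if (It != Ids.end())
      return It->second;
    const unsigned Id = static_cast<unsigned>(Ids.size());
    Ids[Name] = Id;
    return Id;
  }
};

// Assumed shape of the rejected code: a member initializer inside a local
// struct may not use a parameter of the enclosing function.
//
//   unsigned subgroupScopeId(Context &Ctx) {
//     static const struct {
//       unsigned SubGroup = Ctx.getOrInsertSyncScopeID("subgroup"); // error
//     } Ids{};
//     return Ids.SubGroup;
//   }

// Shape of the fix: a plain function-local static is not a member of a local
// class, so it may use the parameter. It is initialized once, on the first
// call, which means later calls reuse the value obtained from that first Ctx.
unsigned subgroupScopeId(Context &Ctx) {
  static const unsigned SubGroup = Ctx.getOrInsertSyncScopeID("subgroup");
  return SubGroup;
}

// Usage: repeated calls return the same ID and register the name only once.
void checkStaticInit() {
  Context Ctx;
  const unsigned First = subgroupScopeId(Ctx);
  assert(subgroupScopeId(Ctx) == First);
  assert(Ctx.Ids.count("subgroup") == 1);
}
```

The one-time initialization is what keeps the lookups cheap, at the price of assuming that every caller passes a context in which the name maps to the same ID.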

This patch collects the mapping from LLVM SyncScope ID, which was spread out across multiple sites, into a single utility function. Furthermore, it realigns the mapping to match LLVM conventions, namely it defaults to System / CrossDevice (please see llvm/llvm-project#106429 for more context for the proposed changes). Original commit: KhronosGroup/SPIRV-LLVM-Translator@630a90a2a20b590

Adds the following patches

AMDGPU: Remove wavefrontsize64 feature from dummy target llvm#117410
[LLVM][NFC] Use used's element type if available llvm#116804
[llvm][AMDGPU] Fold llvm.amdgcn.wavefrontsize early llvm#114481
[clang][Driver][HIP] Add support for mixing AMDGCNSPIRV & concrete offload-archs. llvm#113509
[clang][llvm][SPIR-V] Explicitly encode native integer widths for SPIR-V llvm#110695
[llvm][opt][Transforms] Replacement calloc should match replaced malloc llvm#110524
[clang][HIP] Don't use the OpenCLKernel CC when targeting AMDGCNSPIRV llvm#110447
[cuda][HIP] constant should imply constant llvm#110182
[llvm][SPIRV] Expose fast popcnt support for SPIR-V targets llvm#109845
[clang][CodeGen][SPIR-V] Fix incorrect SYCL usage, implement missing interface llvm#109415
[SPIRV][RFC] Rework / extend support for memory scopes llvm#106429
[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. llvm#102776

Change-Id: I2b9ab54aba1c9345b9b0eb84409e6ed6c3cdb6cd

AlexVlx added 13 commits August 11, 2024 01:39

Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates…

d41faf6

… passed to kernels / functions.

Fix formatting error.

757e119

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

af1a416

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

0cb89f6

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

11162b0

…cnspirv

No else after return.

13f83ac

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

fe68593

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

e466a56

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

734630a

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

1637876

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

1d3fedb

…cnspirv

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

440a0ef

…cnspirv

Re-work SPIR-V support for memory scopes.

daa76c3

AlexVlx added the backend:SPIR-V and SPIR-V (SPIR-V language support) labels Aug 28, 2024

AlexVlx requested review from arsenm, VyacheslavLevytskyy, bader, michalpaszkowski and yxsamliu August 28, 2024 18:23

AlexVlx added 2 commits August 28, 2024 19:28

Fix formatting.

25378a7

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

fc422b1

…cnspirv

arsenm reviewed Aug 29, 2024

VyacheslavLevytskyy requested a review from bogner September 2, 2024 08:15

AlexVlx added 2 commits September 4, 2024 10:47

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

b6fd508

…cnspirv

Incorporate review feedback.

79acf40

AlexVlx added 4 commits September 18, 2024 20:13

No need for aliases / special handling of System scope.

e984939

Remove & replace SyncScopeIDs struct.

ced6877

Fix formatting.

ec0eb50

Merge branch 'main' of https://github.com/llvm/llvm-project into amdg…

8621e36

…cnspirv