[Flang][OpenMP][MLIR] Add support for -nogpulib option #71045

DominikAdamski · 2023-11-02T10:54:15Z

If -nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime.

Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

Added an OpenMP attribute which will model nogpulib flag. If nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime. Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

If nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime. Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

llvmbot · 2023-11-02T10:54:54Z

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-mlir

Author: Dominik Adamski (DominikAdamski)

Changes

If -nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime.

Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

Full diff: https://github.com/llvm/llvm-project/pull/71045.diff

14 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+1-1)
(modified) clang/lib/Driver/ToolChains/Flang.cpp (+2)
(modified) flang/include/flang/Frontend/LangOptions.def (+2)
(modified) flang/include/flang/Tools/CrossToolHelpers.h (+7-4)
(modified) flang/lib/Frontend/CompilerInvocation.cpp (+2)
(modified) flang/test/Driver/driver-help-hidden.f90 (+1)
(modified) flang/test/Driver/driver-help.f90 (+2)
(added) flang/test/Lower/OpenMP/nogpulib.f90 (+12)
(modified) flang/tools/bbc/bbc.cpp (+7-1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td (+3-2)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+6-2)
(modified) mlir/test/Dialect/OpenMP/attr.mlir (+6)
(modified) mlir/test/Target/LLVMIR/openmp-llvm.mlir (+10)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index b1229b2f4562379..cc50b45b7933d5f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5092,7 +5092,7 @@ def nohipwrapperinc : Flag<["-"], "nohipwrapperinc">, Group<IncludePath_Group>,
   HelpText<"Do not include the default HIP wrapper headers and include paths">;
 def : Flag<["-"], "nocudainc">, Alias<nogpuinc>;
 def nogpulib : Flag<["-"], "nogpulib">, MarshallingInfoFlag<LangOpts<"NoGPULib">>,
-  Visibility<[ClangOption, CC1Option]>,
+  Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Do not link device library for CUDA/HIP device compilation">;
 def : Flag<["-"], "nocudalib">, Alias<nogpulib>;
 def gpulibc : Flag<["-"], "gpulibc">, Visibility<[ClangOption, CC1Option]>,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 999039f83ddfb92..17d9ac7632128d6 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -295,6 +295,8 @@ void Flang::addOffloadOptions(Compilation &C, const InputInfoList &Inputs,
       CmdArgs.push_back("-fopenmp-assume-no-thread-state");
     if (Args.hasArg(options::OPT_fopenmp_assume_no_nested_parallelism))
       CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
+    if (Args.hasArg(options::OPT_nogpulib))
+      CmdArgs.push_back("-nogpulib");
   }
 }
 
diff --git a/flang/include/flang/Frontend/LangOptions.def b/flang/include/flang/Frontend/LangOptions.def
index 3a1d44f7fb472d1..2bf10826120a8bd 100644
--- a/flang/include/flang/Frontend/LangOptions.def
+++ b/flang/include/flang/Frontend/LangOptions.def
@@ -21,6 +21,8 @@ LANGOPT(Name, Bits, Default)
 
 ENUM_LANGOPT(FPContractMode, FPModeKind, 2, FPM_Fast) ///< FP Contract Mode (off/fast)
 
+/// Indicate a build without the standard GPU libraries.
+LANGOPT(NoGPULib  , 1, false)
 /// Permit floating point optimization without regard to infinities
 LANGOPT(NoHonorInfs, 1, false)
 /// Permit floating point optimization without regard to NaN
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index ddec70fa9824c52..3b51a3845625df3 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -53,14 +53,16 @@ struct OffloadModuleOpts {
   OffloadModuleOpts(uint32_t OpenMPTargetDebug, bool OpenMPTeamSubscription,
       bool OpenMPThreadSubscription, bool OpenMPNoThreadState,
       bool OpenMPNoNestedParallelism, bool OpenMPIsTargetDevice,
-      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {})
+      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {},
+      bool NoGPULib = false)
       : OpenMPTargetDebug(OpenMPTargetDebug),
         OpenMPTeamSubscription(OpenMPTeamSubscription),
         OpenMPThreadSubscription(OpenMPThreadSubscription),
         OpenMPNoThreadState(OpenMPNoThreadState),
         OpenMPNoNestedParallelism(OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(OpenMPIsTargetDevice), OpenMPIsGPU(OpenMPIsGPU),
-        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile) {}
+        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile),
+        NoGPULib(NoGPULib) {}
 
   OffloadModuleOpts(Fortran::frontend::LangOptions &Opts)
       : OpenMPTargetDebug(Opts.OpenMPTargetDebug),
@@ -70,7 +72,7 @@ struct OffloadModuleOpts {
         OpenMPNoNestedParallelism(Opts.OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(Opts.OpenMPIsTargetDevice),
         OpenMPIsGPU(Opts.OpenMPIsGPU), OpenMPVersion(Opts.OpenMPVersion),
-        OMPHostIRFile(Opts.OMPHostIRFile) {}
+        OMPHostIRFile(Opts.OMPHostIRFile), NoGPULib(Opts.NoGPULib) {}
 
   uint32_t OpenMPTargetDebug = 0;
   bool OpenMPTeamSubscription = false;
@@ -81,6 +83,7 @@ struct OffloadModuleOpts {
   bool OpenMPIsGPU = false;
   uint32_t OpenMPVersion = 11;
   std::string OMPHostIRFile = {};
+  bool NoGPULib = false;
 };
 
 //  Shares assinging of the OpenMP OffloadModuleInterface and its assorted
@@ -95,7 +98,7 @@ void setOffloadModuleInterfaceAttributes(
     if (Opts.OpenMPIsTargetDevice) {
       offloadMod.setFlags(Opts.OpenMPTargetDebug, Opts.OpenMPTeamSubscription,
           Opts.OpenMPThreadSubscription, Opts.OpenMPNoThreadState,
-          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion);
+          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion, Opts.NoGPULib);
 
       if (!Opts.OMPHostIRFile.empty())
         offloadMod.setHostIRFilePath(Opts.OMPHostIRFile);
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index ba2ecab3742587a..ef0db22037283d3 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -881,6 +881,8 @@ static bool parseDialectArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
             args.hasArg(clang::driver::options::OPT_fopenmp_target_debug))
           res.getLangOpts().OpenMPTargetDebug = 1;
       }
+      if (args.hasArg(clang::driver::options::OPT_nogpulib))
+        res.getLangOpts().NoGPULib = 1;
     }
 
     switch (llvm::Triple(res.getTargetOpts().triple).getArch()) {
diff --git a/flang/test/Driver/driver-help-hidden.f90 b/flang/test/Driver/driver-help-hidden.f90
index 6d399f1d179a022..2e65096d0c3fec4 100644
--- a/flang/test/Driver/driver-help-hidden.f90
+++ b/flang/test/Driver/driver-help-hidden.f90
@@ -119,6 +119,7 @@
 ! CHECK-NEXT: --no-offload-arch=<value>
 ! CHECK-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! CHECK-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! CHECK-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! CHECK-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! CHECK-NEXT: --offload-device-only   Only compile for the offloading device.
 ! CHECK-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
diff --git a/flang/test/Driver/driver-help.f90 b/flang/test/Driver/driver-help.f90
index 31c9caa32ea8292..a29e8a0b69db46d 100644
--- a/flang/test/Driver/driver-help.f90
+++ b/flang/test/Driver/driver-help.f90
@@ -107,6 +107,7 @@
 ! HELP-NEXT: --no-offload-arch=<value>
 ! HELP-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! HELP-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! HELP-NEXT: --offload-device-only   Only compile for the offloading device.
 ! HELP-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
@@ -239,6 +240,7 @@
 ! HELP-FC1-NEXT: -mvscale-max=<value>    Specify the vscale maximum. Defaults to the vector length agnostic value of "0". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -mvscale-min=<value>    Specify the vscale minimum. Defaults to "1". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-FC1-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-FC1-NEXT: -opt-record-file <value>
 ! HELP-FC1-NEXT:                         File name to use for YAML optimization record output
 ! HELP-FC1-NEXT: -opt-record-format <value>
diff --git a/flang/test/Lower/OpenMP/nogpulib.f90 b/flang/test/Lower/OpenMP/nogpulib.f90
new file mode 100644
index 000000000000000..f2e67136ecd7481
--- /dev/null
+++ b/flang/test/Lower/OpenMP/nogpulib.f90
@@ -0,0 +1,12 @@
+!REQUIRES: amdgpu-registered-target
+
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -o - %s | FileCheck %s
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device -nogpulib %s -o - | FileCheck %s -check-prefix=FLAG_SET
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -nogpulib -o - %s | FileCheck %s -check-prefix=FLAG_SET
+
+!CHECK-NOT: module attributes {{{.*}}no_gpu_lib
+!FLAG_SET: module attributes {{{.*}}no_gpu_lib = true
+subroutine omp_subroutine()
+end subroutine omp_subroutine
+
diff --git a/flang/tools/bbc/bbc.cpp b/flang/tools/bbc/bbc.cpp
index 378216bd1e51f8f..a66c41eed59651c 100644
--- a/flang/tools/bbc/bbc.cpp
+++ b/flang/tools/bbc/bbc.cpp
@@ -177,6 +177,12 @@ static llvm::cl::opt<bool> setOpenMPNoNestedParallelism(
                    "a parallel region."),
     llvm::cl::init(false));
 
+static llvm::cl::opt<bool>
+    setNoGPULib("nogpulib",
+                llvm::cl::desc("Do not link device library for CUDA/HIP device "
+                               "compilation"),
+                llvm::cl::init(false));
+
 static llvm::cl::opt<bool> enableOpenACC("fopenacc",
                                          llvm::cl::desc("enable openacc"),
                                          llvm::cl::init(false));
@@ -316,7 +322,7 @@ static mlir::LogicalResult convertFortranSourceToMLIR(
         OffloadModuleOpts(setOpenMPTargetDebug, setOpenMPTeamSubscription,
                           setOpenMPThreadSubscription, setOpenMPNoThreadState,
                           setOpenMPNoNestedParallelism, enableOpenMPDevice,
-                          enableOpenMPGPU, setOpenMPVersion);
+                          enableOpenMPGPU, setOpenMPVersion, "", setNoGPULib);
     setOffloadModuleInterfaceAttributes(mlirModule, offloadModuleOpts);
     setOpenMPVersionAttribute(mlirModule, setOpenMPVersion);
   }
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 72121ad4f9e47a5..ae2dd018059e3aa 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -65,6 +65,7 @@ def FlagsAttr : OpenMP_Attr<"Flags", "flags"> {
     DefaultValuedParameter<"bool", "false">:$assume_threads_oversubscription,
     DefaultValuedParameter<"bool", "false">:$assume_no_thread_state,
     DefaultValuedParameter<"bool", "false">:$assume_no_nested_parallelism,
+    DefaultValuedParameter<"bool", "false">:$no_gpu_lib,
     DefaultValuedParameter<"uint32_t", "50">:$openmp_device_version
   );
 
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
index 77001fc816cf91f..89d04af64766fc2 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
@@ -198,11 +198,12 @@ def OffloadModuleInterface : OpInterface<"OffloadModuleInterface"> {
             "bool":$assumeThreadsOversubscription,
             "bool":$assumeNoThreadState,
             "bool":$assumeNoNestedParallelism,
-            "uint32_t":$openmpDeviceVersion), [{}], [{
+            "uint32_t":$openmpDeviceVersion,
+            "bool":$noGPULib), [{}], [{
         $_op->setAttr(("omp." + mlir::omp::FlagsAttr::getMnemonic()).str(),
                   mlir::omp::FlagsAttr::get($_op->getContext(), debugKind,
                       assumeTeamsOversubscription, assumeThreadsOversubscription,
-                      assumeNoThreadState, assumeNoNestedParallelism, openmpDeviceVersion));
+                      assumeNoThreadState, assumeNoNestedParallelism, noGPULib, openmpDeviceVersion));
       }]>,
     InterfaceMethod<
       /*description=*/[{
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 1daf60b8659bb66..083b507f405f426 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2011,6 +2011,12 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
+  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
+                              attribute.getOpenmpDeviceVersion());
+
+  if (attribute.getNoGpuLib())
+    return success();
+
   ompBuilder->createGlobalFlag(
       attribute.getDebugKind() /*LangOpts().OpenMPTargetDebug*/,
       "__omp_rtl_debug_kind");
@@ -2032,8 +2038,6 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
           .getAssumeNoNestedParallelism() /*LangOpts().OpenMPNoNestedParallelism*/
       ,
       "__omp_rtl_assume_no_nested_parallelism");
-  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
-                              attribute.getOpenmpDeviceVersion());
   return success();
 }
 
diff --git a/mlir/test/Dialect/OpenMP/attr.mlir b/mlir/test/Dialect/OpenMP/attr.mlir
index fc89ebd01b6bf57..2e88dac8ff4d2b9 100644
--- a/mlir/test/Dialect/OpenMP/attr.mlir
+++ b/mlir/test/Dialect/OpenMP/attr.mlir
@@ -54,6 +54,12 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 // CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, openmp_device_version = 51>} {
 module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, openmp_device_version = 51>} {}
 
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, no_gpu_lib = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, no_gpu_lib = true, openmp_device_version = 51>} {}
+
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, no_gpu_lib = false, openmp_device_version = 51>} {}
+
 // CHECK: module attributes {omp.version = #omp.version<version = 51>} {
 module attributes {omp.version = #omp.version<version = 51>} {}
 
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 116c05f3747c6ae..8da17107466f817 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -2530,6 +2530,16 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 
 // -----
 
+// CHECK-NOT: @__omp_rtl_debug_kind = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_no_thread_state = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden constant i32 0
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true,
+                                          no_gpu_lib=true>} {}
+
+// -----
+
 module attributes {omp.is_target_device = false} {
   // CHECK: define void @filter_nohost
   llvm.func @filter_nohost() -> ()

llvmbot · 2023-11-02T10:54:54Z

@llvm/pr-subscribers-flang-driver

Author: Dominik Adamski (DominikAdamski)

Changes

If -nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime.

Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

Full diff: https://github.com/llvm/llvm-project/pull/71045.diff

14 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+1-1)
(modified) clang/lib/Driver/ToolChains/Flang.cpp (+2)
(modified) flang/include/flang/Frontend/LangOptions.def (+2)
(modified) flang/include/flang/Tools/CrossToolHelpers.h (+7-4)
(modified) flang/lib/Frontend/CompilerInvocation.cpp (+2)
(modified) flang/test/Driver/driver-help-hidden.f90 (+1)
(modified) flang/test/Driver/driver-help.f90 (+2)
(added) flang/test/Lower/OpenMP/nogpulib.f90 (+12)
(modified) flang/tools/bbc/bbc.cpp (+7-1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td (+3-2)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+6-2)
(modified) mlir/test/Dialect/OpenMP/attr.mlir (+6)
(modified) mlir/test/Target/LLVMIR/openmp-llvm.mlir (+10)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index b1229b2f4562379..cc50b45b7933d5f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5092,7 +5092,7 @@ def nohipwrapperinc : Flag<["-"], "nohipwrapperinc">, Group<IncludePath_Group>,
   HelpText<"Do not include the default HIP wrapper headers and include paths">;
 def : Flag<["-"], "nocudainc">, Alias<nogpuinc>;
 def nogpulib : Flag<["-"], "nogpulib">, MarshallingInfoFlag<LangOpts<"NoGPULib">>,
-  Visibility<[ClangOption, CC1Option]>,
+  Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Do not link device library for CUDA/HIP device compilation">;
 def : Flag<["-"], "nocudalib">, Alias<nogpulib>;
 def gpulibc : Flag<["-"], "gpulibc">, Visibility<[ClangOption, CC1Option]>,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 999039f83ddfb92..17d9ac7632128d6 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -295,6 +295,8 @@ void Flang::addOffloadOptions(Compilation &C, const InputInfoList &Inputs,
       CmdArgs.push_back("-fopenmp-assume-no-thread-state");
     if (Args.hasArg(options::OPT_fopenmp_assume_no_nested_parallelism))
       CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
+    if (Args.hasArg(options::OPT_nogpulib))
+      CmdArgs.push_back("-nogpulib");
   }
 }
 
diff --git a/flang/include/flang/Frontend/LangOptions.def b/flang/include/flang/Frontend/LangOptions.def
index 3a1d44f7fb472d1..2bf10826120a8bd 100644
--- a/flang/include/flang/Frontend/LangOptions.def
+++ b/flang/include/flang/Frontend/LangOptions.def
@@ -21,6 +21,8 @@ LANGOPT(Name, Bits, Default)
 
 ENUM_LANGOPT(FPContractMode, FPModeKind, 2, FPM_Fast) ///< FP Contract Mode (off/fast)
 
+/// Indicate a build without the standard GPU libraries.
+LANGOPT(NoGPULib  , 1, false)
 /// Permit floating point optimization without regard to infinities
 LANGOPT(NoHonorInfs, 1, false)
 /// Permit floating point optimization without regard to NaN
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index ddec70fa9824c52..3b51a3845625df3 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -53,14 +53,16 @@ struct OffloadModuleOpts {
   OffloadModuleOpts(uint32_t OpenMPTargetDebug, bool OpenMPTeamSubscription,
       bool OpenMPThreadSubscription, bool OpenMPNoThreadState,
       bool OpenMPNoNestedParallelism, bool OpenMPIsTargetDevice,
-      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {})
+      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {},
+      bool NoGPULib = false)
       : OpenMPTargetDebug(OpenMPTargetDebug),
         OpenMPTeamSubscription(OpenMPTeamSubscription),
         OpenMPThreadSubscription(OpenMPThreadSubscription),
         OpenMPNoThreadState(OpenMPNoThreadState),
         OpenMPNoNestedParallelism(OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(OpenMPIsTargetDevice), OpenMPIsGPU(OpenMPIsGPU),
-        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile) {}
+        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile),
+        NoGPULib(NoGPULib) {}
 
   OffloadModuleOpts(Fortran::frontend::LangOptions &Opts)
       : OpenMPTargetDebug(Opts.OpenMPTargetDebug),
@@ -70,7 +72,7 @@ struct OffloadModuleOpts {
         OpenMPNoNestedParallelism(Opts.OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(Opts.OpenMPIsTargetDevice),
         OpenMPIsGPU(Opts.OpenMPIsGPU), OpenMPVersion(Opts.OpenMPVersion),
-        OMPHostIRFile(Opts.OMPHostIRFile) {}
+        OMPHostIRFile(Opts.OMPHostIRFile), NoGPULib(Opts.NoGPULib) {}
 
   uint32_t OpenMPTargetDebug = 0;
   bool OpenMPTeamSubscription = false;
@@ -81,6 +83,7 @@ struct OffloadModuleOpts {
   bool OpenMPIsGPU = false;
   uint32_t OpenMPVersion = 11;
   std::string OMPHostIRFile = {};
+  bool NoGPULib = false;
 };
 
 //  Shares assinging of the OpenMP OffloadModuleInterface and its assorted
@@ -95,7 +98,7 @@ void setOffloadModuleInterfaceAttributes(
     if (Opts.OpenMPIsTargetDevice) {
       offloadMod.setFlags(Opts.OpenMPTargetDebug, Opts.OpenMPTeamSubscription,
           Opts.OpenMPThreadSubscription, Opts.OpenMPNoThreadState,
-          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion);
+          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion, Opts.NoGPULib);
 
       if (!Opts.OMPHostIRFile.empty())
         offloadMod.setHostIRFilePath(Opts.OMPHostIRFile);
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index ba2ecab3742587a..ef0db22037283d3 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -881,6 +881,8 @@ static bool parseDialectArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
             args.hasArg(clang::driver::options::OPT_fopenmp_target_debug))
           res.getLangOpts().OpenMPTargetDebug = 1;
       }
+      if (args.hasArg(clang::driver::options::OPT_nogpulib))
+        res.getLangOpts().NoGPULib = 1;
     }
 
     switch (llvm::Triple(res.getTargetOpts().triple).getArch()) {
diff --git a/flang/test/Driver/driver-help-hidden.f90 b/flang/test/Driver/driver-help-hidden.f90
index 6d399f1d179a022..2e65096d0c3fec4 100644
--- a/flang/test/Driver/driver-help-hidden.f90
+++ b/flang/test/Driver/driver-help-hidden.f90
@@ -119,6 +119,7 @@
 ! CHECK-NEXT: --no-offload-arch=<value>
 ! CHECK-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! CHECK-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! CHECK-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! CHECK-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! CHECK-NEXT: --offload-device-only   Only compile for the offloading device.
 ! CHECK-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
diff --git a/flang/test/Driver/driver-help.f90 b/flang/test/Driver/driver-help.f90
index 31c9caa32ea8292..a29e8a0b69db46d 100644
--- a/flang/test/Driver/driver-help.f90
+++ b/flang/test/Driver/driver-help.f90
@@ -107,6 +107,7 @@
 ! HELP-NEXT: --no-offload-arch=<value>
 ! HELP-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! HELP-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! HELP-NEXT: --offload-device-only   Only compile for the offloading device.
 ! HELP-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
@@ -239,6 +240,7 @@
 ! HELP-FC1-NEXT: -mvscale-max=<value>    Specify the vscale maximum. Defaults to the vector length agnostic value of "0". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -mvscale-min=<value>    Specify the vscale minimum. Defaults to "1". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-FC1-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-FC1-NEXT: -opt-record-file <value>
 ! HELP-FC1-NEXT:                         File name to use for YAML optimization record output
 ! HELP-FC1-NEXT: -opt-record-format <value>
diff --git a/flang/test/Lower/OpenMP/nogpulib.f90 b/flang/test/Lower/OpenMP/nogpulib.f90
new file mode 100644
index 000000000000000..f2e67136ecd7481
--- /dev/null
+++ b/flang/test/Lower/OpenMP/nogpulib.f90
@@ -0,0 +1,12 @@
+!REQUIRES: amdgpu-registered-target
+
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -o - %s | FileCheck %s
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device -nogpulib %s -o - | FileCheck %s -check-prefix=FLAG_SET
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -nogpulib -o - %s | FileCheck %s -check-prefix=FLAG_SET
+
+!CHECK-NOT: module attributes {{{.*}}no_gpu_lib
+!FLAG_SET: module attributes {{{.*}}no_gpu_lib = true
+subroutine omp_subroutine()
+end subroutine omp_subroutine
+
diff --git a/flang/tools/bbc/bbc.cpp b/flang/tools/bbc/bbc.cpp
index 378216bd1e51f8f..a66c41eed59651c 100644
--- a/flang/tools/bbc/bbc.cpp
+++ b/flang/tools/bbc/bbc.cpp
@@ -177,6 +177,12 @@ static llvm::cl::opt<bool> setOpenMPNoNestedParallelism(
                    "a parallel region."),
     llvm::cl::init(false));
 
+static llvm::cl::opt<bool>
+    setNoGPULib("nogpulib",
+                llvm::cl::desc("Do not link device library for CUDA/HIP device "
+                               "compilation"),
+                llvm::cl::init(false));
+
 static llvm::cl::opt<bool> enableOpenACC("fopenacc",
                                          llvm::cl::desc("enable openacc"),
                                          llvm::cl::init(false));
@@ -316,7 +322,7 @@ static mlir::LogicalResult convertFortranSourceToMLIR(
         OffloadModuleOpts(setOpenMPTargetDebug, setOpenMPTeamSubscription,
                           setOpenMPThreadSubscription, setOpenMPNoThreadState,
                           setOpenMPNoNestedParallelism, enableOpenMPDevice,
-                          enableOpenMPGPU, setOpenMPVersion);
+                          enableOpenMPGPU, setOpenMPVersion, "", setNoGPULib);
     setOffloadModuleInterfaceAttributes(mlirModule, offloadModuleOpts);
     setOpenMPVersionAttribute(mlirModule, setOpenMPVersion);
   }
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 72121ad4f9e47a5..ae2dd018059e3aa 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -65,6 +65,7 @@ def FlagsAttr : OpenMP_Attr<"Flags", "flags"> {
     DefaultValuedParameter<"bool", "false">:$assume_threads_oversubscription,
     DefaultValuedParameter<"bool", "false">:$assume_no_thread_state,
     DefaultValuedParameter<"bool", "false">:$assume_no_nested_parallelism,
+    DefaultValuedParameter<"bool", "false">:$no_gpu_lib,
     DefaultValuedParameter<"uint32_t", "50">:$openmp_device_version
   );
 
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
index 77001fc816cf91f..89d04af64766fc2 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
@@ -198,11 +198,12 @@ def OffloadModuleInterface : OpInterface<"OffloadModuleInterface"> {
             "bool":$assumeThreadsOversubscription,
             "bool":$assumeNoThreadState,
             "bool":$assumeNoNestedParallelism,
-            "uint32_t":$openmpDeviceVersion), [{}], [{
+            "uint32_t":$openmpDeviceVersion,
+            "bool":$noGPULib), [{}], [{
         $_op->setAttr(("omp." + mlir::omp::FlagsAttr::getMnemonic()).str(),
                   mlir::omp::FlagsAttr::get($_op->getContext(), debugKind,
                       assumeTeamsOversubscription, assumeThreadsOversubscription,
-                      assumeNoThreadState, assumeNoNestedParallelism, openmpDeviceVersion));
+                      assumeNoThreadState, assumeNoNestedParallelism, noGPULib, openmpDeviceVersion));
       }]>,
     InterfaceMethod<
       /*description=*/[{
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 1daf60b8659bb66..083b507f405f426 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2011,6 +2011,12 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
+  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
+                              attribute.getOpenmpDeviceVersion());
+
+  if (attribute.getNoGpuLib())
+    return success();
+
   ompBuilder->createGlobalFlag(
       attribute.getDebugKind() /*LangOpts().OpenMPTargetDebug*/,
       "__omp_rtl_debug_kind");
@@ -2032,8 +2038,6 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
           .getAssumeNoNestedParallelism() /*LangOpts().OpenMPNoNestedParallelism*/
       ,
       "__omp_rtl_assume_no_nested_parallelism");
-  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
-                              attribute.getOpenmpDeviceVersion());
   return success();
 }
 
diff --git a/mlir/test/Dialect/OpenMP/attr.mlir b/mlir/test/Dialect/OpenMP/attr.mlir
index fc89ebd01b6bf57..2e88dac8ff4d2b9 100644
--- a/mlir/test/Dialect/OpenMP/attr.mlir
+++ b/mlir/test/Dialect/OpenMP/attr.mlir
@@ -54,6 +54,12 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 // CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, openmp_device_version = 51>} {
 module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, openmp_device_version = 51>} {}
 
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, no_gpu_lib = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, no_gpu_lib = true, openmp_device_version = 51>} {}
+
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, no_gpu_lib = false, openmp_device_version = 51>} {}
+
 // CHECK: module attributes {omp.version = #omp.version<version = 51>} {
 module attributes {omp.version = #omp.version<version = 51>} {}
 
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 116c05f3747c6ae..8da17107466f817 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -2530,6 +2530,16 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 
 // -----
 
+// CHECK-NOT: @__omp_rtl_debug_kind = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_no_thread_state = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden constant i32 0
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true,
+                                          no_gpu_lib=true>} {}
+
+// -----
+
 module attributes {omp.is_target_device = false} {
   // CHECK: define void @filter_nohost
   llvm.func @filter_nohost() -> ()

llvmbot · 2023-11-02T10:54:54Z

@llvm/pr-subscribers-mlir-llvm

Author: Dominik Adamski (DominikAdamski)

Changes

If -nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime.

Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

Full diff: https://github.com/llvm/llvm-project/pull/71045.diff

14 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+1-1)
(modified) clang/lib/Driver/ToolChains/Flang.cpp (+2)
(modified) flang/include/flang/Frontend/LangOptions.def (+2)
(modified) flang/include/flang/Tools/CrossToolHelpers.h (+7-4)
(modified) flang/lib/Frontend/CompilerInvocation.cpp (+2)
(modified) flang/test/Driver/driver-help-hidden.f90 (+1)
(modified) flang/test/Driver/driver-help.f90 (+2)
(added) flang/test/Lower/OpenMP/nogpulib.f90 (+12)
(modified) flang/tools/bbc/bbc.cpp (+7-1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+1)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td (+3-2)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+6-2)
(modified) mlir/test/Dialect/OpenMP/attr.mlir (+6)
(modified) mlir/test/Target/LLVMIR/openmp-llvm.mlir (+10)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index b1229b2f4562379..cc50b45b7933d5f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -5092,7 +5092,7 @@ def nohipwrapperinc : Flag<["-"], "nohipwrapperinc">, Group<IncludePath_Group>,
   HelpText<"Do not include the default HIP wrapper headers and include paths">;
 def : Flag<["-"], "nocudainc">, Alias<nogpuinc>;
 def nogpulib : Flag<["-"], "nogpulib">, MarshallingInfoFlag<LangOpts<"NoGPULib">>,
-  Visibility<[ClangOption, CC1Option]>,
+  Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>,
   HelpText<"Do not link device library for CUDA/HIP device compilation">;
 def : Flag<["-"], "nocudalib">, Alias<nogpulib>;
 def gpulibc : Flag<["-"], "gpulibc">, Visibility<[ClangOption, CC1Option]>,
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index 999039f83ddfb92..17d9ac7632128d6 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -295,6 +295,8 @@ void Flang::addOffloadOptions(Compilation &C, const InputInfoList &Inputs,
       CmdArgs.push_back("-fopenmp-assume-no-thread-state");
     if (Args.hasArg(options::OPT_fopenmp_assume_no_nested_parallelism))
       CmdArgs.push_back("-fopenmp-assume-no-nested-parallelism");
+    if (Args.hasArg(options::OPT_nogpulib))
+      CmdArgs.push_back("-nogpulib");
   }
 }
 
diff --git a/flang/include/flang/Frontend/LangOptions.def b/flang/include/flang/Frontend/LangOptions.def
index 3a1d44f7fb472d1..2bf10826120a8bd 100644
--- a/flang/include/flang/Frontend/LangOptions.def
+++ b/flang/include/flang/Frontend/LangOptions.def
@@ -21,6 +21,8 @@ LANGOPT(Name, Bits, Default)
 
 ENUM_LANGOPT(FPContractMode, FPModeKind, 2, FPM_Fast) ///< FP Contract Mode (off/fast)
 
+/// Indicate a build without the standard GPU libraries.
+LANGOPT(NoGPULib  , 1, false)
 /// Permit floating point optimization without regard to infinities
 LANGOPT(NoHonorInfs, 1, false)
 /// Permit floating point optimization without regard to NaN
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h
index ddec70fa9824c52..3b51a3845625df3 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -53,14 +53,16 @@ struct OffloadModuleOpts {
   OffloadModuleOpts(uint32_t OpenMPTargetDebug, bool OpenMPTeamSubscription,
       bool OpenMPThreadSubscription, bool OpenMPNoThreadState,
       bool OpenMPNoNestedParallelism, bool OpenMPIsTargetDevice,
-      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {})
+      bool OpenMPIsGPU, uint32_t OpenMPVersion, std::string OMPHostIRFile = {},
+      bool NoGPULib = false)
       : OpenMPTargetDebug(OpenMPTargetDebug),
         OpenMPTeamSubscription(OpenMPTeamSubscription),
         OpenMPThreadSubscription(OpenMPThreadSubscription),
         OpenMPNoThreadState(OpenMPNoThreadState),
         OpenMPNoNestedParallelism(OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(OpenMPIsTargetDevice), OpenMPIsGPU(OpenMPIsGPU),
-        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile) {}
+        OpenMPVersion(OpenMPVersion), OMPHostIRFile(OMPHostIRFile),
+        NoGPULib(NoGPULib) {}
 
   OffloadModuleOpts(Fortran::frontend::LangOptions &Opts)
       : OpenMPTargetDebug(Opts.OpenMPTargetDebug),
@@ -70,7 +72,7 @@ struct OffloadModuleOpts {
         OpenMPNoNestedParallelism(Opts.OpenMPNoNestedParallelism),
         OpenMPIsTargetDevice(Opts.OpenMPIsTargetDevice),
         OpenMPIsGPU(Opts.OpenMPIsGPU), OpenMPVersion(Opts.OpenMPVersion),
-        OMPHostIRFile(Opts.OMPHostIRFile) {}
+        OMPHostIRFile(Opts.OMPHostIRFile), NoGPULib(Opts.NoGPULib) {}
 
   uint32_t OpenMPTargetDebug = 0;
   bool OpenMPTeamSubscription = false;
@@ -81,6 +83,7 @@ struct OffloadModuleOpts {
   bool OpenMPIsGPU = false;
   uint32_t OpenMPVersion = 11;
   std::string OMPHostIRFile = {};
+  bool NoGPULib = false;
 };
 
 //  Shares assinging of the OpenMP OffloadModuleInterface and its assorted
@@ -95,7 +98,7 @@ void setOffloadModuleInterfaceAttributes(
     if (Opts.OpenMPIsTargetDevice) {
       offloadMod.setFlags(Opts.OpenMPTargetDebug, Opts.OpenMPTeamSubscription,
           Opts.OpenMPThreadSubscription, Opts.OpenMPNoThreadState,
-          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion);
+          Opts.OpenMPNoNestedParallelism, Opts.OpenMPVersion, Opts.NoGPULib);
 
       if (!Opts.OMPHostIRFile.empty())
         offloadMod.setHostIRFilePath(Opts.OMPHostIRFile);
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index ba2ecab3742587a..ef0db22037283d3 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -881,6 +881,8 @@ static bool parseDialectArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
             args.hasArg(clang::driver::options::OPT_fopenmp_target_debug))
           res.getLangOpts().OpenMPTargetDebug = 1;
       }
+      if (args.hasArg(clang::driver::options::OPT_nogpulib))
+        res.getLangOpts().NoGPULib = 1;
     }
 
     switch (llvm::Triple(res.getTargetOpts().triple).getArch()) {
diff --git a/flang/test/Driver/driver-help-hidden.f90 b/flang/test/Driver/driver-help-hidden.f90
index 6d399f1d179a022..2e65096d0c3fec4 100644
--- a/flang/test/Driver/driver-help-hidden.f90
+++ b/flang/test/Driver/driver-help-hidden.f90
@@ -119,6 +119,7 @@
 ! CHECK-NEXT: --no-offload-arch=<value>
 ! CHECK-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! CHECK-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! CHECK-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! CHECK-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! CHECK-NEXT: --offload-device-only   Only compile for the offloading device.
 ! CHECK-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
diff --git a/flang/test/Driver/driver-help.f90 b/flang/test/Driver/driver-help.f90
index 31c9caa32ea8292..a29e8a0b69db46d 100644
--- a/flang/test/Driver/driver-help.f90
+++ b/flang/test/Driver/driver-help.f90
@@ -107,6 +107,7 @@
 ! HELP-NEXT: --no-offload-arch=<value>
 ! HELP-NEXT:                         Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. 'all' resets the list to its default value.
 ! HELP-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-NEXT: --offload-arch=<value>  Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). If 'native' is used the compiler will detect locally installed architectures. For HIP offloading, the device architecture can be followed by target ID features delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.
 ! HELP-NEXT: --offload-device-only   Only compile for the offloading device.
 ! HELP-NEXT: --offload-host-device   Compile for both the offloading host and device (default).
@@ -239,6 +240,7 @@
 ! HELP-FC1-NEXT: -mvscale-max=<value>    Specify the vscale maximum. Defaults to the vector length agnostic value of "0". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -mvscale-min=<value>    Specify the vscale minimum. Defaults to "1". (AArch64/RISC-V only)
 ! HELP-FC1-NEXT: -nocpp                  Disable predefined and command line preprocessor macros
+! HELP-FC1-NEXT: -nogpulib               Do not link device library for CUDA/HIP device compilation
 ! HELP-FC1-NEXT: -opt-record-file <value>
 ! HELP-FC1-NEXT:                         File name to use for YAML optimization record output
 ! HELP-FC1-NEXT: -opt-record-format <value>
diff --git a/flang/test/Lower/OpenMP/nogpulib.f90 b/flang/test/Lower/OpenMP/nogpulib.f90
new file mode 100644
index 000000000000000..f2e67136ecd7481
--- /dev/null
+++ b/flang/test/Lower/OpenMP/nogpulib.f90
@@ -0,0 +1,12 @@
+!REQUIRES: amdgpu-registered-target
+
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device %s -o - | FileCheck %s
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -o - %s | FileCheck %s
+!RUN: %flang_fc1 -triple amdgcn-amd-amdhsa -emit-hlfir -fopenmp -fopenmp-is-target-device -nogpulib %s -o - | FileCheck %s -check-prefix=FLAG_SET
+!RUN: bbc -fopenmp -fopenmp-is-target-device -fopenmp-is-gpu -emit-hlfir -nogpulib -o - %s | FileCheck %s -check-prefix=FLAG_SET
+
+!CHECK-NOT: module attributes {{{.*}}no_gpu_lib
+!FLAG_SET: module attributes {{{.*}}no_gpu_lib = true
+subroutine omp_subroutine()
+end subroutine omp_subroutine
+
diff --git a/flang/tools/bbc/bbc.cpp b/flang/tools/bbc/bbc.cpp
index 378216bd1e51f8f..a66c41eed59651c 100644
--- a/flang/tools/bbc/bbc.cpp
+++ b/flang/tools/bbc/bbc.cpp
@@ -177,6 +177,12 @@ static llvm::cl::opt<bool> setOpenMPNoNestedParallelism(
                    "a parallel region."),
     llvm::cl::init(false));
 
+static llvm::cl::opt<bool>
+    setNoGPULib("nogpulib",
+                llvm::cl::desc("Do not link device library for CUDA/HIP device "
+                               "compilation"),
+                llvm::cl::init(false));
+
 static llvm::cl::opt<bool> enableOpenACC("fopenacc",
                                          llvm::cl::desc("enable openacc"),
                                          llvm::cl::init(false));
@@ -316,7 +322,7 @@ static mlir::LogicalResult convertFortranSourceToMLIR(
         OffloadModuleOpts(setOpenMPTargetDebug, setOpenMPTeamSubscription,
                           setOpenMPThreadSubscription, setOpenMPNoThreadState,
                           setOpenMPNoNestedParallelism, enableOpenMPDevice,
-                          enableOpenMPGPU, setOpenMPVersion);
+                          enableOpenMPGPU, setOpenMPVersion, "", setNoGPULib);
     setOffloadModuleInterfaceAttributes(mlirModule, offloadModuleOpts);
     setOpenMPVersionAttribute(mlirModule, setOpenMPVersion);
   }
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 72121ad4f9e47a5..ae2dd018059e3aa 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -65,6 +65,7 @@ def FlagsAttr : OpenMP_Attr<"Flags", "flags"> {
     DefaultValuedParameter<"bool", "false">:$assume_threads_oversubscription,
     DefaultValuedParameter<"bool", "false">:$assume_no_thread_state,
     DefaultValuedParameter<"bool", "false">:$assume_no_nested_parallelism,
+    DefaultValuedParameter<"bool", "false">:$no_gpu_lib,
     DefaultValuedParameter<"uint32_t", "50">:$openmp_device_version
   );
 
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
index 77001fc816cf91f..89d04af64766fc2 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
@@ -198,11 +198,12 @@ def OffloadModuleInterface : OpInterface<"OffloadModuleInterface"> {
             "bool":$assumeThreadsOversubscription,
             "bool":$assumeNoThreadState,
             "bool":$assumeNoNestedParallelism,
-            "uint32_t":$openmpDeviceVersion), [{}], [{
+            "uint32_t":$openmpDeviceVersion,
+            "bool":$noGPULib), [{}], [{
         $_op->setAttr(("omp." + mlir::omp::FlagsAttr::getMnemonic()).str(),
                   mlir::omp::FlagsAttr::get($_op->getContext(), debugKind,
                       assumeTeamsOversubscription, assumeThreadsOversubscription,
-                      assumeNoThreadState, assumeNoNestedParallelism, openmpDeviceVersion));
+                      assumeNoThreadState, assumeNoNestedParallelism, noGPULib, openmpDeviceVersion));
       }]>,
     InterfaceMethod<
       /*description=*/[{
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 1daf60b8659bb66..083b507f405f426 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2011,6 +2011,12 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
 
   llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
 
+  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
+                              attribute.getOpenmpDeviceVersion());
+
+  if (attribute.getNoGpuLib())
+    return success();
+
   ompBuilder->createGlobalFlag(
       attribute.getDebugKind() /*LangOpts().OpenMPTargetDebug*/,
       "__omp_rtl_debug_kind");
@@ -2032,8 +2038,6 @@ LogicalResult convertFlagsAttr(Operation *op, mlir::omp::FlagsAttr attribute,
           .getAssumeNoNestedParallelism() /*LangOpts().OpenMPNoNestedParallelism*/
       ,
       "__omp_rtl_assume_no_nested_parallelism");
-  ompBuilder->M.addModuleFlag(llvm::Module::Max, "openmp-device",
-                              attribute.getOpenmpDeviceVersion());
   return success();
 }
 
diff --git a/mlir/test/Dialect/OpenMP/attr.mlir b/mlir/test/Dialect/OpenMP/attr.mlir
index fc89ebd01b6bf57..2e88dac8ff4d2b9 100644
--- a/mlir/test/Dialect/OpenMP/attr.mlir
+++ b/mlir/test/Dialect/OpenMP/attr.mlir
@@ -54,6 +54,12 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 // CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, openmp_device_version = 51>} {
 module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, openmp_device_version = 51>} {}
 
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true, no_gpu_lib = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_no_thread_state = true, assume_teams_oversubscription = true, no_gpu_lib = true, openmp_device_version = 51>} {}
+
+// CHECK: module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, openmp_device_version = 51>} {
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, no_gpu_lib = false, openmp_device_version = 51>} {}
+
 // CHECK: module attributes {omp.version = #omp.version<version = 51>} {
 module attributes {omp.version = #omp.version<version = 51>} {}
 
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 116c05f3747c6ae..8da17107466f817 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -2530,6 +2530,16 @@ module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true,
 
 // -----
 
+// CHECK-NOT: @__omp_rtl_debug_kind = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_teams_oversubscription = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_threads_oversubscription = weak_odr hidden constant i32 0
+// CHECK-NOT: @__omp_rtl_assume_no_thread_state = weak_odr hidden constant i32 1
+// CHECK-NOT: @__omp_rtl_assume_no_nested_parallelism = weak_odr hidden constant i32 0
+module attributes {omp.flags = #omp.flags<assume_teams_oversubscription = true, assume_no_thread_state = true,
+                                          no_gpu_lib=true>} {}
+
+// -----
+
 module attributes {omp.is_target_device = false} {
   // CHECK: define void @filter_nohost
   llvm.func @filter_nohost() -> ()

agozillon · 2024-01-08T14:33:40Z

It would be very nice to have this option added to Flang in the near future, as this option would allow the OpenMP libomptarget runtime tests for offload to be ran without failure due to this missing option. I believe we are currently in the process of setting up a buildbot that includes these tests, it's not a blocker for the buildbot from what I understand, but it'd be very nice to be able to test for success of these tests on commits in the near future, especially as the suite expands.

jplehr · 2024-01-08T14:46:23Z

We have the buildbot now up in staging: https://lab.llvm.org/staging/#/builders/188

doru1004

LG

agozillon

the PR LGTM.

If -nogpulib option is passed by the user, then the OpenMP device runtime is not used and we should not emit globals to configure debugging at compile-time for the device runtime. Link to -nogpulib flag implementation for Clang: https://reviews.llvm.org/D125314

DominikAdamski added 2 commits November 2, 2023 05:09

DominikAdamski requested review from skatrak, jsjodin, banach-space, kiranchandramohan, agozillon, dpalermo and kiranktp November 2, 2023 10:54

llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' mlir:llvm flang:driver mlir flang Flang issues not falling into any other category mlir:openmp flang:fir-hlfir flang:openmp labels Nov 2, 2023

ergawy mentioned this pull request Jan 2, 2024

[OpenMP][Offloading][AMDGPU] Exclude nogpulib in lit.cfg #76702

Closed

doru1004 self-requested a review January 9, 2024 16:33

doru1004 approved these changes Jan 9, 2024

View reviewed changes

agozillon approved these changes Jan 9, 2024

View reviewed changes

DominikAdamski merged commit f443fbc into llvm:main Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Flang][OpenMP][MLIR] Add support for -nogpulib option #71045

[Flang][OpenMP][MLIR] Add support for -nogpulib option #71045

Uh oh!

DominikAdamski commented Nov 2, 2023

Uh oh!

llvmbot commented Nov 2, 2023 •

edited

Loading

Uh oh!

llvmbot commented Nov 2, 2023

Uh oh!

llvmbot commented Nov 2, 2023

Uh oh!

agozillon commented Jan 8, 2024

Uh oh!

jplehr commented Jan 8, 2024

Uh oh!

doru1004 left a comment

Uh oh!

agozillon left a comment

Uh oh!

Uh oh!

[Flang][OpenMP][MLIR] Add support for -nogpulib option #71045

[Flang][OpenMP][MLIR] Add support for -nogpulib option #71045

Uh oh!

Conversation

DominikAdamski commented Nov 2, 2023

Uh oh!

llvmbot commented Nov 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 2, 2023

Uh oh!

llvmbot commented Nov 2, 2023

Uh oh!

agozillon commented Jan 8, 2024

Uh oh!

jplehr commented Jan 8, 2024

Uh oh!

doru1004 left a comment

Choose a reason for hiding this comment

Uh oh!

agozillon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvmbot commented Nov 2, 2023 •

edited

Loading