|
| 1 | +OpenMP Command-Line Argument Reference |
| 2 | +====================================== |
| 3 | +Welcome to the OpenMP in LLVM command line argument reference. The content is |
| 4 | +not a complete list of arguments but includes the essential command-line |
| 5 | +arguments you may need when compiling and linking OpenMP. |
| 6 | +Section :ref:`general_command_line_arguments` lists OpenMP command line options |
| 7 | +for multicore programming while :ref:`offload_command_line_arguments` lists |
| 8 | +options relevant to OpenMP target offloading. |
| 9 | + |
| 10 | +.. _general_command_line_arguments: |
| 11 | + |
| 12 | +OpenMP Command-Line Arguments |
| 13 | +----------------------------- |
| 14 | + |
| 15 | +``-fopenmp`` |
| 16 | +^^^^^^^^^^^^ |
| 17 | +Enable the OpenMP compilation toolchain. The compiler will parse OpenMP |
| 18 | +compiler directives and generate parallel code. |
| 19 | + |
| 20 | +``-fopenmp-extensions`` |
| 21 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 22 | +Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of |
| 23 | +current extensions and their implementation status can be found on the |
| 24 | +`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_ |
| 25 | +page. |
| 26 | + |
| 27 | +``-fopenmp-simd`` |
| 28 | +^^^^^^^^^^^^^^^^^ |
| 29 | +This option enables OpenMP only for single instruction, multiple data |
| 30 | +(SIMD) constructs. |
| 31 | + |
| 32 | +``-static-openmp`` |
| 33 | +^^^^^^^^^^^^^^^^^^ |
| 34 | +Use the static OpenMP host runtime while linking. |
| 35 | + |
| 36 | +``-fopenmp-version=<arg>`` |
| 37 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 38 | +Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard. |
| 39 | +For example, you may use ``-fopenmp-version=45`` to select version 4.5 of |
| 40 | +the OpenMP standard. The default value is ``-fopenmp-version=50`` for ``Clang`` |
| 41 | +and ``-fopenmp-version=11`` for ``flang-new``. |
| 42 | + |
| 43 | +.. _offload_command_line_arguments: |
| 44 | + |
| 45 | +Offloading Specific Command-Line Arguments |
| 46 | +------------------------------------------ |
| 47 | + |
| 48 | +.. _fopenmp-targets: |
| 49 | + |
| 50 | +``-fopenmp-targets`` |
| 51 | +^^^^^^^^^^^^^^^^^^^^ |
| 52 | +| Specify which OpenMP offloading targets should be supported. For example, you |
| 53 | + may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is |
| 54 | + often optional when :ref:`offload_arch` is provided. |
| 55 | +| It is also possible to offload to CPU architectures, for instance with |
| 56 | + ``-fopenmp-targets=x86_64-pc-linux-gnu``. |
| 57 | +
|
| 58 | +.. _offload_arch: |
| 59 | + |
| 60 | +``--offload-arch`` |
| 61 | +^^^^^^^^^^^^^^^^^^ |
| 62 | +| Specify the device architecture for OpenMP offloading. For instance, use |
| 63 | + ``--offload-arch=sm_80`` to target an Nvidia Tesla A100, |
| 64 | + ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or |
| 65 | + ``--offload-arch=sm_80,gfx90a`` to target both. |
| 66 | +| It is also possible to specify :ref:`fopenmp-targets` without specifying |
| 67 | + ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or |
| 68 | + ``nvptx-arch`` will be executed as part of the compiler driver to |
| 69 | +  detect the device architecture automatically. |
| 70 | +| Finally, the device architecture will also be automatically inferred with |
| 71 | + ``--offload-arch=native``. |
| 72 | +
|
| 73 | +``--offload-device-only`` |
| 74 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 75 | +Compile only the code that goes on the device. This option is mainly for |
| 76 | +debugging purposes. It is primarily used for inspecting the intermediate |
| 77 | +representation (IR) output when compiling for the device. It may also be used |
| 78 | +if device-only runtimes are created. |
| 79 | + |
| 80 | +``--offload-host-only`` |
| 81 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 82 | +Compile only the code that goes on the host. With this option enabled, the |
| 83 | +``.llvm.offloading`` section with embedded device code will not be included in |
| 84 | +the intermediate representation. |
| 85 | + |
| 86 | +``--offload-host-device`` |
| 87 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 88 | +Compile the target regions for both the host and the device. That is the |
| 89 | +default option. |
| 90 | + |
| 91 | +``-Xopenmp-target <arg>`` |
| 92 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 93 | +Pass an argument ``<arg>`` to the offloading toolchain, for instance |
| 94 | +``-Xopenmp-target -march=sm_80``. |
| 95 | + |
| 96 | +``-Xopenmp-target=<triple> <arg>`` |
| 97 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 98 | +Pass an argument ``<arg>`` to the offloading toolchain for the target |
| 99 | +``<triple>``. That is especially useful when an argument must differ for each |
| 100 | +triple. For instance, use ``-Xopenmp-target=nvptx64 --offload-arch=sm_80 |
| 101 | +-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device |
| 102 | +architecture. Alternatively, :ref:`Xarch_host` and :ref:`Xarch_device` can |
| 103 | +pass an argument to the host and device compilation toolchain. |
| 104 | + |
| 105 | +``-Xoffload-linker<triple> <arg>`` |
| 106 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 107 | +Pass an argument ``<arg>`` to the offloading linker for the target specified in |
| 108 | +``<triple>``. |
| 109 | + |
| 110 | +.. _Xarch_device: |
| 111 | + |
| 112 | +``-Xarch_device <arg>`` |
| 113 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 114 | +Pass an argument ``<arg>`` to the device compilation toolchain. |
| 115 | + |
| 116 | +.. _Xarch_host: |
| 117 | + |
| 118 | +``-Xarch_host <arg>`` |
| 119 | +^^^^^^^^^^^^^^^^^^^^^ |
| 120 | +Pass an argument ``<arg>`` to the host compilation toolchain. |
| 121 | + |
| 122 | +``-foffload-lto[=<arg>]`` |
| 123 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 124 | +Enable device link time optimization (LTO) and select the LTO mode ``<arg>``. |
| 125 | +Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes |
| 126 | +less time while still achieving some performance gains. If no argument is set, |
| 127 | +this option defaults to ``-foffload-lto=full``. |
| 128 | + |
| 129 | +``-fopenmp-offload-mandatory`` |
| 130 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 131 | +| This option is set to avoid generating the host fallback code |
| 132 | + executed when offloading to the device fails. That is |
| 133 | + helpful when the target contains code that cannot be compiled for the host, for |
| 134 | + instance, if it contains unguarded device intrinsics. |
| 135 | +| This option can also be used to reduce compile time. |
| 136 | +| This option should not be used when one wants to verify that the code is being |
| 137 | + offloaded to the device. Instead, set the environment variable |
| 138 | + ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to |
| 139 | + the device. |
| 140 | +
|
| 141 | +``-fopenmp-target-debug[=<arg>]`` |
| 142 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 143 | +Enable debugging in the device runtime library (RTL). Note that it is |
| 144 | +necessary both to configure debugging in the device runtime at compile time with |
| 145 | +``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with the |
| 146 | +environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, it is |
| 147 | +currently only supported for Nvidia targets as of July 2023. Alternatively, the |
| 148 | +environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and |
| 149 | +AMD GPU targets. For more information, see the |
| 150 | +`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_. |
| 151 | +The debugging instructions list the supported debugging arguments. |
| 152 | + |
| 153 | +``-fopenmp-target-jit`` |
| 154 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 155 | +| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed |
| 156 | + LLVM-IR for the device code in the object files rather than binary code for the |
| 157 | + respective target. At runtime, the LLVM-IR is optimized again and compiled for |
| 158 | + the target device. The optimization level can be set at runtime with |
| 159 | + ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance, |
| 160 | +  ``LIBOMPTARGET_JIT_OPT_LEVEL=3``, which corresponds to optimization level ``-O3``. |
| 161 | + See the |
| 162 | + `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_ |
| 163 | + for instructions on extracting the embedded device code before or after the |
| 164 | + JIT and more. |
| 165 | +| We want to emphasize that JIT for OpenMP offloading is good for debugging as |
| 166 | + the target IR can be extracted, modified, and injected at runtime. |
| 167 | +
|
| 168 | +``--offload-new-driver`` |
| 169 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 170 | +In upstream LLVM, OpenMP uses only the new driver. However, this option must be |
| 171 | +enabled for experimental linking with CUDA or HIP files. |
| 172 | + |
| 173 | +``--offload-link`` |
| 174 | +^^^^^^^^^^^^^^^^^^ |
| 175 | +Use the new offloading linker ``clang-linker-wrapper`` to perform the link job. |
| 176 | +``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option |
| 177 | +enables the new offloading linker in toolchains that do not use it |
| 178 | +automatically. It must be enabled when linking with CUDA or HIP files. |
| 179 | + |
| 180 | +``-nogpulib`` |
| 181 | +^^^^^^^^^^^^^ |
| 182 | +Do not link the device library for CUDA or HIP device compilation. |
| 183 | + |
| 184 | +``-nogpuinc`` |
| 185 | +^^^^^^^^^^^^^ |
| 186 | +Do not include the default CUDA or HIP headers, and do not add CUDA or HIP |
| 187 | +include paths. |
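
As a rough illustration of how the offloading flags documented in this file combine, the sketch below assembles a compiler invocation in a small build-script helper. The helper `build_offload_command`, the file names `saxpy.c` and `saxpy`, and the choice of `clang` as the default compiler are assumptions made for this example and are not part of the documentation. Only `-fopenmp`, `--offload-arch`, and `-foffload-lto` are taken from the text above.

```python
import shlex


def build_offload_command(source, output, archs=("native",), lto=None,
                          compiler="clang"):
    """Assemble an OpenMP offloading command line (hypothetical helper).

    archs: device architectures, joined into one --offload-arch list.
    lto:   None, "thin", or "full", mapped to -foffload-lto=<arg>.
    """
    if lto not in (None, "thin", "full"):
        raise ValueError("lto must be None, 'thin', or 'full'")

    # -fopenmp enables the OpenMP compilation toolchain.
    cmd = [compiler, "-fopenmp"]

    # Several architectures can be passed as a comma-separated list,
    # for example sm_80,gfx90a to target both an Nvidia and an AMD GPU.
    cmd.append("--offload-arch=" + ",".join(archs))

    # Device LTO is optional; only add the flag when a mode was requested.
    if lto is not None:
        cmd.append("-foffload-lto=" + lto)

    cmd += [source, "-o", output]
    return cmd


# Example file names are placeholders chosen for this sketch.
command = build_offload_command("saxpy.c", "saxpy",
                                archs=["sm_80", "gfx90a"], lto="thin")
print(shlex.join(command))
```

Returning the arguments as a list keeps the command safe to hand to `subprocess.run` without shell quoting concerns, and `shlex.join` is used only for display.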