Commit 5c0f98c

[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
This commit adds an OpenMP offloading specific command line reference. The OpenMP FAQ links to the new .rst file.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D156387
1 parent 239777c commit 5c0f98c

3 files changed: +246 -9 lines changed
Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command line argument reference. The content is
not a complete list of arguments but includes the essential command-line
arguments you may need when compiling and linking OpenMP.
Section :ref:`general_command_line_arguments` lists OpenMP command line options
for multicore programming while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.

``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
This option enables OpenMP only for single instruction, multiple data
(SIMD) constructs.

``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard.
For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
the OpenMP standard. The default value is ``-fopenmp-version=50`` for ``Clang``
and ``-fopenmp-version=11`` for ``flang-new``.
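
As a rough illustration of how the host options above combine, the following sketch assembles a compile line for a multicore build. The file names ``example.c`` and ``example`` are placeholders, and the script only prints the command so that it can be inspected before it is run with an OpenMP-capable Clang.

```shell
#!/bin/sh
# Assumed placeholders: example.c (source file) and example (output binary).
SRC=example.c
OUT=example

# -fopenmp enables OpenMP; -fopenmp-version=45 selects OpenMP 4.5
# instead of the compiler's default version.
HOST_FLAGS="-fopenmp -fopenmp-version=45"

# Print the command rather than running it, so the sketch works without Clang.
HOST_CMD="clang -O2 $HOST_FLAGS $SRC -o $OUT"
echo "$HOST_CMD"
```

Appending ``-static-openmp`` to ``HOST_FLAGS`` would request the static host runtime at link time.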

.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
  often optional when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance, use
  ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
  ``--offload-arch=sm_80,gfx90a`` to target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the executables ``amdgpu-arch`` or
  ``nvptx-arch`` will be executed as part of the compiler driver to
  detect the device architecture automatically.
| Finally, the device architecture will also be automatically inferred with
  ``--offload-arch=native``.
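
The three ways of selecting an offloading target described above might look as follows on the command line. This is only a sketch: ``example.c`` and ``example`` are placeholder names, and the commands are printed rather than executed because running them needs an offload-capable Clang and, for ``native``, a GPU in the build machine.

```shell
#!/bin/sh
# Assumed placeholder: example.c contains OpenMP target regions.
SRC=example.c

# Explicit architectures: one Nvidia (sm_80) and one AMD (gfx90a) device.
ARCH_CMD="clang -O2 -fopenmp --offload-arch=sm_80,gfx90a $SRC -o example"

# Let the driver infer the architecture of the GPU in the build machine.
NATIVE_CMD="clang -O2 -fopenmp --offload-arch=native $SRC -o example"

# Give only the target triple; the driver then detects the architecture.
TARGETS_CMD="clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa $SRC -o example"

printf '%s\n' "$ARCH_CMD" "$NATIVE_CMD" "$TARGETS_CMD"
```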

``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly for
debugging purposes. It is primarily used for inspecting the intermediate
representation (IR) output when compiling for the device. It may also be used
if device-only runtimes are created.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. That is the
default option.

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. That is especially useful when an argument must differ for each
triple. For instance, use ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
architecture for each target. Alternatively, :ref:`Xarch_host` and
:ref:`Xarch_device` can pass an argument to the host and device compilation
toolchains, respectively.
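
To make the per-triple form more concrete, the sketch below builds one command that offloads to an Nvidia and an AMD target and gives each its own architecture. The full triples ``nvptx64-nvidia-cuda`` and ``amdgcn-amd-amdhsa`` and the file name ``example.c`` are assumptions made for illustration; the command is printed, not executed.

```shell
#!/bin/sh
# Assumed triples for the two offloading targets.
NV_TRIPLE=nvptx64-nvidia-cuda
AMD_TRIPLE=amdgcn-amd-amdhsa

# Each -Xopenmp-target=<triple> forwards the argument that follows it
# to the toolchain of that triple only.
PER_TRIPLE="-Xopenmp-target=$NV_TRIPLE --offload-arch=sm_80"
PER_TRIPLE="$PER_TRIPLE -Xopenmp-target=$AMD_TRIPLE --offload-arch=gfx90a"

MULTI_CMD="clang -O2 -fopenmp -fopenmp-targets=$NV_TRIPLE,$AMD_TRIPLE $PER_TRIPLE example.c -o example"
echo "$MULTI_CMD"
```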

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains. If no argument is set,
this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| This option is set to avoid generating the host fallback code
  executed when offloading to the device fails. That is
  helpful when the target contains code that cannot be compiled for the host, for
  instance, if it contains unguarded device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used when one wants to verify that the code is being
  offloaded to the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to
  the device.
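
The distinction drawn above between the compile-time option and the run-time environment variable can be sketched as two separate steps. The names ``example.c`` and ``./example`` are placeholders, and both commands are only printed here.

```shell
#!/bin/sh
# Compile time: skip generation of the host fallback code.
BUILD_CMD="clang -O2 -fopenmp --offload-arch=native -fopenmp-offload-mandatory example.c -o example"

# Run time: make the program fail instead of silently falling back to
# the host, which confirms that target regions really run on the device.
RUN_CMD="OMP_TARGET_OFFLOAD=MANDATORY ./example"

printf '%s\n' "$BUILD_CMD" "$RUN_CMD"
```

The second command is the one to use for verification, and it works on binaries built without ``-fopenmp-offload-mandatory`` as well.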

``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Note that it is
necessary both to configure the debugging in the device runtime at compile-time
with ``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with the
environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. As of July 2023,
this is only supported for Nvidia targets. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_.
The debugging instructions list the supported debugging arguments.
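
Because device runtime debugging has to be switched on twice, once when compiling and once when running, a short sketch may help. The value ``1`` is an assumption chosen for illustration; the debugging instructions linked above list the values that are actually supported. ``example.c`` and ``./example`` are placeholders, and the commands are printed rather than run.

```shell
#!/bin/sh
# Assumed debug value; consult the debugging instructions for valid ones.
DEBUG_ARG=1

# Step 1: configure debugging in the device runtime at compile time.
DEBUG_BUILD_CMD="clang -O2 -fopenmp --offload-arch=sm_80 -fopenmp-target-debug=$DEBUG_ARG example.c -o example"

# Step 2: enable the same debugging at run time through the environment.
DEBUG_RUN_CMD="LIBOMPTARGET_DEVICE_RTL_DEBUG=$DEBUG_ARG ./example"

printf '%s\n' "$DEBUG_BUILD_CMD" "$DEBUG_RUN_CMD"
```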

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed
  LLVM-IR for the device code in the object files rather than binary code for the
  respective target. At runtime, the LLVM-IR is optimized again and compiled for
  the target device. The optimization level can be set at runtime with
  ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after the
  JIT and more.
| We want to emphasize that JIT for OpenMP offloading is good for debugging as
  the target IR can be extracted, modified, and injected at runtime.
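
A JIT build and a run with an explicit optimization level could be put together roughly as follows. ``example.c`` and ``./example`` are placeholder names, ``--offload-arch=native`` is just one possible way to pick the device, and the commands are printed instead of executed.

```shell
#!/bin/sh
# Compile time: embed LLVM-IR for the device instead of a device binary.
JIT_BUILD_CMD="clang -O2 -fopenmp --offload-arch=native -fopenmp-target-jit example.c -o example"

# Run time: ask the JIT to optimize the embedded IR at the -O3 level.
JIT_OPT_LEVEL=3
JIT_RUN_CMD="LIBOMPTARGET_JIT_OPT_LEVEL=$JIT_OPT_LEVEL ./example"

printf '%s\n' "$JIT_BUILD_CMD" "$JIT_RUN_CMD"
```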

``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, this option must
be enabled for experimental linking with CUDA or HIP files.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option
enables the new offloading linker in toolchains that do not use it
automatically. It is necessary to enable this option when linking with CUDA or
HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.

openmp/docs/SupportAndFAQ.rst

Lines changed: 44 additions & 9 deletions
@@ -52,13 +52,15 @@ All patches go through the regular `LLVM review process
 Q: How to build an OpenMP GPU offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 To build an *effective* OpenMP offload capable compiler, only one extra CMake
-option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
+option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (Generic
 information about building LLVM is available `here
-<https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
-are targeted by OpenMP to be enabled. By default, Clang will be built with all
-backends enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"` OpenMP
-should not be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by
-default.
+<https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
+are targeted by OpenMP are enabled. That can be done by adjusting the CMake
+option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
+and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
+Clang will be built with all backends enabled. When building with
+``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in
+``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
 
 For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
 For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
@@ -72,14 +74,17 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
 
 .. _build_nvidia_offload_capable_compiler:
 
-Q: How to build an OpenMP NVidia offload capable compiler?
+Q: How to build an OpenMP Nvidia offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 The Cuda SDK is required on the machine that will execute the openmp application.
 
 If your build machine is not the target machine or automatic detection of the
 available GPUs failed, you should also set:
 
-- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_<xy>,...`` where ``<xy>`` is the numeric
+  compute capability of your GPU. For instance, set
+  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_70,sm_80`` to target the Nvidia Volta
+  and Ampere architectures.
 
 
 .. _build_amdgpu_offload_capable_compiler:
@@ -133,6 +138,14 @@ With those libraries installed, then LLVM build and installed, try:
 
     clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
 
+If your build machine is not the target machine or automatic detection of the
+available GPUs failed, you should also set:
+
+- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx<xyz>,...`` where ``<xyz>`` is the
+  shader core instruction set architecture. For instance, set
+  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx906,gfx90a`` to target AMD GCN5
+  and CDNA2 devices.
+
 Q: What are the known limitations of OpenMP AMDGPU offload?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 LD_LIBRARY_PATH or rpath/runpath are required to find libomp.so and libomptarget.so
@@ -349,7 +362,7 @@ create generic libraries.
 The architecture can either be specified manually using ``--offload-arch=``. If
 ``--offload-arch=`` is present no ``-fopenmp-targets=`` flag is present then the
 targets will be inferred from the architectures. Conversely, if
-``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
+``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
 architecture will be set to a default value, usually the architecture supported
 by the system LLVM was built on.
 
@@ -451,3 +464,25 @@ with OpenMP.
 
 For more information on how this is implemented in LLVM/OpenMP's offloading
 runtime, refer to the `runtime documentation <libomptarget_libc>`_.
+
+Q: What command line options can I use for OpenMP?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We recommend taking a look at the OpenMP
+:doc:`command line argument reference <CommandLineArgumentReference>` page.
+
+Q: Why is my build taking a long time?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+When installing OpenMP and other LLVM components, the build time on multicore
+systems can be significantly reduced with parallel build jobs. As suggested in
+*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
+generator. This can be done with the CMake option ``cmake -G Ninja``. Afterward,
+use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
+time can also be reduced by setting the build type to ``Release`` with the
+``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
+compilations. Consider enabling ``Ccache`` with
+``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
+
+Q: Did this FAQ not answer your question?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Feel free to post questions or browse old threads at
+`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.
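
The build-related advice in the FAQ changes above (Ninja generator, ``Release`` build type, ``ccache``, ``LLVM_ENABLE_RUNTIMES`` and ``LLVM_TARGETS_TO_BUILD``) could be combined into one configure and build step along these lines. This is a sketch under several assumptions: it is meant to be run from the root of an ``llvm-project`` checkout, ``build`` is a hypothetical build directory, ``X86`` stands in for the host backend, and ``8`` is an arbitrary job count. The commands are printed so they can be reviewed before use.

```shell
#!/bin/sh
# Assumed layout: current directory is the llvm-project root; build/ is new.
CMAKE_ARGS="-G Ninja -S llvm -B build"
# Release builds and ccache both shorten (re)compilation time.
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release"
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
# OpenMP is built as a runtime, so it is not listed in LLVM_ENABLE_PROJECTS.
CMAKE_ARGS="$CMAKE_ARGS -DLLVM_ENABLE_PROJECTS=clang"
CMAKE_ARGS="$CMAKE_ARGS -DLLVM_ENABLE_RUNTIMES=openmp"
# Backends for the host (assumed X86) plus AMD and Nvidia offloading.
CMAKE_ARGS="$CMAKE_ARGS -DLLVM_TARGETS_TO_BUILD='X86;AMDGPU;NVPTX'"

CONFIGURE_CMD="cmake $CMAKE_ARGS"
# Parallel build and install with an assumed job count of 8.
INSTALL_CMD="ninja -C build -j 8 install"

printf '%s\n' "$CONFIGURE_CMD" "$INSTALL_CMD"
```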

openmp/docs/index.rst

Lines changed: 15 additions & 0 deletions
@@ -91,6 +91,21 @@ please refer to :doc:`remarks/OptimizationRemarks`.
 
    remarks/OptimizationRemarks
 
+OpenMP Command-Line Argument Reference
+======================================
+In addition to the
+`Clang command-line argument reference <https://clang.llvm.org/docs/ClangCommandLineReference.html>`_
+we also recommend the OpenMP
+:doc:`command-line argument reference <CommandLineArgumentReference>`
+page that offers a detailed overview of options specific to OpenMP. It also
+contains a list of OpenMP offloading related command-line arguments.
+
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   CommandLineArgumentReference
 
 Support, Getting Involved, and Frequently Asked Questions (FAQ)
 ===============================================================
