  OpenMP Support
  ==============

- Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
- PPC64[LE] and has `basic support for Cuda devices`_.
-
- * #pragma omp declare simd: :part:`Partial`. We support parsing/semantic
-   analysis + generation of special attributes for X86 target, but still
-   missing the LLVM pass for vectorization.
+ Clang fully supports OpenMP 4.5, almost all of 5.0 and most of 5.1/2.
+ Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models) and AMD GPUs (all models).

  In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
  Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
+ OMPT is also supported for NVIDIA and AMD GPUs.

  For the list of supported features from OpenMP 5.0 and 5.1
  see `OpenMP implementation details`_ and `OpenMP 51 implementation details`_.
@@ -36,43 +33,17 @@ General improvements
    collapse clause by replacing the expensive remainder operation with
    multiplications and additions.

- - The default schedules for the `distribute` and `for` constructs in a
-   parallel region and in SPMD mode have changed to ensure coalesced
-   accesses. For the `distribute` construct, a static schedule is used
-   with a chunk size equal to the number of threads per team (default
-   value of threads or as specified by the `thread_limit` clause if
-   present). For the `for` construct, the schedule is static with chunk
-   size of one.
-
- - Simplified SPMD code generation for `distribute parallel for` when
-   the new default schedules are applicable.
-
  - When using the collapse clause on a loop nest the default behavior
    is to automatically extend the representation of the loop counter to
    64 bits for the cases where the sizes of the collapsed loops are not
    known at compile time. To prevent this conservative choice and use
    at most 32 bits, compile your program with the
    `-fopenmp-optimistic-collapse`.
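
To make the collapse behavior described above concrete, here is a minimal sketch in C. The function name `sum_matrix` and its parameters are hypothetical and do not come from the documentation. Because `rows` and `cols` are only known at run time, the text above implies that the collapsed loop counter is widened to 64 bits by default, and that compiling with `-fopenmp-optimistic-collapse` keeps it at 32 bits at most.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a rows x cols matrix stored in row-major order.
 * The two loops are perfectly nested, so collapse(2) can merge them into a
 * single iteration space of rows * cols iterations. */
long long sum_matrix(const int *data, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    long long total = 0;

    /* rows and cols are not compile-time constants, which is the case where
     * the documentation says the collapsed counter defaults to 64 bits. */
    #pragma omp parallel for collapse(2) reduction(+ : total)
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            total += data[(size_t)i * (size_t)cols + (size_t)j];
        }
    }
    return total;
}
```

The optimistic flag presumably only makes sense when the combined iteration count `rows * cols` is known to fit in 32 bits; otherwise the default conservative widening is the safer choice.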

- .. _basic support for Cuda devices:

- Cuda devices support
+ GPU devices support
  ====================

- Directives execution modes
- --------------------------
-
- Clang code generation for target regions supports two modes: the SPMD and
- non-SPMD modes. Clang chooses one of these two modes automatically based on the
- way directives and clauses on those directives are used. The SPMD mode uses a
- simplified set of runtime functions thus increasing performance at the cost of
- supporting some OpenMP features. The non-SPMD mode is the most generic mode and
- supports all currently available OpenMP features. The compiler will always
- attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
-
- - The target region contains user code (other than OpenMP-specific
-   directives) in between the `target` and the `parallel` directives.
-
  Data-sharing modes
  ------------------

@@ -82,8 +53,9 @@ performance and can be activated using the `-fopenmp-cuda-mode` flag. In
  `Generic` mode all local variables that can be shared in the parallel regions
  are stored in the global memory. In `Cuda` mode local variables are not shared
  between the threads and it is user responsibility to share the required data
- between the threads in the parallel regions.
-
+ between the threads in the parallel regions. Often, the optimizer is able to
+ reduce the cost of `Generic` mode to the level of `Cuda` mode, but the flag,
+ as well as other assumption flags, can be used for tuning.
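
As a rough illustration of the data-sharing discussion above, the sketch below declares a local variable inside a target region and reads it from a nested parallel loop. The function `scaled_sum` and all variable names are hypothetical. Going by the description above, `factor` is the kind of local variable that `Generic` mode places in global memory so that all threads can see it, while under `-fopenmp-cuda-mode` sharing such data becomes the user's responsibility.

```c
#include <assert.h>

/* Hypothetical helper: computes factor * (0 + 1 + ... + (n - 1)),
 * where factor is derived inside the target region. */
int scaled_sum(int n, int scale) {
    assert(n >= 0);
    int total = 0;

    #pragma omp target map(tofrom : total)
    {
        /* Local to the target region, but read by every thread of the
         * parallel loop below, so it "can be shared in the parallel
         * regions" in the sense used by the documentation. */
        int factor = scale * 2;

        #pragma omp parallel for reduction(+ : total)
        for (int i = 0; i < n; ++i) {
            total += factor * i;
        }
    }
    return total;
}
```

Whether a kernel like this benefits from the flag is best measured, since the text notes that the optimizer can often bring `Generic` mode down to the cost of `Cuda` mode on its own.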

  Features not supported or with limited support for Cuda devices
  ---------------------------------------------------------------
@@ -96,9 +68,6 @@ Features not supported or with limited support for Cuda devices

  - Nested parallelism: inner parallel regions are executed sequentially.

- - Automatic translation of math functions in target regions to device-specific
-   math functions is not implemented yet.
-
  - Debug information for OpenMP target regions is supported, but sometimes it may
    be required to manually specify the address class of the inspected variables.
    In some cases the local variables are actually allocated in the global memory,
@@ -139,7 +108,7 @@ implementation.
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | memory management            | allocate directive and allocate clause                       | :good:`done`             | r355614,r335952                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | OMPD                         | OMPD interfaces                                              | :part:`done`             | https://reviews.llvm.org/D99914 (Supports only HOST(CPU) and Linux    |
+ | OMPD                         | OMPD interfaces                                              | :good:`done`             | https://reviews.llvm.org/D99914 (Supports only HOST(CPU) and Linux    |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | OMPT                         | OMPT interfaces (callback support)                           | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
@@ -171,7 +140,7 @@ implementation.
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | infer target functions from initializers                     | :part:`worked on`        |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | device                       | infer target variables from initializers                     | :part:`done`             | D146418                                                               |
+ | device                       | infer target variables from initializers                     | :good:`done`             | D146418                                                               |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | OMP_TARGET_OFFLOAD environment variable                      | :good:`done`             | D50522                                                                |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
@@ -217,7 +186,7 @@ implementation.
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | support close modifier on map clause                         | :good:`done`             | D55719,D55892                                                         |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | device                       | teams construct on the host device                           | :part:`done`             | r371553                                                               |
+ | device                       | teams construct on the host device                           | :good:`done`             | r371553                                                               |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | support non-contiguous array sections for target update      | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
@@ -235,15 +204,15 @@ implementation.
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | misc                         | library shutdown (omp_pause_resource[_all])                  | :good:`done`             | D55078                                                                |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | misc                         | metadirectives                                               | :part:`worked on`        | D91944                                                                |
+ | misc                         | metadirectives                                               | :part:`mostly done`      | D91944                                                                |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | misc                         | conditional modifier for lastprivate clause                  | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | misc                         | iterator and multidependences                                | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | misc                         | depobj directive and depobj dependency kind                  | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | misc                         | user-defined function variants                               | :part:`worked on`        | D67294, D64095, D71847, D71830, D109635                               |
+ | misc                         | user-defined function variants                               | :good:`done`.            | D67294, D64095, D71847, D71830, D109635                               |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | misc                         | pointer/reference to pointer based array reductions          | :good:`done`             |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
@@ -298,7 +267,7 @@ implementation.
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | indirect clause on declare target directive                  | :none:`unclaimed`        |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
- | device                       | allow virtual functions calls for mapped object on device    | :none:`unclaimed`        |                                                                       |
+ | device                       | allow virtual functions calls for mapped object on device    | :part:`partial`          |                                                                       |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+
  | device                       | interop construct                                            | :part:`partial`          | parsing/sema done: D98558, D98834, D98815                             |
  +------------------------------+--------------------------------------------------------------+--------------------------+-----------------------------------------------------------------------+