|
| 1 | +# `device_global` Sample |
| 2 | +This FPGA tutorial explains how to use `device_global` class as a way of keeping a state between multiple invocations of a kernel. |
| 3 | +| Area | Description |
| 4 | +|:--- |:--- |
| 5 | +| What you will learn | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values<br> |
| 6 | +| Time to complete | 15 minutes |
| 7 | +| Category | Features |
| 8 | + |
| 9 | +## Prerequisites |
| 10 | + |
| 11 | +This sample is part of the FPGA code samples. |
| 12 | +It is categorized as a Tier 2 sample that demonstrates a compiler feature. |
| 13 | + |
| 14 | +```mermaid |
| 15 | +flowchart LR |
| 16 | + tier1("Tier 1: Get Started") |
| 17 | + tier2("Tier 2: Explore the Fundamentals") |
| 18 | + tier3("Tier 3: Explore the Advanced Techniques") |
| 19 | + tier4("Tier 4: Explore the Reference Designs") |
| 20 | + |
| 21 | + tier1 --> tier2 --> tier3 --> tier4 |
| 22 | + |
| 23 | + style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff |
| 24 | + style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff |
| 25 | + style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff |
| 26 | + style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff |
| 27 | +``` |
| 28 | + |
| 29 | +Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/DPC++FPGA/README.md). |
| 30 | +You can also find more information about [troubleshooting build errors](/DirectProgramming/DPC++FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/DPC++FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/DPC++FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/DPC++FPGA/README.md#documentation), etc. |
| 31 | + |
| 32 | +| Optimized for | Description |
| 33 | +|:--- |:--- |
| 34 | +| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10 |
| 35 | +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs |
| 36 | +| Software | Intel® oneAPI DPC++/C++ Compiler |
| 37 | + |
| 38 | +> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. |
| 39 | +> |
| 40 | +> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH: |
| 41 | +> - Questa*-Intel® FPGA Edition |
| 42 | +> - Questa*-Intel® FPGA Starter Edition |
| 43 | +> - ModelSim® SE |
| 44 | +> |
| 45 | +> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. |
| 46 | +> |
| 47 | +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. |
| 48 | +
|
| 49 | +## Purpose |
| 50 | +This tutorial demonstrates a simple example of initializing a `device_global` class to a non-zero value, and using it to keep state between multiple re-launches of a kernel. |
| 51 | + |
| 52 | +### Description of `device_global` |
| 53 | +The `device_global` class is an extension that introduces device scoped memory allocations into SYCL that can be accessed within a kernel using syntax similar to C++ global variables, but that have unique instances per `sycl::device`. Similar to C++ global variables, a `device_global` variable have a namespace scope and is visible to all kernels within that scope. |
| 54 | + |
| 55 | +A `device_global` class is instantiated from a class template. The template is parameterized by the type of the underlying allocation, and a list of properties. The type of the allocation also encodes the size of the allocation for potentially multidimensional array types. The list of properties change the functional behavior of the `device_global` instance to enable compiler and runtime optimizations. In the code sample two properties are used, `device_image_scope` and `host_access_none`. The `device_image_scope` property limits the scope of a single instance of a `device_global` from a device to a `device_image`. The absence of the `device_image_scope` property is currently not supported by the compiler. The `host_access` property is an assertion by the user telling the implementation whether the host code copies to or from the `device_global`. The property comes in four variants `host_access_none`, `host_access_read`, `host_access_write`, and `host_access_read_write`(the default), but only the `host_access_none` is supported currently. Further details on these and other properties can be found in the [Properties for `device_global` variables](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables) section of the [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) |
| 56 | + |
| 57 | +A `device_global` instance can be used to store state across multiple relaunches of a kernel without having to pass in a `buffer` as a kernel argument. An example of an application that would benefit from such a state is where kernels are nodes in a state-machine. |
| 58 | + |
| 59 | +### How to initialize a `device_global` instance |
| 60 | +A `device_global` instance is always zero-initialized. If you need the first usage of a `device_global` instance in device code to be initialized to a non-zero value, use the following technique: |
| 61 | + |
| 62 | +> Instantiate a second `device_global<bool>` that represents a flag that controls when to initialize the `device_global` to a non-zero value. This flag is zero-initialized to `false`. Once initialization happens, the flag is set to `true` and initialization code doesn't execute on subsequent relaunches of the kernel. |
| 63 | +```cpp |
| 64 | +namespace exp = sycl::ext::oneapi::experimental; |
| 65 | +using FPGAProperties = decltype(exp::properties( |
| 66 | + exp::device_image_scope, exp::host_access_none)); |
| 67 | +exp::device_global<int, FPGAProperties> val; |
| 68 | +exp::device_global<bool, FPGAProperties> is_val_initialized; |
| 69 | +int main () { |
| 70 | + sycl::queue q; |
| 71 | + q.submit([&](sycl::handler& h) { |
| 72 | + h.single_task([=] { |
| 73 | + // Initialization happens only once |
| 74 | + if (!is_val_initialized) { |
| 75 | + val.get() = 42; |
| 76 | + is_val_initialized.get() = true; |
| 77 | + } |
| 78 | + // uses of `val` |
| 79 | + }); |
| 80 | + }); |
| 81 | +} |
| 82 | +``` |
| 83 | +
|
| 84 | +### Additional Documentation |
| 85 | +- [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) helps understand the API and of `device_globals` and restrictions on it's usage |
| 86 | +
|
| 87 | +## Key Concepts |
| 88 | +* The basic usage of the `device_global` class |
| 89 | +* How initialize a `device_global` to non-zero value |
| 90 | +
|
| 91 | +## Building the `device_global` Tutorial |
| 92 | +> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. |
| 93 | +> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window. |
| 94 | +> This practice ensures that your compiler, libraries, and tools are ready for development. |
| 95 | +> |
| 96 | +> Linux*: |
| 97 | +> - For system wide installations: `. /opt/intel/oneapi/setvars.sh` |
| 98 | +> - For private installations: ` . ~/intel/oneapi/setvars.sh` |
| 99 | +> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'` |
| 100 | +> |
| 101 | +> Windows*: |
| 102 | +> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat` |
| 103 | +> - Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'` |
| 104 | +> |
| 105 | +> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). |
| 106 | +
|
| 107 | +### On a Linux* System |
| 108 | +
|
| 109 | +1. Generate the `Makefile` by running `cmake`. |
| 110 | +``` |
| 111 | + mkdir build |
| 112 | + cd build |
| 113 | + ``` |
| 114 | + To compile for the default target (the Agilex™ device family), run `cmake` using the command: |
| 115 | + ``` |
| 116 | + cmake .. |
| 117 | + ``` |
| 118 | +
|
| 119 | + > **Note**: You can change the default target by using the command: |
| 120 | + > ``` |
| 121 | + > cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number> |
| 122 | + > ``` |
| 123 | + > |
| 124 | + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: |
| 125 | + > ``` |
| 126 | + > cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant> |
| 127 | + > ``` |
| 128 | + > |
| 129 | + > You will only be able to run an executable on the FPGA if you specified a BSP. |
| 130 | +
|
| 131 | +2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: |
| 132 | +
|
| 133 | + * Compile for emulation (fast compile time, targets emulated FPGA device): |
| 134 | + ``` |
| 135 | + make fpga_emu |
| 136 | + ``` |
| 137 | + * Generate the optimization report: |
| 138 | + ``` |
| 139 | + make report |
| 140 | + ``` |
| 141 | + * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): |
| 142 | + ``` |
| 143 | + make fpga_sim |
| 144 | + ``` |
| 145 | + * Compile for FPGA hardware (longer compile time, targets FPGA device): |
| 146 | + ``` |
| 147 | + make fpga |
| 148 | + ``` |
| 149 | +
|
| 150 | +### On a Windows* System |
| 151 | +
|
| 152 | +1. Generate the `Makefile` by running `cmake`. |
| 153 | + ``` |
| 154 | + mkdir build |
| 155 | + cd build |
| 156 | + ``` |
| 157 | + To compile for the default target (the Agilex™ device family), run `cmake` using the command: |
| 158 | + ``` |
| 159 | + cmake -G "NMake Makefiles" .. |
| 160 | + ``` |
| 161 | + > **Note**: You can change the default target by using the command: |
| 162 | + > ``` |
| 163 | + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number> |
| 164 | + > ``` |
| 165 | + > |
| 166 | + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: |
| 167 | + > ``` |
| 168 | + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant> |
| 169 | + > ``` |
| 170 | + > |
| 171 | + > You will only be able to run an executable on the FPGA if you specified a BSP. |
| 172 | +
|
| 173 | +2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: |
| 174 | +
|
| 175 | + * Compile for emulation (fast compile time, targets emulated FPGA device): |
| 176 | + ``` |
| 177 | + nmake fpga_emu |
| 178 | + ``` |
| 179 | + * Generate the optimization report: |
| 180 | + ``` |
| 181 | + nmake report |
| 182 | + ``` |
| 183 | + * Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): |
| 184 | + ``` |
| 185 | + nmake fpga_sim |
| 186 | + ``` |
| 187 | + * Compile for FPGA hardware (longer compile time, targets FPGA device): |
| 188 | + ``` |
| 189 | + nmake fpga |
| 190 | + ``` |
| 191 | +
|
| 192 | +> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. |
| 193 | +
|
| 194 | +## Running the Sample |
| 195 | +
|
| 196 | + 1. Run the sample on the FPGA emulator (the kernel executes on the CPU): |
| 197 | + ``` |
| 198 | + ./device_global.fpga_emu (Linux) |
| 199 | + device_global.fpga_emu.exe (Windows) |
| 200 | + ``` |
| 201 | +2. Run the sample on the FPGA simulator device: |
| 202 | + * On Linux |
| 203 | + ```bash |
| 204 | + CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./device_global.fpga_sim |
| 205 | + ``` |
| 206 | + * On Windows |
| 207 | + ```bash |
| 208 | + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 |
| 209 | + device_global.fpga_sim.exe |
| 210 | + set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= |
| 211 | + ``` |
| 212 | +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`): |
| 213 | + ``` |
| 214 | + ./device_global.fpga (Linux) |
| 215 | + device_global.fpga.exe (Windows) |
| 216 | + ``` |
| 217 | +
|
| 218 | +
|
| 219 | +### Example of Output |
| 220 | +``` |
| 221 | +PASSED: The results are correct |
| 222 | +``` |
| 223 | +
|
| 224 | +## License |
| 225 | +Code samples are licensed under the MIT license. See |
| 226 | +[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. |
| 227 | +
|
| 228 | +Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt). |
0 commit comments