Skip to content

Commit df1acd2

Browse files
authored
FPGA: Add device_global sample (#1291)
1 parent dca70c5 commit df1acd2

File tree

6 files changed

+556
-0
lines changed

6 files changed

+556
-0
lines changed

DirectProgramming/C++SYCL_FPGA/README.md

+2
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,8 @@ flowchart LR
7878
|:--- |:--- |:---
7979
| [ac_fixed](Tutorials/Features/ac_fixed) | [Tutorials/Features](Tutorials/Features) | How different methods of `ac_fixed` number construction affect hardware resource utilization <br> Recommended method for constructing `ac_fixed` numbers in your kernel <br> Accessing and using the `ac_fixed` math library functions <br> Trading off accuracy of results for reduced resource usage on the FPGA
8080
| [ac_int](Tutorials/Features/ac_int) | [Tutorials/Features](Tutorials/Features) | Using the `ac_int` data type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an `ac_int` number
81+
| [device_global (experimental)](Tutorials/Features/experimental/device_global)| [Tutorials/Features](Tutorials/Features) | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values
82+
8183
| [double_buffering](Tutorials/DesignPatterns/double_buffering) | [Tutorials/DesignPatterns](Tutorials/DesignPatterns) | How and when to implement the double buffering optimization technique
8284
| [explicit_data_movement](Tutorials/DesignPatterns/explicit_data_movement) | [Tutorials/DesignPatterns](Tutorials/DesignPatterns) | How to explicitly manage the movement of data for the FPGA
8385
| [hostpipes (experimental)](Tutorials/Features/experimental/hostpipes) | [Tutorials/Features](Tutorials/Features) | How to use host pipes to send and receive data between a host and the FPGA
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
if(UNIX)
2+
# Direct CMake to use icpx rather than the default C++ compiler/linker
3+
set(CMAKE_CXX_COMPILER icpx)
4+
else() # Windows
5+
# Force CMake to use icx-cl rather than the default C++ compiler/linker
6+
# (needed on Windows only)
7+
include (CMakeForceCompiler)
8+
CMAKE_FORCE_CXX_COMPILER (icx-cl IntelDPCPP)
9+
include (Platform/Windows-Clang)
10+
endif()
11+
12+
cmake_minimum_required (VERSION 3.4)
13+
14+
project(device_global CXX)
15+
16+
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
17+
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
18+
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
19+
20+
add_subdirectory (src)
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,228 @@
1+
# `device_global` Sample
2+
This FPGA tutorial explains how to use `device_global` class as a way of keeping a state between multiple invocations of a kernel.
3+
| Area | Description
4+
|:--- |:---
5+
| What you will learn | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values<br>
6+
| Time to complete | 15 minutes
7+
| Category | Features
8+
9+
## Prerequisites
10+
11+
This sample is part of the FPGA code samples.
12+
It is categorized as a Tier 2 sample that demonstrates a compiler feature.
13+
14+
```mermaid
15+
flowchart LR
16+
tier1("Tier 1: Get Started")
17+
tier2("Tier 2: Explore the Fundamentals")
18+
tier3("Tier 3: Explore the Advanced Techniques")
19+
tier4("Tier 4: Explore the Reference Designs")
20+
21+
tier1 --> tier2 --> tier3 --> tier4
22+
23+
style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
24+
style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
25+
style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
26+
style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
27+
```
28+
29+
Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/DPC++FPGA/README.md).
30+
You can also find more information about [troubleshooting build errors](/DirectProgramming/DPC++FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/DPC++FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/DPC++FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/DPC++FPGA/README.md#documentation), etc.
31+
32+
| Optimized for | Description
33+
|:--- |:---
34+
| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
35+
| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
36+
| Software | Intel® oneAPI DPC++/C++ Compiler
37+
38+
> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
39+
>
40+
> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
41+
> - Questa*-Intel® FPGA Edition
42+
> - Questa*-Intel® FPGA Starter Edition
43+
> - ModelSim® SE
44+
>
45+
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
46+
>
47+
> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.
48+
49+
## Purpose
50+
This tutorial demonstrates a simple example of initializing a `device_global` class to a non-zero value, and using it to keep state between multiple re-launches of a kernel.
51+
52+
### Description of `device_global`
53+
The `device_global` class is an extension that introduces device scoped memory allocations into SYCL that can be accessed within a kernel using syntax similar to C++ global variables, but that have unique instances per `sycl::device`. Similar to C++ global variables, a `device_global` variable have a namespace scope and is visible to all kernels within that scope.
54+
55+
A `device_global` class is instantiated from a class template. The template is parameterized by the type of the underlying allocation, and a list of properties. The type of the allocation also encodes the size of the allocation for potentially multidimensional array types. The list of properties change the functional behavior of the `device_global` instance to enable compiler and runtime optimizations. In the code sample two properties are used, `device_image_scope` and `host_access_none`. The `device_image_scope` property limits the scope of a single instance of a `device_global` from a device to a `device_image`. The absence of the `device_image_scope` property is currently not supported by the compiler. The `host_access` property is an assertion by the user telling the implementation whether the host code copies to or from the `device_global`. The property comes in four variants `host_access_none`, `host_access_read`, `host_access_write`, and `host_access_read_write`(the default), but only the `host_access_none` is supported currently. Further details on these and other properties can be found in the [Properties for `device_global` variables](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables) section of the [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc)
56+
57+
A `device_global` instance can be used to store state across multiple relaunches of a kernel without having to pass in a `buffer` as a kernel argument. An example of an application that would benefit from such a state is where kernels are nodes in a state-machine.
58+
59+
### How to initialize a `device_global` instance
60+
A `device_global` instance is always zero-initialized. If you need the first usage of a `device_global` instance in device code to be initialized to a non-zero value, use the following technique:
61+
62+
> Instantiate a second `device_global<bool>` that represents a flag that controls when to initialize the `device_global` to a non-zero value. This flag is zero-initialized to `false`. Once initialization happens, the flag is set to `true` and initialization code doesn't execute on subsequent relaunches of the kernel.
63+
```cpp
64+
namespace exp = sycl::ext::oneapi::experimental;
65+
using FPGAProperties = decltype(exp::properties(
66+
exp::device_image_scope, exp::host_access_none));
67+
exp::device_global<int, FPGAProperties> val;
68+
exp::device_global<bool, FPGAProperties> is_val_initialized;
69+
int main () {
70+
sycl::queue q;
71+
q.submit([&](sycl::handler& h) {
72+
h.single_task([=] {
73+
// Initialization happens only once
74+
if (!is_val_initialized) {
75+
val.get() = 42;
76+
is_val_initialized.get() = true;
77+
}
78+
// uses of `val`
79+
});
80+
});
81+
}
82+
```
83+
84+
### Additional Documentation
85+
- [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) helps understand the API and of `device_globals` and restrictions on it's usage
86+
87+
## Key Concepts
88+
* The basic usage of the `device_global` class
89+
* How initialize a `device_global` to non-zero value
90+
91+
## Building the `device_global` Tutorial
92+
> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables.
93+
> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window.
94+
> This practice ensures that your compiler, libraries, and tools are ready for development.
95+
>
96+
> Linux*:
97+
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
98+
> - For private installations: ` . ~/intel/oneapi/setvars.sh`
99+
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
100+
>
101+
> Windows*:
102+
> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat`
103+
> - Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'`
104+
>
105+
> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).
106+
107+
### On a Linux* System
108+
109+
1. Generate the `Makefile` by running `cmake`.
110+
```
111+
mkdir build
112+
cd build
113+
```
114+
To compile for the default target (the Agilex™ device family), run `cmake` using the command:
115+
```
116+
cmake ..
117+
```
118+
119+
> **Note**: You can change the default target by using the command:
120+
> ```
121+
> cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
122+
> ```
123+
>
124+
> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
125+
> ```
126+
> cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
127+
> ```
128+
>
129+
> You will only be able to run an executable on the FPGA if you specified a BSP.
130+
131+
2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:
132+
133+
* Compile for emulation (fast compile time, targets emulated FPGA device):
134+
```
135+
make fpga_emu
136+
```
137+
* Generate the optimization report:
138+
```
139+
make report
140+
```
141+
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
142+
```
143+
make fpga_sim
144+
```
145+
* Compile for FPGA hardware (longer compile time, targets FPGA device):
146+
```
147+
make fpga
148+
```
149+
150+
### On a Windows* System
151+
152+
1. Generate the `Makefile` by running `cmake`.
153+
```
154+
mkdir build
155+
cd build
156+
```
157+
To compile for the default target (the Agilex™ device family), run `cmake` using the command:
158+
```
159+
cmake -G "NMake Makefiles" ..
160+
```
161+
> **Note**: You can change the default target by using the command:
162+
> ```
163+
> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
164+
> ```
165+
>
166+
> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
167+
> ```
168+
> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
169+
> ```
170+
>
171+
> You will only be able to run an executable on the FPGA if you specified a BSP.
172+
173+
2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:
174+
175+
* Compile for emulation (fast compile time, targets emulated FPGA device):
176+
```
177+
nmake fpga_emu
178+
```
179+
* Generate the optimization report:
180+
```
181+
nmake report
182+
```
183+
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
184+
```
185+
nmake fpga_sim
186+
```
187+
* Compile for FPGA hardware (longer compile time, targets FPGA device):
188+
```
189+
nmake fpga
190+
```
191+
192+
> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory.
193+
194+
## Running the Sample
195+
196+
1. Run the sample on the FPGA emulator (the kernel executes on the CPU):
197+
```
198+
./device_global.fpga_emu (Linux)
199+
device_global.fpga_emu.exe (Windows)
200+
```
201+
2. Run the sample on the FPGA simulator device:
202+
* On Linux
203+
```bash
204+
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./device_global.fpga_sim
205+
```
206+
* On Windows
207+
```bash
208+
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
209+
device_global.fpga_sim.exe
210+
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
211+
```
212+
3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`):
213+
```
214+
./device_global.fpga (Linux)
215+
device_global.fpga.exe (Windows)
216+
```
217+
218+
219+
### Example of Output
220+
```
221+
PASSED: The results are correct
222+
```
223+
224+
## License
225+
Code samples are licensed under the MIT license. See
226+
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
227+
228+
Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
1+
{
2+
"guid": "92110788-F876-4009-89A5-D1589B588D20",
3+
"name": "device_global",
4+
"categories": ["Toolkit/oneAPI Direct Programming/C++SYCL FPGA/Tutorials/Features/experimental"],
5+
"description": "An Intel® FPGA tutorial demonstrating the device_global feature",
6+
"toolchain": ["icpx"],
7+
"os": ["linux", "windows"],
8+
"targetDevice": ["FPGA"],
9+
"builder": ["ide", "cmake"],
10+
"languages": [{"cpp":{}}],
11+
"commonFolder": {
12+
"base": "../../../..",
13+
"include": [
14+
"README.md",
15+
"Tutorials/Features/experimental/device_global",
16+
"include"
17+
],
18+
"exclude": []
19+
},
20+
"ciTests": {
21+
"linux": [
22+
{
23+
"id": "fpga_emu",
24+
"steps": [
25+
"icpx --version",
26+
"mkdir build",
27+
"cd build",
28+
"cmake ..",
29+
"make fpga_emu",
30+
"./device_global.fpga_emu"
31+
]
32+
},
33+
{
34+
"id": "report",
35+
"steps": [
36+
"icpx --version",
37+
"mkdir build",
38+
"cd build",
39+
"cmake ..",
40+
"make report"
41+
]
42+
}
43+
],
44+
"windows": [
45+
{
46+
"id": "fpga_emu",
47+
"steps": [
48+
"icpx --version",
49+
"cd ../../../..",
50+
"mkdir build",
51+
"cd build",
52+
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global",
53+
"nmake fpga_emu",
54+
"device_global.fpga_emu.exe"
55+
]
56+
},
57+
{
58+
"id": "report",
59+
"steps": [
60+
"icpx --version",
61+
"cd ../../../..",
62+
"mkdir build",
63+
"cd build",
64+
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global",
65+
"nmake report"
66+
]
67+
}
68+
]
69+
},
70+
"expertise": "Concepts and Functionality"
71+
}

0 commit comments

Comments
 (0)