-
Notifications
You must be signed in to change notification settings - Fork 727
FPGA: Add device_global
sample
#1291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 5 commits
Commits
Show all changes
24 commits
Select commit
Hold shift + click to select a range
fd5c559
initial commit
artemrad 979ac5a
Fixed selectors
artemrad d69da05
Subtle cmake flag change
artemrad a4008c1
Renamed variable
artemrad 5a4de5c
Reuse the queue
artemrad 264a8bc
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad ce12b8f
New Readme format
artemrad 6eb28e7
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad 04d94a1
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad 26d8a1b
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad f6d1358
Apply suggestions from code review
artemrad 7fb64c8
Removed sections
artemrad a26a758
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad 543e82b
Removed Include files section
artemrad 63f5a35
Apply suggestions from code review
artemrad 0287c58
Added support for IPA flow
artemrad 34e950e
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad 539855e
Apply suggestions from code review
artemrad 2ff30d2
Apply changes
artemrad 76e9947
Apply suggestions from code review
artemrad 60ced83
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad 9f2675b
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad 3f159a0
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad 2e15f84
fix
artemrad File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
20 changes: 20 additions & 0 deletions
20
DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental/device_global/CMakeLists.txt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
if(UNIX) | ||
# Direct CMake to use icpx rather than the default C++ compiler/linker | ||
set(CMAKE_CXX_COMPILER icpx) | ||
else() # Windows | ||
# Force CMake to use icx-cl rather than the default C++ compiler/linker | ||
# (needed on Windows only) | ||
include (CMakeForceCompiler) | ||
CMAKE_FORCE_CXX_COMPILER (icx-cl IntelDPCPP) | ||
include (Platform/Windows-Clang) | ||
endif() | ||
|
||
cmake_minimum_required (VERSION 3.4) | ||
|
||
project(device_global CXX) | ||
|
||
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}) | ||
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}) | ||
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}) | ||
|
||
add_subdirectory (src) |
219 changes: 219 additions & 0 deletions
219
...rogramming/C++SYCL_FPGA/Tutorials/Features/experimental/device_global/README.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,219 @@ | ||
# `device_global` | ||
This FPGA tutorial explains how to use `device_global` class as a way of keeping a state between multiple invocations of a kernel. | ||
|
||
| Optimized for | Description | ||
|:--- |:--- | ||
| OS | Linux* Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10 | ||
| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA; <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> **Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* | ||
| Software | Intel® oneAPI DPC++ Compiler <br> Intel® FPGA Add-On for oneAPI Base Toolkit | ||
| What you will learn | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values<br> | ||
| Time to complete | 15 minutes | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Purpose | ||
This tutorial demonstrates a simple example of initializing a `device_global` class to a non-zero value, and using it to keep state between multiple re-launches of a kernel. | ||
|
||
### Description of `device_global` | ||
The `device_global` class is an extension that introduces device scoped memory allocations into SYCL that can be accessed within a kernel using syntax similar to C++ global variables, but that have unique instances per `sycl::device`. Similar to C++ global variables, a `device_global` variable have a namespace scope and is visible to all kernels within that scope. | ||
|
||
A `device_global` class is instantiated from a class template. The template is parameterized by the type of the underlying allocation, and a list of properties. The type of the allocation also encodes the size of the allocation for potentially multidimensional array types. The list of properties change the functional behavior of the `device_global` instance to enable compiler and runtime optimizations. In the code sample two properties are used, `device_image_scope` and `host_access_none`. The `device_image_scope` property limits the scope of a single instance of a `device_global` from a device to a `device_image`. The absence of the `device_image_scope` property is currently not supported by the compiler. The `host_access` property is an assertion by the user telling the implementation whether the host code copies to or from the `device_global`. The property comes in four variants `host_access_none`, `host_access_read`, `host_access_write`, and `host_access_read_write`(the default), but only the `host_access_none` is supported currently. Further details on these and other properties can be found in the [Properties for `device_global` variables](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables) section of the [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) | ||
|
||
A `device_global` instance can be used to store state across multiple relaunches of a kernel without having to pass in a `buffer` as a kernel argument. An example of an application that would benefit from such a state is where kernels are nodes in a state-machine. | ||
|
||
### How to initialize a `device_global` instance | ||
A `device_global` instance is always zero-initialized. If you need the first usage of a `device_global` instance in device code to be initialized to a non-zero value, use the following technique: | ||
|
||
> Instantiate a second `device_global<bool>` that represents a flag that controls when to initialize the `device_global` to a non-zero value. This flag is zero-initialized to `false`. Once initialization happens, the flag is set to `true` and initialization code doesn't execute on subsequent relaunches of the kernel. | ||
```cpp | ||
namespace exp = sycl::ext::oneapi::experimental; | ||
using FPGAProperties = decltype(exp::properties( | ||
exp::device_image_scope)); | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
exp::device_global<int, FPGAProperties> val; | ||
exp::device_global<bool, FPGAProperties> is_val_initialized; | ||
int main () { | ||
sycl::queue q; | ||
q.submit([&](sycl::handler& h) { | ||
h.single_task([=] { | ||
// Initialization happens only once | ||
if (!is_val_initialized) { | ||
val.get() = 42; | ||
is_val_initialized.get() = true; | ||
} | ||
// uses of `val` | ||
}); | ||
}); | ||
} | ||
``` | ||
|
||
### Additional Documentation | ||
- [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) helps understand the API and of `device_globals` and restrictions on it's usage | ||
- [Explore SYCL* Through Intel® FPGA Code Samples](https://software.intel.com/content/www/us/en/develop/articles/explore-dpcpp-through-intel-fpga-code-samples.html) helps you to navigate the samples and build your knowledge of FPGAs and SYCL. | ||
- [FPGA Optimization Guide for Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide) helps you understand how to target FPGAs using SYCL and Intel® oneAPI Toolkits. | ||
- [Intel® oneAPI Programming Guide](https://software.intel.com/en-us/oneapi-programming-guide) helps you understand target-independent, SYCL-compliant programming using Intel® oneAPI Toolkits. | ||
yuguen marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Key Concepts | ||
* The basic usage of the `device_global` class | ||
* How initialize a `device_global` to non-zero value | ||
|
||
|
||
## Building the `device_global` Tutorial | ||
> **Note**: If you have not already done so, set up your CLI | ||
> environment by sourcing the `setvars` script located in | ||
> the root of your oneAPI installation. | ||
> | ||
> Linux*: | ||
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh` | ||
> - For private installations: `. ~/intel/oneapi/setvars.sh` | ||
> | ||
> Windows*: | ||
> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat` | ||
> | ||
>For more information on environment variables, see **Use the setvars Script** for [Linux or macOS](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html), or [Windows](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
### Include Files | ||
The included header `dpc_common.hpp` is located at `%ONEAPI_ROOT%\dev-utilities\latest\include` on your development system. | ||
|
||
### Running Samples in Intel® DevCloud | ||
If running a sample in the Intel® DevCloud, remember that you must specify the type of compute node and whether to run in batch or interactive mode. Compiles to FPGA are only supported on fpga_compile nodes. Executing programs on FPGA hardware is only supported on fpga_runtime nodes of the appropriate type, such as fpga_runtime:arria10 or fpga_runtime:stratix10. Neither compiling nor executing programs on FPGA hardware are supported on the login nodes. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide ([https://devcloud.intel.com/oneapi/documentation/base-toolkit/](https://devcloud.intel.com/oneapi/documentation/base-toolkit/)). | ||
|
||
When compiling for FPGA hardware, it is recommended to increase the job timeout to 12h. | ||
|
||
## Using Visual Studio Code* (Optional) | ||
|
||
You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, | ||
and browse and download samples. | ||
|
||
The basic steps to build and run a sample using VS Code include: | ||
- Download a sample using the extension **Code Sample Browser for Intel® oneAPI Toolkits**. | ||
- Configure the oneAPI environment with the extension **Environment Configurator for Intel® oneAPI Toolkits**. | ||
- Open a Terminal in VS Code (**Terminal>New Terminal**). | ||
- Run the sample in the VS Code terminal using the instructions below. | ||
- (Linux only) Debug your GPU application with GDB for Intel® oneAPI toolkits using the **Generate Launch Configurations** extension. | ||
|
||
To learn more about the extensions, see the | ||
[Using Visual Studio Code with Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html). | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### On a Linux* System | ||
|
||
1. Generate the `Makefile` by running `cmake`. | ||
``` | ||
mkdir build | ||
cd build | ||
``` | ||
To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: | ||
``` | ||
cmake .. | ||
``` | ||
Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: | ||
|
||
``` | ||
cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 | ||
``` | ||
You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: | ||
``` | ||
cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant> | ||
``` | ||
|
||
2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: | ||
|
||
* Compile for emulation (fast compile time, targets emulated FPGA device): | ||
``` | ||
make fpga_emu | ||
``` | ||
* Generate the optimization report: | ||
``` | ||
make report | ||
``` | ||
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): | ||
``` | ||
make fpga_sim | ||
``` | ||
* Compile for FPGA hardware (longer compile time, targets FPGA device): | ||
``` | ||
make fpga | ||
``` | ||
3. (Optional) As the FPGA hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded <a href="https://iotdk.intel.com/fpga-precompiled-binaries/latest/device_global.fpga.tar.gz" download>here</a>. | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
### On a Windows* System | ||
|
||
1. Generate the `Makefile` by running `cmake`. | ||
``` | ||
mkdir build | ||
cd build | ||
``` | ||
To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: | ||
``` | ||
cmake -G "NMake Makefiles" .. | ||
``` | ||
Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: | ||
|
||
``` | ||
cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 | ||
``` | ||
You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: | ||
``` | ||
cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant> | ||
``` | ||
|
||
2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: | ||
|
||
* Compile for emulation (fast compile time, targets emulated FPGA device): | ||
``` | ||
nmake fpga_emu | ||
``` | ||
* Generate the optimization report: | ||
``` | ||
nmake report | ||
``` | ||
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size): | ||
``` | ||
nmake fpga_sim | ||
``` | ||
* Compile for FPGA hardware (longer compile time, targets FPGA device): | ||
``` | ||
nmake fpga | ||
``` | ||
|
||
> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. | ||
> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. | ||
### Troubleshooting | ||
|
||
If an error occurs, you can get more details by running `make` with | ||
the `VERBOSE=1` argument: | ||
``make VERBOSE=1`` | ||
For more comprehensive troubleshooting, use the Diagnostics Utility for | ||
Intel® oneAPI Toolkits, which provides system checks to find missing | ||
dependencies and permissions errors. | ||
[Learn more](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html). | ||
|
||
### In Third-Party Integrated Development Environments (IDEs) | ||
|
||
You can compile and run this tutorial in the Eclipse* IDE (in Linux*) and the Visual Studio* IDE (in Windows*). For instructions, refer to the following link: [FPGA Workflows on Third-Party IDEs for Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-oneapi-dpcpp-fpga-workflow-on-ide.html). | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
## Running the Sample | ||
|
||
1. Run the sample on the FPGA emulator (the kernel executes on the CPU): | ||
``` | ||
./device_global.fpga_emu (Linux) | ||
device_global.fpga_emu.exe (Windows) | ||
``` | ||
2. Run the sample on the FPGA simulator device: | ||
``` | ||
./device_global.fpga_sim (Linux) | ||
device_global.fpga_sim.exe (Windows) | ||
``` | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
3. Run the sample on the FPGA device: | ||
``` | ||
./device_global.fpga (Linux) | ||
device_global.fpga.exe (Windows) | ||
``` | ||
|
||
|
||
### Example of Output | ||
``` | ||
PASSED: The results are correct | ||
``` | ||
|
||
## License | ||
Code samples are licensed under the MIT license. See | ||
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. | ||
|
||
Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt). |
71 changes: 71 additions & 0 deletions
71
DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental/device_global/sample.json
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
{ | ||
"guid": "92110788-F876-4009-89A5-D1589B588D20", | ||
"name": "device_global", | ||
"categories": ["Toolkit/oneAPI Direct Programming/C++SYCL FPGA/Tutorials/Features/experimental"], | ||
"description": "An Intel® FPGA tutorial demonstrating the device_global feature", | ||
"toolchain": ["icpx"], | ||
"os": ["linux", "windows"], | ||
"targetDevice": ["FPGA"], | ||
"builder": ["ide", "cmake"], | ||
"languages": [{"cpp":{}}], | ||
"commonFolder": { | ||
"base": "../../../..", | ||
"include": [ | ||
"README.md", | ||
"Tutorials/Features/experimental/device_global", | ||
"include" | ||
], | ||
"exclude": [] | ||
}, | ||
"ciTests": { | ||
"linux": [ | ||
{ | ||
"id": "fpga_emu", | ||
"steps": [ | ||
"icpx --version", | ||
"mkdir build", | ||
"cd build", | ||
"cmake ..", | ||
"make fpga_emu", | ||
"./device_global.fpga_emu" | ||
] | ||
}, | ||
{ | ||
"id": "report", | ||
"steps": [ | ||
"icpx --version", | ||
"mkdir build", | ||
"cd build", | ||
"cmake ..", | ||
"make report" | ||
] | ||
} | ||
], | ||
"windows": [ | ||
{ | ||
"id": "fpga_emu", | ||
"steps": [ | ||
"icpx --version", | ||
"cd ../../..", | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"mkdir build", | ||
"cd build", | ||
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global", | ||
"nmake fpga_emu", | ||
"device_global.fpga_emu.exe" | ||
] | ||
}, | ||
{ | ||
"id": "report", | ||
"steps": [ | ||
"icpx --version", | ||
"cd ../../..", | ||
artemrad marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"mkdir build", | ||
"cd build", | ||
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global", | ||
"nmake report" | ||
] | ||
} | ||
] | ||
}, | ||
"expertise": "Concepts and Functionality" | ||
} |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.