Skip to content

FPGA: Add device_global sample #1291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Jan 30, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
fd5c559
initial commit
artemrad Jan 4, 2023
979ac5a
Fixed selectors
artemrad Jan 6, 2023
d69da05
Subtle cmake flag change
artemrad Jan 13, 2023
a4008c1
Renamed variable
artemrad Jan 17, 2023
5a4de5c
Reuse the queue
artemrad Jan 18, 2023
264a8bc
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 19, 2023
ce12b8f
New Readme format
artemrad Jan 20, 2023
6eb28e7
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad Jan 20, 2023
04d94a1
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 20, 2023
26d8a1b
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 20, 2023
f6d1358
Apply suggestions from code review
artemrad Jan 20, 2023
7fb64c8
Removed sections
artemrad Jan 20, 2023
a26a758
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad Jan 20, 2023
543e82b
Removed Include files section
artemrad Jan 20, 2023
63f5a35
Apply suggestions from code review
artemrad Jan 23, 2023
0287c58
Added support for IPA flow
artemrad Jan 26, 2023
34e950e
Merge branch 'device_global2' of https://github.com/artemrad/oneAPI-s…
artemrad Jan 26, 2023
539855e
Apply suggestions from code review
artemrad Jan 27, 2023
2ff30d2
Apply changes
artemrad Jan 27, 2023
76e9947
Apply suggestions from code review
artemrad Jan 27, 2023
60ced83
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 27, 2023
9f2675b
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 27, 2023
3f159a0
Update DirectProgramming/C++SYCL_FPGA/Tutorials/Features/experimental…
artemrad Jan 27, 2023
2e15f84
fix
artemrad Jan 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions DirectProgramming/C++SYCL_FPGA/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,8 @@ flowchart LR
|:--- |:--- |:---
| [ac_fixed](Tutorials/Features/ac_fixed) | [Tutorials/Features](Tutorials/Features) | How different methods of `ac_fixed` number construction affect hardware resource utilization <br> Recommended method for constructing `ac_fixed` numbers in your kernel <br> Accessing and using the `ac_fixed` math library functions <br> Trading off accuracy of results for reduced resource usage on the FPGA
| [ac_int](Tutorials/Features/ac_int) | [Tutorials/Features](Tutorials/Features) | Using the `ac_int` data type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an `ac_int` number
| [device_global (experimental)](Tutorials/Features/experimental/device_global)| [Tutorials/Features](Tutorials/Features) | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values

| [double_buffering](Tutorials/DesignPatterns/double_buffering) | [Tutorials/DesignPatterns](Tutorials/DesignPatterns) | How and when to implement the double buffering optimization technique
| [explicit_data_movement](Tutorials/DesignPatterns/explicit_data_movement) | [Tutorials/DesignPatterns](Tutorials/DesignPatterns) | How to explicitly manage the movement of data for the FPGA
| [hostpipes (experimental)](Tutorials/Features/experimental/hostpipes) | [Tutorials/Features](Tutorials/Features) | How to use host pipes to send and receive data between a host and the FPGA
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
if(UNIX)
# Direct CMake to use icpx rather than the default C++ compiler/linker
set(CMAKE_CXX_COMPILER icpx)
else() # Windows
# Force CMake to use icx-cl rather than the default C++ compiler/linker
# (needed on Windows only)
include (CMakeForceCompiler)
CMAKE_FORCE_CXX_COMPILER (icx-cl IntelDPCPP)
include (Platform/Windows-Clang)
endif()

cmake_minimum_required (VERSION 3.4)

project(device_global CXX)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

add_subdirectory (src)
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
# `device_global` Sample
This FPGA tutorial explains how to use `device_global` class as a way of keeping a state between multiple invocations of a kernel.
| Area | Description
|:--- |:---
| What you will learn | The basic usage of the `device_global` class <br> How to initialize a `device_global` to non-zero values<br>
| Time to complete | 15 minutes
| Category | Features

## Prerequisites

This sample is part of the FPGA code samples.
It is categorized as a Tier 2 sample that demonstrates a compiler feature.

```mermaid
flowchart LR
tier1("Tier 1: Get Started")
tier2("Tier 2: Explore the Fundamentals")
tier3("Tier 3: Explore the Advanced Techniques")
tier4("Tier 4: Explore the Reference Designs")

tier1 --> tier2 --> tier3 --> tier4

style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
```

Find more information about how to navigate this part of the code samples in the [FPGA top-level README.md](/DirectProgramming/DPC++FPGA/README.md).
You can also find more information about [troubleshooting build errors](/DirectProgramming/DPC++FPGA/README.md#troubleshooting), [running the sample on the Intel® DevCloud](/DirectProgramming/DPC++FPGA/README.md#build-and-run-the-samples-on-intel-devcloud-optional), [using Visual Studio Code with the code samples](/DirectProgramming/DPC++FPGA/README.md#use-visual-studio-code-vs-code-optional), [links to selected documentation](/DirectProgramming/DPC++FPGA/README.md#documentation), etc.

| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04/20.04 <br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
| Software | Intel® oneAPI DPC++/C++ Compiler

> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles.
>
> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH:
> - Questa*-Intel® FPGA Edition
> - Questa*-Intel® FPGA Starter Edition
> - ModelSim® SE
>
> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH.
>
> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation.

## Purpose
This tutorial demonstrates a simple example of initializing a `device_global` class to a non-zero value, and using it to keep state between multiple re-launches of a kernel.

### Description of `device_global`
The `device_global` class is an extension that introduces device scoped memory allocations into SYCL that can be accessed within a kernel using syntax similar to C++ global variables, but that have unique instances per `sycl::device`. Similar to C++ global variables, a `device_global` variable have a namespace scope and is visible to all kernels within that scope.

A `device_global` class is instantiated from a class template. The template is parameterized by the type of the underlying allocation, and a list of properties. The type of the allocation also encodes the size of the allocation for potentially multidimensional array types. The list of properties change the functional behavior of the `device_global` instance to enable compiler and runtime optimizations. In the code sample two properties are used, `device_image_scope` and `host_access_none`. The `device_image_scope` property limits the scope of a single instance of a `device_global` from a device to a `device_image`. The absence of the `device_image_scope` property is currently not supported by the compiler. The `host_access` property is an assertion by the user telling the implementation whether the host code copies to or from the `device_global`. The property comes in four variants `host_access_none`, `host_access_read`, `host_access_write`, and `host_access_read_write`(the default), but only the `host_access_none` is supported currently. Further details on these and other properties can be found in the [Properties for `device_global` variables](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc#properties-for-device-global-variables) section of the [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc)

A `device_global` instance can be used to store state across multiple relaunches of a kernel without having to pass in a `buffer` as a kernel argument. An example of an application that would benefit from such a state is where kernels are nodes in a state-machine.

### How to initialize a `device_global` instance
A `device_global` instance is always zero-initialized. If you need the first usage of a `device_global` instance in device code to be initialized to a non-zero value, use the following technique:

> Instantiate a second `device_global<bool>` that represents a flag that controls when to initialize the `device_global` to a non-zero value. This flag is zero-initialized to `false`. Once initialization happens, the flag is set to `true` and initialization code doesn't execute on subsequent relaunches of the kernel.
```cpp
namespace exp = sycl::ext::oneapi::experimental;
using FPGAProperties = decltype(exp::properties(
exp::device_image_scope, exp::host_access_none));
exp::device_global<int, FPGAProperties> val;
exp::device_global<bool, FPGAProperties> is_val_initialized;
int main () {
sycl::queue q;
q.submit([&](sycl::handler& h) {
h.single_task([=] {
// Initialization happens only once
if (!is_val_initialized) {
val.get() = 42;
is_val_initialized.get() = true;
}
// uses of `val`
});
});
}
```

### Additional Documentation
- [SYCL `device_global` Language Specification](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_global.asciidoc) helps understand the API and of `device_globals` and restrictions on it's usage

## Key Concepts
* The basic usage of the `device_global` class
* How initialize a `device_global` to non-zero value

## Building the `device_global` Tutorial
> **Note**: When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables.
> Set up your CLI environment by sourcing the `setvars` script located in the root of your oneAPI installation every time you open a new terminal window.
> This practice ensures that your compiler, libraries, and tools are ready for development.
>
> Linux*:
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: ` . ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> Windows*:
> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat`
> - Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'`
>
> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).

### On a Linux* System

1. Generate the `Makefile` by running `cmake`.
```
mkdir build
cd build
```
To compile for the default target (the Agilex™ device family), run `cmake` using the command:
```
cmake ..
```

> **Note**: You can change the default target by using the command:
> ```
> cmake .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
> ```
>
> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
> ```
> cmake .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
> ```
>
> You will only be able to run an executable on the FPGA if you specified a BSP.

2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:

* Compile for emulation (fast compile time, targets emulated FPGA device):
```
make fpga_emu
```
* Generate the optimization report:
```
make report
```
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
```
make fpga_sim
```
* Compile for FPGA hardware (longer compile time, targets FPGA device):
```
make fpga
```

### On a Windows* System

1. Generate the `Makefile` by running `cmake`.
```
mkdir build
cd build
```
To compile for the default target (the Agilex™ device family), run `cmake` using the command:
```
cmake -G "NMake Makefiles" ..
```
> **Note**: You can change the default target by using the command:
> ```
> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<FPGA device family or FPGA part number>
> ```
>
> Alternatively, you can target an explicit FPGA board variant and BSP by using the following command:
> ```
> cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=<board-support-package>:<board-variant>
> ```
>
> You will only be able to run an executable on the FPGA if you specified a BSP.

2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow:

* Compile for emulation (fast compile time, targets emulated FPGA device):
```
nmake fpga_emu
```
* Generate the optimization report:
```
nmake report
```
* Compile for simulation (fast compile time, targets simulated FPGA device, reduced data size):
```
nmake fpga_sim
```
* Compile for FPGA hardware (longer compile time, targets FPGA device):
```
nmake fpga
```

> **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory.

## Running the Sample

1. Run the sample on the FPGA emulator (the kernel executes on the CPU):
```
./device_global.fpga_emu (Linux)
device_global.fpga_emu.exe (Windows)
```
2. Run the sample on the FPGA simulator device:
* On Linux
```bash
CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./device_global.fpga_sim
```
* On Windows
```bash
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1
device_global.fpga_sim.exe
set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=
```
3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=<board-support-package>:<board-variant>`):
```
./device_global.fpga (Linux)
device_global.fpga.exe (Windows)
```


### Example of Output
```
PASSED: The results are correct
```

## License
Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
{
"guid": "92110788-F876-4009-89A5-D1589B588D20",
"name": "device_global",
"categories": ["Toolkit/oneAPI Direct Programming/C++SYCL FPGA/Tutorials/Features/experimental"],
"description": "An Intel® FPGA tutorial demonstrating the device_global feature",
"toolchain": ["icpx"],
"os": ["linux", "windows"],
"targetDevice": ["FPGA"],
"builder": ["ide", "cmake"],
"languages": [{"cpp":{}}],
"commonFolder": {
"base": "../../../..",
"include": [
"README.md",
"Tutorials/Features/experimental/device_global",
"include"
],
"exclude": []
},
"ciTests": {
"linux": [
{
"id": "fpga_emu",
"steps": [
"icpx --version",
"mkdir build",
"cd build",
"cmake ..",
"make fpga_emu",
"./device_global.fpga_emu"
]
},
{
"id": "report",
"steps": [
"icpx --version",
"mkdir build",
"cd build",
"cmake ..",
"make report"
]
}
],
"windows": [
{
"id": "fpga_emu",
"steps": [
"icpx --version",
"cd ../../../..",
"mkdir build",
"cd build",
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global",
"nmake fpga_emu",
"device_global.fpga_emu.exe"
]
},
{
"id": "report",
"steps": [
"icpx --version",
"cd ../../../..",
"mkdir build",
"cd build",
"cmake -G \"NMake Makefiles\" ../Tutorials/Features/experimental/device_global",
"nmake report"
]
}
]
},
"expertise": "Concepts and Functionality"
}
Loading