|
144 | 144 | " - copy link to latest `linux_release.tgz` from assets\n",
|
145 | 145 | " - on the CUDA development machine: `mkdir syclomatic; cd syclomatic`\n",
|
146 | 146 | " - `wget <link to linux_release.tgz>`\n",
|
147 |
| - " - `tar -cvf linux_release.tgz`\n", |
148 |
| - " - `export PATH=\"/home/$USER/syclomatic/bin:$PATH\"`\n", |
149 |
| - "- Copy the above `vectoradd.cu` CUDA source to this machine\n", |
150 |
| - "- Compile and run the `vectoradd.cu`\n", |
151 |
| - " - nvcc vectoradd.cu\n", |
| 147 | + " - `tar -xvf linux_release.tgz`\n", |
| 148 | + " - `export PATH=\"/home/$USER/syclomatic/bin:$PATH\"`\n", |
| 149 | + " - Verify installation: `c2s --version`\n", |
| 150 | + "- Create a working directory and copy the above `vectoradd.cu` CUDA source to this machine\n", |
152 | 151 | "\n",
|
153 | 152 | "\n",
|
154 | 153 | "### Migrate CUDA source to SYCL source using SYCLomatic\n",
|
|
168 | 167 | "Next we will use the `c2s --out-root` option to specify a custom output directory, as shown below:\n",
|
169 | 168 | "\n",
|
170 | 169 | "```\n",
|
171 |
| - "c2s --out-root sycl_code vectoradd.cu\n", |
| 170 | + "c2s vectoradd.cu --use-custom-helper=api --out-root sycl_code\n", |
172 | 171 | "```\n",
|
173 | 172 | "\n",
|
174 | 173 | "This command should migrate the CUDA source to SYCL source in a folder named `sycl_code`\n",
|
|
249 | 248 | " sycl::free(d_C, q_ct1);\n",
|
250 | 249 | " return 0;\n",
|
251 | 250 | "}\n",
|
252 |
| - "```" |
| 251 | + "```\n", |
| 252 | + "\n", |
| 253 | + "The migrated SYCL code can be compiled using the following command in terminal:\n", |
| 254 | + "```sh\n", |
| 255 | + "icpx -fsycl -I include vectoradd.dp.cpp\n", |
| 256 | + "```\n", |
| 257 | + "\n", |
| 258 | + "OR you can compile and run by executing the cell below:" |
253 | 259 | ]
|
254 | 260 | },
|
255 | 261 | {
|
|
323 | 329 | "source": [
|
324 | 330 | "## Manually Optimize the migrated SYCL source\n",
|
325 | 331 | "\n",
|
326 |
| - "The SYCLomaticy Tool will migrate the CUDA code to the SYCL code to get functionality, but you may have to manually optimize the resulting SYCL code for optimal performance.\n", |
| 332 | + "The SYCLomatic Tool will migrate the CUDA code to the SYCL code to get functionality, but you may have to manually optimize the resulting SYCL code for optimal performance.\n", |
327 | 333 | "\n",
|
328 | 334 | "Now that we have successfully migrated the CUDA code to the SYCL code and executed on an Intel CPU/GPU, let’s look at what manual optimizations we can do.\n",
|
329 | 335 | "\n",
|
|
339 | 345 | "The above code also creates a SYCL queue with the `in_order` queue property and uses default device selection, which is equivalent to the code below written in plain SYCL API syntax:\n",
|
340 | 346 | "\n",
|
341 | 347 | "```cpp\n",
|
342 |
| - "sycl::queue q_ct1{sycl::default_selector(), sycl::property::queue::in_order()};\n", |
| 348 | + "sycl::queue q_ct1{sycl::default_selector_v, sycl::property::queue::in_order()};\n", |
343 | 349 | "```\n",
|
344 | 350 | "\n",
|
345 | 351 | "Using an `in_order` queue property will not allow kernels with no dependency to overlap execution. Therefore, we will remove the `in_order` queue property and add event-based dependency between kernels.\n",
|
|