Commit 207c4b8

Merge branch 'oneapi-src:master' into master
2 parents 2c30364 + 78a5d9d commit 207c4b8

File tree: 631 files changed (+19087 / −2626 lines)


AI-and-Analytics/Features-and-Functionality/IntelPyTorch_Extensions_AutoMixedPrecision/README.md (+2 −2)

@@ -34,8 +34,8 @@ Third party program Licenses can be found here: [third-party-programs.txt](https
 ### On a Linux\* System

-Please follow instructions [here](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/installation.html).
+Please follow instructions [here](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html).

 ## Running the Sample

-Please follow instructions [here](https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/examples.html#complete-bfloat16).
+Please follow instructions [here](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/examples.html#complete-bfloat16).
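Both links above now point at the "complete-bfloat16" walkthrough for Intel® Extension for PyTorch*. As a quick orientation, here is a minimal, hedged sketch of the BF16 auto mixed precision pattern that walkthrough covers (illustrative only; the toy model and shapes are assumptions, not the tutorial's code):

```python
# Illustrative BF16 auto-mixed-precision sketch, not the tutorial's exact code.
# Assumes torch and intel_extension_for_pytorch are installed on a CPU system.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

# Let IPEX prepare the model for BF16 execution.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.rand(8, 64)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y = model(x)
print(y.dtype, y.shape)
```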

AI-and-Analytics/Getting-Started-Samples/README.md (+4 −1)

@@ -19,9 +19,12 @@ Third party program Licenses can be found here: [third-party-programs.txt](https
 | daal4py | [IntelPython_daal4py_GettingStarted](IntelPython_daal4py_GettingStarted) | Batch linear regression using the Python API package daal4py from oneAPI Data Analytics Library (oneDAL).
 | Intel® Neural Compressor | [INC-Sample-for-Tensorflow](INC-Sample-for-Tensorflow) | Quantize an FP32 model into INT8 with Intel® Neural Compressor, and compare FP32 and INT8 performance.
 | Modin | [IntelModin_GettingStarted](IntelModin_GettingStarted) | Run Modin-accelerated Pandas functions and note the performance gain.
-| PyTorch | [IntelPyTorch_GettingStarted](IntelPyTorch_GettingStarted) | A simple training example for PyTorch.
+| PyTorch | [IntelPyTorch_GettingStarted](Intel_Extension_For_PyTorch_GettingStarted) | A simple training example for PyTorch.
 | TensorFlow | [IntelTensorFlow_GettingStarted](IntelTensorFlow_GettingStarted) | A simple training example for TensorFlow.
 | XGBoost | [IntelPython_XGBoost_GettingStarted](IntelPython_XGBoost_GettingStarted) | Set up and train an XGBoost* model on datasets for prediction.
+| Modin | [IntelModin_Vs_Pandas](IntelModin_Vs_Pandas) | Compare the performance of Intel® Distribution of Modin* with that of stock Pandas.
+| Scikit-learn (oneDAL) | [Intel_Extension_For_SKLearn_GettingStarted](Intel_Extension_For_SKLearn_GettingStarted) | Speed up a Scikit-learn application using oneDAL.
+| oneAPI docker image | [IntelAIKitContainer_GettingStarted](IntelAIKitContainer_GettingStarted) | Configuration script to automatically configure the environment. |

 # Using Samples in Intel® DevCloud for oneAPI

@@ -0,0 +1,26 @@ (new file)
+## Title
+Reductions using numba-dpex: This is part 8 of the AI Numba-dpex essentials training series.
+
+## Requirements
+| Optimized for | Description
+|:--- |:---
+| OS | Linux* Ubuntu 18.04, 20.04; Windows* 10
+| Hardware | Skylake with GEN9 or newer
+| Software | Intel® oneAPI DPC++ Compiler, Jupyter Notebooks, Intel DevCloud
+
+## Purpose
+These hands-on exercises show how to perform reductions using numba-dpex.
+
+## License
+Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+
+Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)
+
+## Install Directions
+
+The Jupyter notebooks are tested on and can be run in Intel DevCloud.
+Below are the steps to access these Jupyter notebooks on Intel DevCloud:
+1. Register on [Intel DevCloud](https://intelsoftwaresites.secure.force.com/Devcloud/oneapi).
+2. Go to the "Terminal" in Intel DevCloud.
+3. Navigate to the "oneAPI-samples/AI-and-Analytics/Jupyter/Numba_dpex_Essentials_training" folder, open Welcome.ipynb, click on the "Module 8 - dpex_reductions" notebook, and follow the instructions.

AI-and-Analytics/Jupyter/Numba_dpex_Essentials_training/08_dpex_reductions/dpex_reductions.ipynb (+874)

Large diffs are not rendered by default.
@@ -0,0 +1,33 @@ (new file)
+##==============================================================
+## Copyright © Intel Corporation
+##
+## SPDX-License-Identifier: Apache-2.0
+## =============================================================
+
+import dpnp as np
+import numba_dpex as ndpex
+import timeit
+
+
+@ndpex.kernel
+def atomic_reduction(a):
+    idx = ndpex.get_global_id(0)
+    ndpex.atomic.add(a, 0, a[idx])
+
+
+def main():
+    N = 1024
+    a = np.arange(N)
+
+    # print("Using device ...")
+    # print(a.device)
+
+    atomic_reduction[N, ndpex.DEFAULT_LOCAL_SIZE](a)
+    # print("Reduction sum =", a[0])
+
+    # print("Done...")
+
+
+if __name__ == "__main__":
+    t = timeit.Timer(lambda: main())
+    print("Time to calculate reduction using atomics", t.timeit(500), "seconds")
@@ -0,0 +1,91 @@ (new file)
+##==============================================================
+## Copyright © Intel Corporation
+##
+## SPDX-License-Identifier: Apache-2.0
+## =============================================================
+
+import dpctl
+import numpy as np
+from numba import float32
+
+import numba_dpex as dpex
+
+
+def no_arg_barrier_support():
+    """
+    This example demonstrates the usage of numba_dpex's ``barrier``
+    intrinsic function. The ``barrier`` function is usable only inside
+    a ``kernel`` and is equivalent to OpenCL's ``barrier`` function.
+    """
+
+    @dpex.kernel
+    def twice(A):
+        i = dpex.get_global_id(0)
+        d = A[i]
+        # no argument defaults to a global memory fence
+        dpex.barrier()
+        A[i] = d * 2
+
+    N = 10
+    arr = np.arange(N).astype(np.float32)
+    print(arr)
+
+    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
+    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
+    device = dpctl.select_default_device()
+    print("Using device ...")
+    device.print_device_info()
+
+    with dpctl.device_context(device):
+        twice[N, dpex.DEFAULT_LOCAL_SIZE](arr)
+
+    # the output should be `arr * 2`, i.e. [0, 2, 4, 6, ...]
+    print(arr)
+
+
+def local_memory():
+    """
+    This example demonstrates the usage of numba-dpex's `local.array`
+    intrinsic function. The function is used to create a static array
+    allocated on the device's local address space.
+    """
+    blocksize = 10
+
+    @dpex.kernel
+    def reverse_array(A):
+        lm = dpex.local.array(shape=10, dtype=float32)
+        i = dpex.get_global_id(0)
+
+        # preload
+        lm[i] = A[i]
+        # a local or global barrier both work here, since there is only one work group
+        dpex.barrier(dpex.CLK_LOCAL_MEM_FENCE)  # local memory fence
+        # write
+        A[i] += lm[blocksize - 1 - i]
+
+    arr = np.arange(blocksize).astype(np.float32)
+    print(arr)
+
+    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
+    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
+    device = dpctl.select_default_device()
+    print("Using device ...")
+    device.print_device_info()
+
+    with dpctl.device_context(device):
+        reverse_array[blocksize, dpex.DEFAULT_LOCAL_SIZE](arr)
+
+    # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
+    print(arr)
+
+
+def main():
+    no_arg_barrier_support()
+    local_memory()
+
+    print("Done...")
+
+
+if __name__ == "__main__":
+    main()
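The comments in the example above point to SYCL_DEVICE_FILTER for changing the default device. Here is a brief, hedged sketch of selecting a device explicitly with dpctl instead of relying on the default selector (illustrative only; the filter strings are assumptions and only resolve if the corresponding backend/device is actually present):

```python
# Illustrative device selection, not part of the commit. Assumes dpctl is
# installed; the filter strings depend on the devices available on the system.
import os
import dpctl

# Option 1: steer the default selector via the environment variable; this
# generally has to be set before the SYCL runtime is first used.
os.environ.setdefault("SYCL_DEVICE_FILTER", "level_zero:gpu")

# Option 2: construct a device from a filter selector string directly and
# fall back to a CPU device if no GPU can be created.
try:
    device = dpctl.SyclDevice("gpu")
except dpctl.SyclDeviceCreationError:
    device = dpctl.SyclDevice("cpu")

device.print_device_info()
```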
@@ -0,0 +1,99 @@ (new file)
+# Copyright 2020, 2021 Intel Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import dpctl
+import numpy as np
+from numba import float32
+
+import numba_dpex as dpex
+
+
+def no_arg_barrier_support():
+    """
+    This example demonstrates the usage of numba_dpex's ``barrier``
+    intrinsic function. The ``barrier`` function is usable only inside
+    a ``kernel`` and is equivalent to OpenCL's ``barrier`` function.
+    """
+
+    @dpex.kernel
+    def twice(A):
+        i = dpex.get_global_id(0)
+        d = A[i]
+        # no argument defaults to a global memory fence
+        dpex.barrier()
+        A[i] = d * 2
+
+    N = 10
+    arr = np.arange(N).astype(np.float32)
+    print(arr)
+
+    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
+    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
+    device = dpctl.select_default_device()
+    print("Using device ...")
+    device.print_device_info()
+
+    with dpctl.device_context(device):
+        twice[N, dpex.DEFAULT_LOCAL_SIZE](arr)
+
+    # the output should be `arr * 2`, i.e. [0, 2, 4, 6, ...]
+    print(arr)
+
+
+def local_memory():
+    """
+    This example demonstrates the usage of numba-dpex's `local.array`
+    intrinsic function. The function is used to create a static array
+    allocated on the device's local address space.
+    """
+    blocksize = 10
+
+    @dpex.kernel
+    def reverse_array(A):
+        lm = dpex.local.array(shape=10, dtype=float32)
+        i = dpex.get_global_id(0)
+
+        # preload
+        lm[i] = A[i]
+        # a local or global barrier both work here, since there is only one work group
+        dpex.barrier(dpex.CLK_LOCAL_MEM_FENCE)  # local memory fence
+        # write
+        A[i] += lm[blocksize - 1 - i]
+
+    arr = np.arange(blocksize).astype(np.float32)
+    print(arr)
+
+    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
+    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
+    device = dpctl.select_default_device()
+    print("Using device ...")
+    device.print_device_info()
+
+    with dpctl.device_context(device):
+        reverse_array[blocksize, dpex.DEFAULT_LOCAL_SIZE](arr)
+
+    # the output should be `orig[::-1] + orig`, i.e. [9, 9, 9, ...]
+    print(arr)
+
+
+def main():
+    no_arg_barrier_support()
+    local_memory()
+
+    print("Done...")
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,57 @@ (new file)
+##==============================================================
+## Copyright © Intel Corporation
+##
+## SPDX-License-Identifier: Apache-2.0
+## =============================================================
+import dpctl
+import numpy as np
+from numba import float32
+
+import numba_dpex as dpex
+
+
+def private_memory():
+    """
+    This example demonstrates the usage of numba_dpex's `private.array`
+    intrinsic function. The function is used to create a static array
+    allocated on the device's private address space.
+    """
+
+    @dpex.kernel
+    def private_memory_kernel(A):
+        memory = dpex.private.array(shape=1, dtype=np.float32)
+        i = dpex.get_global_id(0)
+
+        # preload
+        memory[0] = i
+        dpex.barrier(dpex.CLK_LOCAL_MEM_FENCE)  # local memory fence
+
+        # memory will not hold the correct deterministic result if it is not
+        # private to each thread.
+        A[i] = memory[0] * 2
+
+    N = 4
+    arr = np.zeros(N).astype(np.float32)
+    orig = np.arange(N).astype(np.float32)
+
+    # Use the environment variable SYCL_DEVICE_FILTER to change the default device.
+    # See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
+    device = dpctl.select_default_device()
+    print("Using device ...")
+    device.print_device_info()
+
+    with dpex.offload_to_sycl_device(device):
+        private_memory_kernel[N, N](arr)
+
+    # np.testing.assert_allclose(orig * 2, arr)
+    # the output should be `orig[i] * 2`, i.e. [0, 2, 4, ...]
+    print(arr)
+
+
+def main():
+    private_memory()
+    print("Done...")
+
+
+if __name__ == "__main__":
+    main()
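The sample keeps its correctness check commented out. Here is a short, hedged sketch (illustrative only; `double_id` is a name introduced here, and the same numba_dpex/dpctl APIs as the file above are assumed) that runs the same private-memory pattern and re-enables the assertion:

```python
# Illustrative check, not part of the commit: each work item writes twice its
# own global id through a private scratch array, then the result is verified.
import dpctl
import numpy as np
import numba_dpex as dpex


@dpex.kernel
def double_id(A):
    tmp = dpex.private.array(shape=1, dtype=np.float32)  # per-work-item scratch
    i = dpex.get_global_id(0)
    tmp[0] = i
    A[i] = tmp[0] * 2


N = 4
arr = np.zeros(N, dtype=np.float32)

device = dpctl.select_default_device()
with dpex.offload_to_sycl_device(device):
    double_id[N, N](arr)

np.testing.assert_allclose(np.arange(N, dtype=np.float32) * 2, arr)
print("private-memory check passed:", arr)
```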
