AI-and-Analytics/Features-and-Functionality/IntelPython_Numpy_Numba_dpnp_kNN/IntelPython_Numpy_Numba_dpnp_kNN.ipynb
+3 −3
@@ -19,7 +19,7 @@
  "source": [
   "# Simple k-NN classification with Data Parallel Extension for NumPy IDP optimization\n",
   "\n",
-  "This sample shows how to receive the same accuracy of the k-NN model classification by using numpy, numba and numba_dpex. The computation are performed using wine dataset.\n",
+  "This sample shows how to achieve the same accuracy of the k-NN model classification by using numpy, numba and dpnp. The computations are performed using the wine dataset.\n",
   "\n",
   "Let's start with general imports used in the whole sample."
  ]
@@ -73,7 +73,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
-  "We are planning to compare the results of the numpy, namba and IDP numba_dpex so we need to make sure that the results are reproducible. We can do this through the use of a random seed function that initializes a random number generator."
+  "We are planning to compare the results of numpy, numba and IDP dpnp, so we need to make sure that the results are reproducible. We can do this through the use of a random seed function that initializes a random number generator."
  ]
 },
 {
@@ -370,7 +370,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
-  "Like before, let's measure the accuracy of the prepared implementation. It is measured as the number of well-assigned classes for the test set. The final result is the same for all: NumPy, numba and numba-dpex implementations."
+  "Like before, let's measure the accuracy of the prepared implementation. It is measured as the fraction of correctly assigned classes in the test set. The final result is the same for all: NumPy, numba and dpnp implementations."
AI-and-Analytics/Features-and-Functionality/IntelPython_Numpy_Numba_dpnp_kNN/IntelPython_Numpy_Numba_dpnp_kNN.py
-# # Simple k-NN classification with numba_dpex IDP optimization
-#
-# This sample shows how to receive the same accuracy of the k-NN model classification by using numpy, numba and numba_dpex. The computation are performed using wine dataset.
-#
+# # Simple k-NN classification with Data Parallel Extension for NumPy IDP optimization
+#
+# This sample shows how to achieve the same accuracy of the k-NN model classification by using numpy, numba and dpnp. The computations are performed using the wine dataset.
+#
 # Let's start with general imports used in the whole sample.


 # In[ ]:
@@ -27,11 +27,11 @@


 # ## Data preparation
-#
+#
 # Then, let's download the dataset and prepare it for future computations.
-#
+#
 # We are using the wine dataset available in the scikit-learn library. For our purposes, we will be using only 2 features: alcohol and malic_acid.
-#
+#
 # So first we need to load the dataset and create a DataFrame from it. Later we will limit the DataFrame to just the target and the 2 features we chose for this problem.

 # In[ ]:
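The data-loading cell itself is elided by the diff view; as a rough sketch, the preparation described above could look like the following (variable names and exact calls are illustrative, not necessarily the sample's code):

```python
# A minimal sketch of the data preparation step, assuming scikit-learn and pandas.
import pandas as pd
from sklearn.datasets import load_wine

data = load_wine()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# keep only the two features the sample uses, plus the target
df = df[['alcohol', 'malic_acid', 'target']]
df.head()
```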
@@ -51,7 +51,7 @@
 df.head()


-# We are planning to compare the results of the numpy, namba and IDP numba_dpex so we need to make sure that the results are reproducible. We can do this through the use of a random seed function that initializes a random number generator.
+# We are planning to compare the results of numpy, numba and IDP dpnp, so we need to make sure that the results are reproducible. We can do this through the use of a random seed function that initializes a random number generator.

 # In[ ]:

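A minimal sketch of the seeding step described above; the seed value is illustrative:

```python
import numpy as np

# Fixing NumPy's global RNG state makes the random train/test split reproducible.
np.random.seed(42)  # illustrative seed value; the sample may use a different one
```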
@@ -60,7 +60,7 @@


 # The next step is to prepare the dataset for training and testing. To do this, we randomly divide the downloaded wine dataset into a training set (containing 90% of the data) and a test set (containing 10% of the data).
-#
+#
 # In addition, we take from both sets (training and test) data *X* (features) and label *y* (target).

 # In[ ]:
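As a sketch, the 90%/10% split described above could be done like this (names are illustrative; with the global seed fixed, the split is reproducible):

```python
# A minimal sketch of the random 90%/10% train/test split, assuming df from the
# data preparation step; pandas draws from NumPy's seeded global RNG here.
train = df.sample(frac=0.9)   # 90% of the rows for training
test = df.drop(train.index)   # the remaining 10% for testing

X_train, y_train = train.drop(columns='target'), train['target']
X_test, y_test = test.drop(columns='target'), test['target']
```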
@@ -78,9 +78,9 @@


 # ## NumPy k-NN
-#
+#
 # Now, it's time to implement the first version of the k-NN function using NumPy.
-#
+#
 # First, let's create a simple euclidean distance function. We take the positions from the provided vectors, compute the squares of the differences between corresponding positions, and then take the square root of their sum over the whole vectors (remember that the vectors must be of equal length).

 # In[ ]:
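The diff header below shows the function is named `distance(vector1, vector2)`; a minimal sketch of the body described above could be:

```python
import numpy as np

def distance(vector1, vector2):
    # squared element-wise differences, summed, then the square root of the sum
    return np.sqrt(np.sum(np.square(vector1 - vector2)))
```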
@@ -93,7 +93,7 @@ def distance(vector1, vector2):


 # Then, the k-nearest neighbors algorithm itself.
-#
+#
 # 1. We are starting by defining a container for predictions the same size as the test set.
 # 2. Then, for each row in the test set, we calculate distances between it and every training record.
 # 3. We are sorting the training records based on the calculated distances
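A minimal sketch of these steps, assuming NumPy-array inputs and the `distance` function above (the name `knn` and `k=5` are illustrative):

```python
def knn(X_train, y_train, X_test, k=5):
    # step 1: container for predictions, the same size as the test set
    predictions = np.empty(len(X_test), dtype=y_train.dtype)
    for i, test_row in enumerate(X_test):
        # step 2: distance from this test row to every training record
        dists = np.array([distance(test_row, train_row) for train_row in X_train])
        # step 3: sort by distance and keep the k nearest neighbours
        nearest = np.argsort(dists)[:k]
        # majority vote among the labels of the k nearest neighbours
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions[i] = labels[np.argmax(counts)]
    return predictions
```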
 # Now, let's move to the numba implementation of the k-NN algorithm. We will start the same way, by defining the distance function and importing the necessary packages.
-#
+#
 # For the numba implementation, we are using the core functionality, which is the `numba.jit()` decorator.
-#
+#
 # We are starting with defining the distance function. Like before, it is a euclidean distance. For additional optimization we are using `np.linalg.norm`.
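A minimal sketch of the jitted distance function described above (the name `distance_numba` is illustrative):

```python
import numba
import numpy as np

@numba.jit(nopython=True)
def distance_numba(vector1, vector2):
    # np.linalg.norm of the difference vector is the euclidean distance
    return np.linalg.norm(vector1 - vector2)
```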
-# Numba_dpex implementation use `numba_dpex.kernel()` decorator. For more information about programming, SYCL kernels go to: https://intelpython.github.io/numba-dpex/latest/user_guides/kernel_programming_guide/index.html.
-#
-# Calculating distance is like in the NumPy example. We are using Euclidean distance. Later, we create the queue of the neighbors by the calculated distance and count in provided *k* votes for dedicated classes of neighbors.
-#
-# In the end, we are taking a class that achieves the maximum value of votes and setting it for the current global iteration.
+# ## Data Parallel Extension for NumPy k-NN
+#
+# To take advantage of dpnp, we can leverage its vectorized operations and efficient algorithms to implement a k-NN algorithm. We will use optimized operations like `sum`, `sqrt` or `argsort`.
+#
+# Calculating distance is like in the NumPy example. We are using Euclidean distance. The next step is to find the indexes of the k nearest neighbours for each test point and get their labels. At the end, we need to determine the most frequent label among the k nearest.
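A minimal sketch of the vectorized dpnp version described above, using `sum`, `sqrt` and `argsort` (function and variable names are illustrative; the majority vote is done on the host with NumPy for simplicity):

```python
import dpnp
import numpy as np

def knn_dpnp(X_train, y_train, X_test, k=5):
    # pairwise euclidean distances: sqrt of the summed squared differences
    diff = dpnp.expand_dims(X_test, 1) - dpnp.expand_dims(X_train, 0)
    dists = dpnp.sqrt(dpnp.sum(diff * diff, axis=-1))
    # indexes of the k nearest training points for each test point
    nearest = dpnp.argsort(dists, axis=1)[:, :k]
    # majority vote among the neighbour labels, done on the host
    nearest_np = dpnp.asnumpy(nearest)
    y_np = dpnp.asnumpy(y_train)
    predictions = np.empty(len(nearest_np), dtype=y_np.dtype)
    for i, idx in enumerate(nearest_np):
        labels, counts = np.unique(y_np[idx], return_counts=True)
        predictions[i] = labels[np.argmax(counts)]
    return dpnp.asarray(predictions)
```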
 # Next, like before, let's test the prepared k-NN function.
-#
-# In this case, we will need to provide the container for predictions: `predictions` and the container for votes per class: `votes_to_classes_lst` (the container size is 3, as we have 3 classes in our dataset).
-#
-# We are running a prepared k-NN function on a CPU device as the input data was allocated on the CPU. Numba-dpex will infer the execution queue based on where the input arguments to the kernel were allocated. Refer: https://intelpython.github.io/oneAPI-for-SciPy/details/programming_model/#compute-follows-data
-
-# In[ ]:
+#
+# We are running the prepared k-NN function on a CPU device, as the input data was allocated on the CPU using DPNP.
-# Like before, let's measure the accuracy of the prepared implementation. It is measured as the number of well-assigned classes for the test set. The final result is the same for all: NumPy, numba and numba-dpex implementations.
+# Like before, let's measure the accuracy of the prepared implementation. It is measured as the fraction of correctly assigned classes in the test set. The final result is the same for all: NumPy, numba and dpnp implementations.

 # In[ ]:


 predictions_numba = dpnp.asnumpy(predictions)
 true_values = y_test.to_numpy()
 accuracy = np.mean(predictions_numba == true_values)
-print("Numba_dpex accuracy:", accuracy)
+print("Data Parallel Extension for NumPy accuracy:", accuracy)