# `Intel® Python Daal4py Distributed K-Means` Sample

This sample code shows how to train and predict with a distributed k-means model using the Python API package Daal4py powered by the Intel® oneAPI Data Analytics Library (oneDAL).

| Area                | Description
|:---                 |:---
| What you will learn | How to use the distributed K-Means Daal4py programming model for CPUs
| Time to complete    | 5 minutes
| Category            | Code Optimization
## Purpose
Daal4py is a simplified API to oneDAL that allows fast usage of the framework, suited to data scientists and machine learning developers. It provides an abstraction of oneDAL for direct use or for integration into your own development framework.
In this sample, you will run a distributed K-Means model with oneDAL Daal4py library memory objects. You will also learn how to train a model and save the information to a file.
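
A minimal sketch of the Daal4py SPMD pattern the sample is built on is shown below. This is not the sample's exact code; the random data, cluster count, and model file name are illustrative placeholders:

```python
# Minimal sketch: distributed K-Means with Daal4py in SPMD mode.
# Launch under MPI, for example: mpirun -n 4 python this_script.py
import pickle

import daal4py as d4p
import numpy as np

d4p.daalinit()  # start the distribution engine (one rank per MPI process)

# Each rank operates on its own chunk of the data (a random placeholder here).
X = np.random.rand(1000, 20)

# Compute initial centroids across all ranks, then train the model.
init = d4p.kmeans_init(nClusters=3, method="plusPlusDense", distributed=True)
centroids = init.compute(X).centroids
train = d4p.kmeans(nClusters=3, maxIterations=5, distributed=True).compute(X, centroids)

# Predict: assign each local row to a cluster (zero iterations, assignments only).
assigned = d4p.kmeans(nClusters=3, maxIterations=0, assignFlag=True).compute(X, train.centroids)

# Save the trained centroids to a file from rank 0 only.
if d4p.my_procid() == 0:
    with open("kmeans_model.pkl", "wb") as f:
        pickle.dump(train.centroids, f)

d4p.daalfini()  # shut down the distribution engine
```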
## Prerequisites

| Optimized for | Description
|:---           |:---
| OS            | Ubuntu* 18.04 or higher
| Hardware      | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
| Software      | Intel® AI Analytics Toolkit (AI Kit)

The sample assumes you have a working version of the Intel® MPI Library, Daal4py, and scikit-learn installed inside a conda environment, similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the AI Kit.
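
As a quick optional check that these prerequisites are available in the active environment, you can run commands similar to the following:

```
python -c "import daal4py, sklearn; print(daal4py.__version__)"
mpirun --version
```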
## Key Implementation Details
This distributed K-means sample code is designed to run on **CPUs**, and is written in Python.
The sample demonstrates how to use software products that are powered by [Intel® oneAPI Data Analytics Library (oneDAL)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/en-us/oneapi/ai-kit).
## Set Environment Variables
When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
## Build the `Intel® Python Daal4py Distributed K-Means` Sample
The Intel® oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.
You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html) for post-installation steps and scripts.
### On Linux*

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
>
> Linux*:
> - For system-wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: `. ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.

#### Activate Conda with Root Access

By default, the AI Kit is installed in the `/opt/intel/oneapi` folder, which requires root privileges to manage, and the base conda environment is active. If you activated another environment, you can return to the base environment with the following command.

```
source activate base
```
#### Activate Conda without Root Access (Optional)
You can choose to activate the Conda environment without root access. To bypass using root access to manage your Conda environment, clone and activate your desired Conda environment using commands similar to the following.

```
conda create --name usr_intelpython --clone base
source activate usr_intelpython
```
#### Run the Python Script
When using Daal4py for distributed memory systems, you must execute the program in a bash shell.
1. Run the script with a command similar to the following. (The number **4** is an example and indicates that the script will run on **4 processes**.)
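```
mpirun -n 4 python ./IntelPython_daal4py_Distributed_Kmeans.py
```
(The script filename is assumed from the sample name; substitute the name of the `.py` file included in this directory if it differs.)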
>**Note**: This code sample uses Daal4py to perform distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the primary node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well.
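
For illustration only, a multi-node launch with the Intel® MPI Library might pass a host file listing one cluster host per line; the host file name, host count, and process counts below are hypothetical:
```
mpirun -f ./hostfile -n 8 -ppn 4 python ./IntelPython_daal4py_Distributed_Kmeans.py
```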
When it completes, the script output will be in the included **/models** and **/results** directories.
#### Jupyter Notebook (Optional)
>**Note**: This sample cannot be launched from the Jupyter Notebook version; however, you can still view inside the notebook to follow the included write-up and description.
1. If you have not already done so, install Jupyter Notebook.
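```
conda install jupyter nb_conda_kernels
```
2. Launch Jupyter Notebook in the directory housing the code example.
```
jupyter notebook
```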
#### Troubleshooting
If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility.
### Build and Run the Sample on Intel® DevCloud (Optional)
>**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
1. Open a terminal on a Linux* system.
2. Log in to the Intel® DevCloud.
```
ssh devcloud
```
3. If the sample is not already available, download the samples from GitHub.
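```
git clone https://github.com/oneapi-src/oneAPI-samples.git
```
4. Change to the directory containing this sample.
5. Request an interactive node. To run on Intel® DevCloud, you must request a compute node using node properties such as `gpu`, `xeon`, `fpga_compile`, or `fpga_runtime`.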
The following example is for a CPU node. (This is a single-line script.)
```
qsub -I -l nodes=1:cpu:ppn=2 -d .
```
- `-I` (upper case I) requests an interactive session.
- `-l nodes=1:cpu:ppn=2` (lower case L) assigns one full CPU node.
- `-d .` makes the current folder the working directory for the task.
>**Note**: For more information about the node properties, execute the `pbsnodes` command.
6. Perform the build steps as you would on Linux.
7. Run the sample.
> **Note**: To inspect job progress if you are using a script, use the `qstat` utility.
> ```
> watch -n 1 qstat -n -1
> ```
> The command displays the results every second. The job is complete when no new results display.
## Example Output

```
Here is our cluster assignments for first 5 datapoints:
[[1]
 ...
 [1]
 [1]]
[CODE_SAMPLE_COMPLETED_SUCCESFULLY]
```

## License
Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).