AI and Analytics Features and Functionality: Intel Python daal4py Distributed Linear Regression (#1417)
* Fixes for 2023.1 AI Kit (#1409)
* Intel Python Numpy Numba_dpes kNN sample (#1292)
* *.py and *.ipynb files with implementation
* README.md and sample.json files with documentation
* License and third party programs
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample (#1293)
* add IntelPytorch Quantization code samples (#1301)
* add IntelPytorch Quantization code samples
* fix the spelling error in the README file
* use john's README with grammar fix and title change
* Rename third-party-grograms.txt to third-party-programs.txt
Co-authored-by: Jimmy Wei <[email protected]>
* AMX bfloat16 mixed precision learning TensorFlow Transformer sample (#1317)
* [New Sample] Intel Extension for TensorFlow Getting Started (#1313)
* first draft
* Update README.md
* remove redundant file
* [New Sample] [oneDNN] Benchdnn tutorial (#1315)
* New Sample: benchDNN tutorial
* Update readme: new sample
* Rename sample to benchdnn_tutorial
* Name fix
* Add files via upload (#1320)
* [New Sample] oneCCL Bindings for PyTorch Getting Started (#1316)
* Update README.md
* [New Sample] oneCCL Bindings for PyTorch Getting Started
* Update README.md
* add torch-ccl version check
* [New Sample] Intel Extension for PyTorch Getting Started (#1314)
* add new ipex GSG notebook for dGPU
* Update sample.json
for expertise field
* Update requirements.txt
Update package versions to comply with Snyk tool
* Updated title field in sample.json in TF Transformer AMX bfloat16 Mixed Precision sample to fit within character length range (#1327)
* add arch checker class (#1332)
* change gpu.patch to convert the code samples from cpu to gpu correctly (#1334)
* Fixes for spelling in AMX bfloat16 transformer sample and printing error in python code in numpy vs numba sample (#1335)
* 2023.1 AI Kit ITEX Get Started example fix (#1338)
* Fix the typo
* Update ResNet50_Inference.ipynb
* fix resnet inference demo link (#1339)
* Fix printing issue in numpy vs numba AI sample (#1356)
* Fix Invalid Kmeans parameters on oneAPI 2023 (#1345)
* Update README to add new samples into the list (#1366)
* PyTorch AMX BF16 Training sample: remove graphs and performance numbers (#1408)
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample
* remove performance graphs, update README
* remove graphs from README and folder
* update top README in Features and Functionality
---------
Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>
* Daal4py Distributed Linear Regression readme update
Restructured to match the new readme template—more or less. Changed sample name to match the name in sample.json file. Updated the prerequisites to match the OS shown in the sample.json file. Restructured sections to increase clarity. Clarified and extended information on running sample in devcloud. Clarified and extended information about Jupyter Notebooks. Fixed formatting issues. Updated branding based on names in database.
---------
Co-authored-by: Jimmy Wei <[email protected]>
Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>
AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/IntelPyTorch_TrainingOptimizations_AMX_BF16.ipynb (5 additions, 3 deletions)
```diff
@@ -311,6 +311,7 @@
 },
 {
 "cell_type": "markdown",
+"id": "5eea6ae7",
 "metadata": {},
 "source": [
 "The training times for the 3 cases are printed out and shown in the figure above. Using BF16 should show significant reduction in training time. However, there is little to no change using AVX512 with BF16 and AMX with BF16 because the amount of computations required for one batch is too small with this dataset. "
@@ -348,15 +349,16 @@
 "id": "b6ea2aeb",
 "metadata": {},
 "source": [
-"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512. The expected behavior is that AMX with BF16 should have about a 1.5X improvement over FP32 and about the same performance as BF16 with AVX512. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. "
+"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512."
 ]
 },
 {
 "cell_type": "markdown",
-"id": "0da073a6",
+"id": "7bf01080",
 "metadata": {},
 "source": [
-"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training the ResNet50 model. There will be additional significant performance improvements if AMX INT8 is used in inference, which is covered in a related oneAPI sample."
+"## Conclusion\n",
+"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training on the ResNet50 model. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). "
 ]
 },
 {
```
AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/README.md (2 additions, 4 deletions)
```diff
@@ -148,11 +148,9 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
 
 ## Example Output
 
-If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample generates performance and analysis diagrams for comparison.
+If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample will print out the runtimes and charts of relative performance with the FP32 model without any optimizations as the baseline.
 
-The following image shows approximate performance speed increases using AMX BF16 with auto-mixed precision during training. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset.
+The performance speedups using AMX BF16 are approximate on ResNet50. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html).
[…]
```
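As a usage note, the Launch Script referenced above wraps an existing training script to set core pinning and memory-allocator knobs. A typical invocation, with a placeholder script name and default flags, looks like the following (module path per IPEX documentation; verify it against your installed version):

```
python -m intel_extension_for_pytorch.cpu.launch your_training_script.py
```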
README.md (Intel Python daal4py Distributed Linear Regression sample):

````diff
-# `Intel Python daal4py Distributed Linear Regression Sample`
+# `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-This sample code shows how to train and predict with a distributed linear regression model using the python API package daal4py powered by the oneAPI Data Analytics Library. It assumes you have a working version of the Intel® MPI Library installed, and it demonstrates how to use software products that is powered by the [oneAPI Data Analytics Library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and found in the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
+This sample demonstrates how to train and predict with a distributed linear regression model using the Python API package Daal4py powered by the Intel® oneAPI Data Analytics Library (oneDAL).
 
-| Optimized for | Description
-| :--- | :---
-| OS | 64-bit Linux: Ubuntu 18.04 or higher, 64-bit Windows 10, macOS 10.14 or higher
-| What you will learn | distributed daal4py Linear Regression programming model for Intel CPU
-| Time to complete | 5 minutes
+| Area | Description
+|:--- |:---
+| What you will learn | How to use the distributed Daal4py Linear Regression programming model for Intel CPUs
+| Time to complete | 5 minutes
+| Category | Concepts and Functionality
 
 ## Purpose
 
-daal4py is a simplified API to Intel® oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel® oneDAL for direct usage or integration into one's own framework.
+Daal4py is a simplified API to oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning developers. The sample is intended to provide an abstraction to Intel® oneDAL for direct usage or integration into your development framework.
 
-In this sample, you will run a distributed Linear Regression model with oneDAL daal4py library memory objects. You will also learn how to train a model and save the information to a file.
+In this sample, you will run a distributed Linear Regression model with oneDAL Daal4py library memory objects. You will also learn how to train a model and save the information to a file.
````
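To make the Purpose section concrete, the following is a minimal sketch of a distributed Daal4py linear regression, assuming the standard daal4py SPMD API (`daalinit`/`daalfini`, `distributed=True`); the script name, data, and model file are illustrative, not taken from the sample:

```python
# Launch under MPI, e.g.: mpirun -n 4 python linreg_spmd.py
# ("linreg_spmd.py" is a hypothetical file name.)
import pickle

import numpy as np
import daal4py as d4p

d4p.daalinit()  # start the MPI-backed distribution engine

# Each rank works on its own chunk of the data; synthetic data for brevity.
rng = np.random.default_rng(seed=d4p.my_procid())
X = rng.random((1000, 10))
y = X @ rng.random((10, 1))

# With distributed=True, every rank contributes partial results and all
# ranks end up holding the same final model.
result = d4p.linear_regression_training(distributed=True).compute(X, y)

if d4p.my_procid() == 0:
    # Train a model and save the information to a file, as described above.
    with open("model.pkl", "wb") as f:
        pickle.dump(result.model, f)
    pred = d4p.linear_regression_prediction().compute(X, result.model)
    print(pred.prediction[:5])

d4p.daalfini()  # shut down the distribution engine
```

The README diff continues below.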
````diff
+
+## Prerequisites
+
+| Optimized for | Description
+|:--- |:---
+| OS | Ubuntu* 18.04 or higher
+| Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
+| Software | Intel® AI Analytics Toolkit (AI Kit)
 
 ## Key Implementation Details
 
-This distributed linear regression sample code is implemented for the CPU using the Python language. The example assumes you have daal4py and scikit-learn installed inside a conda environment, similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the [Intel® AI Analytics Toolkit](https://software.intel.com/en-us/oneapi/ai-kit).
+The sample demonstrates how to use software products that are powered by the [Intel® oneAPI Data Analytics Library (oneDAL)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/en-us/oneapi/ai-kit).
 
-## License
-Code samples are licensed under the MIT license. See
+The sample assumes you have a working version of the Intel® MPI Library, Daal4py, and scikit-learn installed inside a conda environment (similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the AI Kit).
 
-Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)
+## Set Environment Variables
 
-## Running Samples on the Intel® DevCloud
-If you are running this sample on the DevCloud, see [Running Samples on the Intel® DevCloud](#run-samples-on-devcloud)
+When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
-## Building daal4py for CPU
+## Build the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the postinstallation script.
+You can refer to the *[Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html)* for post-installation steps and scripts.
 
-You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts.
+The Intel® oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post-installation script.
 
 > **Note**: If you have not already done so, set up your CLI
-> environment by sourcing the `setvars` script located in
-> the root of your oneAPI installation.
->
-> Linux Sudo: . /opt/intel/oneapi/setvars.sh
+> environment by sourcing the `setvars` script in the root of your oneAPI installation.
 >
-> Linux User: . ~/intel/oneapi/setvars.sh
->For more information on environment variables, see Use the setvars Script for [Linux or macOS](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html), or [Windows](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).
+> Linux*:
+> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
+> - For private installations: `. ~/intel/oneapi/setvars.sh`
+> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
+
+> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
+
+### On Linux*
 
-### Activate conda environment With Root Access
+#### Activate Conda with Root Access
 
-Intel Python environment will be active by default. However, if you activated another environment, you can return with the following command:
+By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it. However, if you activated another environment, you can return with the following command.
 
-#### On a Linux* System
 ```
 source activate base
 ```
 
-### Activate conda environment Without Root Access (Optional)
+#### Activate Conda without Root Access (Optional)
 
-By default, the Intel® AI Analytics Toolkit is installed in the inteloneapi folder, which requires root privileges to manage it. If you would like to bypass using root access to manage your conda environment, then you can clone your desired conda environment using the following command:
+You can choose to activate the Conda environment without root access. To bypass root access to manage your Conda environment, clone and activate your desired Conda environment using commands similar to the following.
 
-#### On a Linux* System
 ```
 conda create --name usr_intelpython --clone base
-```
-
-Then activate your conda environment with the following command:
-
-```
 source activate usr_intelpython
 ```
 
-### Install Jupyter Notebook
-```
-conda install jupyter nb_conda_kernels
-```
+#### Jupyter Notebook (Optional)
 
-#### View in Jupyter Notebook
+>**Note**: This sample cannot be launched from the Jupyter Notebook version; however, you can still view inside the notebook to follow the included write-up and description.
 
-_Note: This distributed execution cannot be launched from the jupyter notebook version, but you can still view inside the notebook to follow the included write-up and description._
+1. If you have not already done so, install Jupyter Notebook.
[…]
-Launch Jupyter Notebook in the directory housing the code example
+## Run the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-```
-jupyter notebook
-```
+### On Linux
 
-## Running the Sample<a name="running-the-sample"></a>
+When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell.
 
-### Running the Sample as a Python File
+1. Run the script with a command similar to the following. (The number **4** is an example and indicates that the script will run on **4 processes**.)
 
-When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell. To execute this example, run the following command, where the number **4** is chosen as an example and means that it will run on **4 processes**:
[…]
+>**Note**: This code sample focuses on using Daal4py for distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the primary node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well.
````
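The `mpirun` invocation itself did not survive in the captured diff; a command consistent with the surrounding text (the script name is hypothetical, matching the sketch earlier on this page rather than the sample's actual file) would be:

```
mpirun -n 4 python ./linreg_spmd.py
```

The captured diff resumes below.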
````diff
 
-The output of the script will be saved in the included models and result directories.
+#### Troubleshooting
 
-_Note: This code samples focus on using daal4py to do distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the master node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well._
+If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility.
 
-### Using Visual Studio Code* (VS Code)
+### Build and Run the Sample on Intel® DevCloud (Optional)
 
-You can use VS Code extensions to set your environment, create launch configurations,
-and browse and download samples.
+>**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
 
-The basic steps to build and run a sample using VS Code include:
-- Download a sample using the extension **Code Sample Browser for Intel oneAPI Toolkits**.
-- Configure the oneAPI environment with the extension **Environment Configurator for Intel oneAPI Toolkits**.
-- Open a Terminal in VS Code (**Terminal>New Terminal**).
-- Run the sample in the VS Code terminal using the instructions below.
+1. Open a terminal on a Linux* system.
+2. Log in to the Intel® DevCloud.
+   ```
+   ssh devcloud
+   ```
+3. If the sample is not already available, download the samples from GitHub.
[…]
-To learn more about the extensions and how to configure the oneAPI environment, see
-[Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).
+   The following example is for a CPU node. (This is a single-line script.)
+   ```
+   qsub -I -l nodes=1:cpu:ppn=2 -d .
+   ```
+   - `-I` (upper case I) requests an interactive session.
+   - `-l nodes=1:cpu:ppn=2` (lower case L) assigns one full CPU node.
+   - `-d .` makes the current folder the working directory for the task.
 
-After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.
+   >**Note**: For more information about the node properties, execute the `pbsnodes` command.
 
-### Running Samples on the Intel® DevCloud (Optional)<a name="run-samples-on-devcloud"></a>
+6. Perform the build steps as you would on Linux.
+7. Run the sample.
 
-<!---Include the next paragraph ONLY if the sample runs in batch mode-->
-### Run in Batch Mode
-This sample runs in batch mode, so you must have a script for batch processing. Once you have a script set up, refer to [Running the Sample](#running-the-sample).
+   > **Note**: To inspect job progress if you are using a script, use the qstat utility.
+   > ```
+   > watch -n 1 qstat -n -1
+   > ```
+   > The command displays the results every second. The job is complete when no new results display.
 
-### Request a Compute Node
-In order to run on the DevCloud, you need to request a compute node using node properties such as: `gpu`, `xeon`, `fpga_compile`, `fpga_runtime` and others. For more information about the node properties, execute the `pbsnodes` command.
-This node information must be provided when submitting a job to run your sample in batch mode using the qsub command. When you see the qsub command in the Run section of the [Hello World instructions](https://devcloud.intel.com/oneapi/get_started/aiAnalyticsToolkitSamples/), change the command to fit the node you are using. Nodes which are in bold indicate they are compatible with this sample:
[…]
+Code samples are licensed under the MIT license. See
+[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+
+Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
````