
Commit d094492

jkinsky, jimmytwei, krzeszew, alexsin368, and ZhaoqiongZ authored
Ai and analytics features and functionality intel python daal4py distributed linear regression (#1417)
* Fixes for 2023.1 AI Kit (#1409)
* Intel Python Numpy Numba_dpes kNN sample (#1292)
* *.py and *.ipynb files with implementation
* README.md and sample.json files with documentation
* License and third party programs
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample (#1293)
* add IntelPytorch Quantization code samples (#1301)
* add IntelPytorch Quantization code samples
* fix the spelling error in the README file
* use john's README with grammar fix and title change
* Rename third-party-grograms.txt to third-party-programs.txt

Co-authored-by: Jimmy Wei <[email protected]>

* AMX bfloat16 mixed precision learning TensorFlow Transformer sample (#1317)
* [New Sample] Intel Extension for TensorFlow Getting Started (#1313)
* first draft
* Update README.md
* remove redundant file
* [New Sample] [oneDNN] Benchdnn tutorial (#1315)
* New Sample: benchDNN tutorial
* Update readme: new sample
* Rename sample to benchdnn_tutorial
* Name fix
* Add files via upload (#1320)
* [New Sample] oneCCL Bindings for PyTorch Getting Started (#1316)
* Update README.md
* [New Sample] oneCCL Bindings for PyTorch Getting Started
* Update README.md
* add torch-ccl version check
* [New Sample] Intel Extension for PyTorch Getting Started (#1314)
* add new ipex GSG notebook for dGPU
* Update sample.json for expertise field
* Update requirements.txt (update package versions to comply with Snyk tool)
* Updated title field in sample.json in TF Transformer AMX bfloat16 Mixed Precision sample to fit within character length range (#1327)
* add arch checker class (#1332)
* change gpu.patch to convert the code samples from cpu to gpu correctly (#1334)
* Fixes for spelling in AMX bfloat16 transformer sample and printing error in python code in numpy vs numba sample (#1335)
* 2023.1 ai kit itex get started example fix (#1338)
* Fix the typo
* Update ResNet50_Inference.ipynb
* fix resnet inference demo link (#1339)
* Fix printing issue in numpy vs numba AI sample (#1356)
* Fix Invalid Kmeans parameters on oneAPI 2023 (#1345)
* Update README to add new samples into the list (#1366)
* PyTorch AMX BF16 Training sample: remove graphs and performance numbers (#1408)
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample
* remove performance graphs, update README
* remove graphs from README and folder
* update top README in Features and Functionality

---------

Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>

* Daal4py Distributed Linear Regression readme update

  Restructured to match the new readme template, more or less.
  Changed sample name to match the name in the sample.json file.
  Updated the prerequisites to match the OS shown in the sample.json file.
  Restructured sections to increase clarity.
  Clarified and extended information on running the sample in DevCloud.
  Clarified and extended information about Jupyter Notebooks.
  Fixed formatting issues.
  Updated branding based on names in database.

---------

Co-authored-by: Jimmy Wei <[email protected]>
Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>
1 parent 8fca86f commit d094492

File tree

4 files changed, +111 -95 lines changed


AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/IntelPyTorch_TrainingOptimizations_AMX_BF16.ipynb

+5 -3

@@ -311,6 +311,7 @@
 },
 {
 "cell_type": "markdown",
+"id": "5eea6ae7",
 "metadata": {},
 "source": [
 "The training times for the 3 cases are printed out and shown in the figure above. Using BF16 should show significant reduction in training time. However, there is little to no change using AVX512 with BF16 and AMX with BF16 because the amount of computations required for one batch is too small with this dataset. "
@@ -348,15 +349,16 @@
 "id": "b6ea2aeb",
 "metadata": {},
 "source": [
-"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512. The expected behavior is that AMX with BF16 should have about a 1.5X improvement over FP32 and about the same performance as BF16 with AVX512. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. "
+"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512."
 ]
 },
 {
 "cell_type": "markdown",
-"id": "0da073a6",
+"id": "7bf01080",
 "metadata": {},
 "source": [
-"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training the ResNet50 model. There will be additional significant performance improvements if AMX INT8 is used in inference, which is covered in a related oneAPI sample."
+"## Conclusion\n",
+"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training on the ResNet50 model. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). "
 ]
 },
 {
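
For reference, a minimal sketch of the BF16/AMX training flow the notebook cells above describe, using Intel® Extension for PyTorch*. This is not the sample's code: the `ONEDNN_MAX_CPU_ISA` toggle is an assumption about how a run is capped to AVX-512 BF16 versus AMX, and the random batch stands in for the CIFAR10 data.

```python
import os

# Assumption: capping the oneDNN ISA is how AVX-512 BF16 and AMX BF16 runs are separated.
# It must be set before the first oneDNN primitive executes, so set it before importing torch.
os.environ.setdefault("ONEDNN_MAX_CPU_ISA", "AVX512_CORE_AMX")  # or "AVX512_CORE_BF16"

import torch
import torchvision
import intel_extension_for_pytorch as ipex

model = torchvision.models.resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
model.train()

# Apply BF16-friendly weight and layout optimizations.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

data = torch.rand(64, 3, 224, 224)      # stand-in batch (the sample trains on CIFAR10)
target = torch.randint(0, 1000, (64,))

# One mixed-precision training step: BF16 compute where the CPU supports it.
optimizer.zero_grad()
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

Timing this step with and without the BF16 pieces, and under each `ONEDNN_MAX_CPU_ISA` setting, is one way to reproduce the FP32 versus AVX-512 BF16 versus AMX BF16 comparison discussed above.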

AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/README.md

+2 -4

@@ -148,11 +148,9 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
 
 ## Example Output
 
-If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample generates performance and analysis diagrams for comparison.
+If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample will print out the runtimes and charts of relative performance with the FP32 model without any optimizations as the baseline.
 
-The following image shows approximate performance speed increases using AMX BF16 with auto-mixed precision during training. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset.
-
-![comparison images](assets/amx_relative_speedup.png)
+The performance speedups using AMX BF16 are approximate on ResNet50. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html).
 
 ## License
 
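
The relative-performance chart described above uses the unoptimized FP32 run as the baseline. A tiny illustration of that arithmetic (the timings are made-up placeholders, not measured results):

```python
def relative_speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """How many times faster a run is than the FP32 baseline."""
    return baseline_seconds / optimized_seconds

# Placeholder timings for illustration only.
timings = {"FP32": 85.0, "AVX-512 BF16": 60.0, "AMX BF16": 52.0}
for name, seconds in timings.items():
    print(f"{name}: {relative_speedup(timings['FP32'], seconds):.2f}x vs FP32 baseline")
```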
@@ -1,147 +1,158 @@
-# `Intel Python daal4py Distributed Linear Regression Sample`
+# `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-This sample code shows how to train and predict with a distributed linear regression model using the python API package daal4py powered by the oneAPI Data Analytics Library. It assumes you have a working version of the Intel® MPI Library installed, and it demonstrates how to use software products that is powered by the [oneAPI Data Analytics Library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and found in the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
+This sample demonstrates how to train and predict with a distributed linear regression model using the Python API package Daal4py powered by the Intel® oneAPI Data Analytics Library (oneDAL).
 
-| Optimized for | Description
-| :--- | :---
-| OS | 64-bit Linux: Ubuntu 18.04 or higher, 64-bit Windows 10, macOS 10.14 or higher
-| Hardware | Intel Atom® Processors; Intel® Core™ Processor Family; Intel® Xeon® Processor Family; Intel® Xeon® Scalable processor family
-| Software | Intel® AI Analytics Toolkit
-| What you will learn | distributed daal4py Linear Regression programming model for Intel CPU
-| Time to complete | 5 minutes
+| Area | Description
+|:--- |:---
+| What you will learn | How to use distributed Daal4py Linear Regression programming model for Intel CPUs
+| Time to complete | 5 minutes
+| Category | Concepts and Functionality
 
 ## Purpose
 
-daal4py is a simplified API to Intel® oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel® oneDAL for direct usage or integration into one's own framework.
+Daal4py is a simplified API to oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning developers. The sample is intended to provide abstraction to Intel® oneDAL for direct usage or integration your development framework.
 
-In this sample, you will run a distributed Linear Regression model with oneDAL daal4py library memory objects. You will also learn how to train a model and save the information to a file.
+In this sample, you will run a distributed Linear Regression model with oneDAL Daal4py library memory objects. You will also learn how to train a model and save the information to a file.
+
+## Prerequisites
+
+| Optimized for | Description
+|:--- |:---
+| OS | Ubuntu* 18.04 or higher
+| Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
+| Software | Intel® AI Analytics Toolkit (AI Kit)
 
 ## Key Implementation Details
-This distributed linear regression sample code is implemented for the CPU using the Python language. The example assumes you have daal4py and scikit-learn installed inside a conda environment, similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the [Intel® AI Analytics Toolkit](https://software.intel.com/en-us/oneapi/ai-kit).
 
-## License
-Code samples are licensed under the MIT license. See
-[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+The sample demonstrates how to use software products that are powered by [Intel® oneAPI Data Analytics Library (oneDAL)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/en-us/oneapi/ai-kit).
+
+The sample assumes you have a working version of the Intel® MPI Library, Daal4py, and scikit-learn installed inside a conda environment (similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the AI Kit.)
 
-Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)
+## Set Environment Variables
 
-## Running Samples on the Intel&reg; DevCloud
-If you are running this sample on the DevCloud, see [Running Samples on the Intel&reg; DevCloud](#run-samples-on-devcloud)
+When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
-## Building daal4py for CPU
+## Build the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.
+You can refer to the *[Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html)* for post-installation steps and scripts.
 
-You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts.
+The Intel® oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.
 
 > **Note**: If you have not already done so, set up your CLI
-> environment by sourcing the `setvars` script located in
-> the root of your oneAPI installation.
->
-> Linux Sudo: . /opt/intel/oneapi/setvars.sh
+> environment by sourcing the `setvars` script in the root of your oneAPI installation.
 >
-> Linux User: . ~/intel/oneapi/setvars.sh
+> Linux*:
+> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
+> - For private installations: ` . ~/intel/oneapi/setvars.sh`
+> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
 >
-> Windows: C:\Program Files(x86)\Intel\oneAPI\setvars.bat
->
->For more information on environment variables, see Use the setvars Script for [Linux or macOS](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html), or [Windows](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).
+> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
+
+### On Linux*
 
-### Activate conda environment With Root Access
+#### Activate Conda with Root Access
 
-Intel Python environment will be active by default. However, if you activated another environment, you can return with the following command:
+By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it. However, if you activated another environment, you can return with the following command.
 
-#### On a Linux* System
 ```
 source activate base
 ```
 
-### Activate conda environment Without Root Access (Optional)
+#### Activate Conda without Root Access (Optional)
 
-By default, the Intel® AI Analytics Toolkit is installed in the inteloneapi folder, which requires root privileges to manage it. If you would like to bypass using root access to manage your conda environment, then you can clone your desired conda environment using the following command:
+You can choose to activate Conda environment without root access. To bypass root access to manage your Conda environment, clone and activate your desired Conda environment using the following commands similar to the following.
 
-#### On a Linux* System
 ```
 conda create --name usr_intelpython --clone base
-```
-
-Then activate your conda environment with the following command:
-
-```
 source activate usr_intelpython
 ```
 
-### Install Jupyter Notebook
-```
-conda install jupyter nb_conda_kernels
-```
+#### Jupyter Notebook (Optional)
 
-#### View in Jupyter Notebook
+>**Note**: This sample cannot be launched from the Jupyter Notebook version; however, you can still view inside the notebook to follow the included write-up and description.
 
-_Note: This distributed execution cannot be launched from the jupyter notebook version, but you can still view inside the notebook to follow the included write-up and description._
+1. If you have not already done so, install Jupyter Notebook.
+```
+conda install jupyter nb_conda_kernels
+```
+2. Launch Jupyter Notebook.
+```
+jupyter notebook
+```
+3. Locate and select the Notebook.
+```
+IntelPython_daal4py_Distributed_LinearRegression.ipynb
+```
 
-Launch Jupyter Notebook in the directory housing the code example
+## Run the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-```
-jupyter notebook
-```
+### On Linux
 
-## Running the Sample<a name="running-the-sample"></a>
+When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell.
 
-### Running the Sample as a Python File
+1. Run the script with a command similar to the following command. (The number **4** is an example and indicates that the script will run on **4 processes**.)
 
-When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell. To execute this example, run the following command, where the number **4** is chosen as an example and means that it will run on **4 processes**:
+```
+mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py
+```
 
-Run the Program
+When it completes, the script output will be in the included **/models** and **/results** directories.
 
-`mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py`
+>**Note**: This code samples focus on using Daal4py for distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the primary node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well.
 
-The output of the script will be saved in the included models and result directories.
+#### Troubleshooting
 
-_Note: This code samples focus on using daal4py to do distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the master node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well._
+If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility.
 
-### Using Visual Studio Code* (VS Code)
+### Build and Run the Sample on Intel® DevCloud (Optional)
 
-You can use VS Code extensions to set your environment, create launch configurations,
-and browse and download samples.
+>**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
 
-The basic steps to build and run a sample using VS Code include:
-- Download a sample using the extension **Code Sample Browser for Intel oneAPI Toolkits**.
-- Configure the oneAPI environment with the extension **Environment Configurator for Intel oneAPI Toolkits**.
-- Open a Terminal in VS Code (**Terminal>New Terminal**).
-- Run the sample in the VS Code terminal using the instructions below.
+1. Open a terminal on a Linux* system.
+2. Log in to the Intel® DevCloud.
+```
+ssh devcloud
+```
+3. If the sample is not already available, download the samples from GitHub.
+```
+git clone https://github.com/oneapi-src/oneAPI-samples.git
+```
+4. Change to the sample directory.
+5. Configure the sample for the appropriate node.
 
-To learn more about the extensions and how to configure the oneAPI environment, see
-[Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).
+The following example is for a CPU node. (This is a single line script.)
+```
+qsub -I -l nodes=1:cpu:ppn=2 -d .
+```
+- `-I` (upper case I) requests an interactive session.
+- `-l nodes=1:cpu:ppn=2` (lower case L) assigns one full GPU node.
+- `-d .` makes the current folder as the working directory for the task.
 
-After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.
+>**Note**: For more information about the node properties, execute the `pbsnodes` command.
 
-### Running Samples on the Intel&reg; DevCloud (Optional)<a name="run-samples-on-devcloud"></a>
+6. Perform build steps you would on Linux.
+7. Run the sample.
 
-<!---Include the next paragraph ONLY if the sample runs in batch mode-->
-### Run in Batch Mode
-This sample runs in batch mode, so you must have a script for batch processing. Once you have a script set up, refer to [Running the Sample](#running-the-sample).
+> **Note**: To inspect job progress if you are using a script, use the qstat utility.
+> ```
+> watch -n 1 qstat -n -1
+> ```
+> The command displays the results every second. The job is complete when no new results display.
 
-### Request a Compute Node
-In order to run on the DevCloud, you need to request a compute node using node properties such as: `gpu`, `xeon`, `fpga_compile`, `fpga_runtime` and others. For more information about the node properties, execute the `pbsnodes` command.
-This node information must be provided when submitting a job to run your sample in batch mode using the qsub command. When you see the qsub command in the Run section of the [Hello World instructions](https://devcloud.intel.com/oneapi/get_started/aiAnalyticsToolkitSamples/), change the command to fit the node you are using. Nodes which are in bold indicate they are compatible with this sample:
+8. Review the output.
+9. Disconnect from Intel® DevCloud.
+```
+exit
+```
 
-<!---Mark each compatible Node in BOLD-->
-| Node | Command |
-| ----------------- | ------------------------------------------------------- |
-| GPU | qsub -l nodes=1:gpu:ppn=2 -d . hello-world.sh |
-| __CPU__ | __qsub -l nodes=1:xeon:ppn=2 -d . hello-world.sh__ |
-| FPGA Compile Time | qsub -l nodes=1:fpga\_compile:ppn=2 -d . hello-world.sh |
-| FPGA Runtime | qsub -l nodes=1:fpga\_runtime:ppn=2 -d . hello-world.sh |
+## Example Output
 
+>**Note**: The output displays similar numbers printed 4 times.
 
-##### Expected Printed Output (with similar numbers, printed 4 times):
 ```
-
-
 Here's our model:
 
-
-NumberOfBetas: 15
+NumberOfBetas: 15
 
 NumberOfResponses: 1
 
@@ -163,6 +174,11 @@ Here is one of our loaded model's features:
 6.67604639e-04 -9.01293646e-01 1.96091421e-01 -7.50083536e-03
 -3.11567377e-01 1.58333298e-02 -4.62941338e-01]]
 [CODE_SAMPLE_COMPLETED_SUCCESFULLY]
-
 ```
 
+## License
+
+Code samples are licensed under the MIT license. See
+[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+
+Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
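
To show what the `mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py` command above drives, here is a minimal sketch of distributed linear regression with daal4py. It is not the sample's code: the per-rank random data stands in for the real data chunks, the printed fields are illustrative, and it assumes a daal4py build with distributed (SPMD) support plus the Intel® MPI Library.

```python
# Run with: mpirun -n 4 python linreg_spmd_sketch.py   (hypothetical file name)
import numpy as np
import daal4py as d4p

d4p.daalinit()  # start the distributed engine; one process per MPI rank

# Each rank generates its own chunk of data (a stand-in for loading a real partition).
rng = np.random.default_rng(seed=d4p.my_procid())
X_local = rng.random((1000, 14))
true_beta = rng.random(14)
y_local = (X_local @ true_beta + 0.1 * rng.random(1000)).reshape(-1, 1)

# Train across all ranks; daal4py combines the partial results into one model.
train_result = d4p.linear_regression_training(distributed=True).compute(X_local, y_local)

# Predict locally with the combined model.
predictions = d4p.linear_regression_prediction().compute(X_local[:5], train_result.model).prediction

if d4p.my_procid() == 0:
    print("NumberOfBetas:", train_result.model.NumberOfBetas)
    print("First predictions:\n", predictions)

d4p.daalfini()  # shut down the distributed engine
```

Each rank computes a partial result on its own data chunk and daal4py reduces them into a single model, which is why the example output above shows similar numbers printed once per process.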
