
Commit d094492

jkinsky, jimmytwei, krzeszew, alexsin368, and ZhaoqiongZ authored
Ai and analytics features and functionality intel python daal4py distributed linear regression (#1417)
* Fixes for 2023.1 AI Kit (#1409)
* Intel Python Numpy Numba_dpes kNN sample (#1292)
* *.py and *.ipynb files with implementation
* README.md and sample.json files with documentation
* License and third party programs
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample (#1293)
* add IntelPytorch Quantization code samples (#1301)
* add IntelPytorch Quantization code samples
* fix the spelling error in the README file
* use john's README with grammar fix and title change
* Rename third-party-grograms.txt to third-party-programs.txt

Co-authored-by: Jimmy Wei <[email protected]>

* AMX bfloat16 mixed precision learning TensorFlow Transformer sample (#1317)
* [New Sample] Intel Extension for TensorFlow Getting Started (#1313)
* first draft
* Update README.md
* remove redundant file
* [New Sample] [oneDNN] Benchdnn tutorial (#1315)
* New Sample: benchDNN tutorial
* Update readme: new sample
* Rename sample to benchdnn_tutorial
* Name fix
* Add files via upload (#1320)
* [New Sample] oneCCL Bindings for PyTorch Getting Started (#1316)
* Update README.md
* [New Sample] oneCCL Bindings for PyTorch Getting Started
* Update README.md
* add torch-ccl version check
* [New Sample] Intel Extension for PyTorch Getting Started (#1314)
* add new ipex GSG notebook for dGPU
* Update sample.json for expertise field
* Update requirements.txt (update package versions to comply with Snyk tool)
* Updated title field in sample.json in TF Transformer AMX bfloat16 Mixed Precision sample to fit within character length range (#1327)
* add arch checker class (#1332)
* change gpu.patch to convert the code samples from cpu to gpu correctly (#1334)
* Fixes for spelling in AMX bfloat16 transformer sample and printing error in python code in numpy vs numba sample (#1335)
* 2023.1 ai kit itex get started example fix (#1338)
* Fix the typo
* Update ResNet50_Inference.ipynb
* fix resnet inference demo link (#1339)
* Fix printing issue in numpy vs numba AI sample (#1356)
* Fix Invalid Kmeans parameters on oneAPI 2023 (#1345)
* Update README to add new samples into the list (#1366)
* PyTorch AMX BF16 Training sample: remove graphs and performance numbers (#1408)
* Adding PyTorch Training Optimizations with AMX BF16 oneAPI sample
* remove performance graphs, update README
* remove graphs from README and folder
* update top README in Features and Functionality

---------

Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>

* Daal4py Distributed Linear Regression readme update

  Restructured to match the new readme template, more or less.
  Changed sample name to match the name in the sample.json file.
  Updated the prerequisites to match the OS shown in the sample.json file.
  Restructured sections to increase clarity.
  Clarified and extended information on running the sample in DevCloud.
  Clarified and extended information about Jupyter Notebooks.
  Fixed formatting issues.
  Updated branding based on names in database.

---------

Co-authored-by: Jimmy Wei <[email protected]>
Co-authored-by: krzeszew <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: ZhaoqiongZ <[email protected]>
Co-authored-by: Louie Tsai <[email protected]>
Co-authored-by: Orel Yehuda <[email protected]>
Co-authored-by: yuning <[email protected]>
Co-authored-by: Wang, Kai Lawrence <[email protected]>
Co-authored-by: xiguiw <[email protected]>
1 parent 8fca86f commit d094492

File tree

4 files changed, +111 -95 lines changed


AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/IntelPyTorch_TrainingOptimizations_AMX_BF16.ipynb

+5 -3

@@ -311,6 +311,7 @@
 },
 {
 "cell_type": "markdown",
+"id": "5eea6ae7",
 "metadata": {},
 "source": [
 "The training times for the 3 cases are printed out and shown in the figure above. Using BF16 should show significant reduction in training time. However, there is little to no change using AVX512 with BF16 and AMX with BF16 because the amount of computations required for one batch is too small with this dataset. "
@@ -348,15 +349,16 @@
 "id": "b6ea2aeb",
 "metadata": {},
 "source": [
-"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512. The expected behavior is that AMX with BF16 should have about a 1.5X improvement over FP32 and about the same performance as BF16 with AVX512. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. "
+"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512."
 ]
 },
 {
 "cell_type": "markdown",
-"id": "0da073a6",
+"id": "7bf01080",
 "metadata": {},
 "source": [
-"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training the ResNet50 model. There will be additional significant performance improvements if AMX INT8 is used in inference, which is covered in a related oneAPI sample."
+"## Conclusion\n",
+"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training on the ResNet50 model. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). "
 ]
 },
 {
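
For reference, a minimal sketch of the BF16/AMX training flow the notebook cells above describe, using Intel® Extension for PyTorch*. This is not the sample's code: the `ONEDNN_MAX_CPU_ISA` toggle is an assumption about how a run is capped to AVX-512 BF16 versus AMX, and the random batch stands in for the CIFAR10 data.

```python
import os

# Assumption: capping the oneDNN ISA is how AVX-512 BF16 and AMX BF16 runs are separated.
# It must be set before the first oneDNN primitive executes, so set it before importing torch.
os.environ.setdefault("ONEDNN_MAX_CPU_ISA", "AVX512_CORE_AMX")  # or "AVX512_CORE_BF16"

import torch
import torchvision
import intel_extension_for_pytorch as ipex

model = torchvision.models.resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
model.train()

# Apply BF16-friendly weight and layout optimizations.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

data = torch.rand(64, 3, 224, 224)      # stand-in batch (the sample trains on CIFAR10)
target = torch.randint(0, 1000, (64,))

# One mixed-precision training step: BF16 compute where the CPU supports it.
optimizer.zero_grad()
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

Timing this step with and without the BF16 pieces, and under each `ONEDNN_MAX_CPU_ISA` setting, is one way to reproduce the FP32 versus AVX-512 BF16 versus AMX BF16 comparison discussed above.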

AI-and-Analytics/Features-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/README.md

+2 -4

@@ -148,11 +148,9 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
 
 ## Example Output
 
-If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample generates performance and analysis diagrams for comparison.
+If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample will print out the runtimes and charts of relative performance with the FP32 model without any optimizations as the baseline.
 
-The following image shows approximate performance speed increases using AMX BF16 with auto-mixed precision during training. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset.
-
-![comparison images](assets/amx_relative_speedup.png)
+The performance speedups using AMX BF16 are approximate on ResNet50. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html).
 
 ## License
 
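
The relative-performance chart described above uses the unoptimized FP32 run as the baseline. A tiny illustration of that arithmetic (the timings are made-up placeholders, not measured results):

```python
def relative_speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """How many times faster a run is than the FP32 baseline."""
    return baseline_seconds / optimized_seconds

# Placeholder timings for illustration only.
timings = {"FP32": 85.0, "AVX-512 BF16": 60.0, "AMX BF16": 52.0}
for name, seconds in timings.items():
    print(f"{name}: {relative_speedup(timings['FP32'], seconds):.2f}x vs FP32 baseline")
```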
@@ -1,147 +1,158 @@
-# `Intel Python daal4py Distributed Linear Regression Sample`
+# `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-This sample code shows how to train and predict with a distributed linear regression model using the python API package daal4py powered by the oneAPI Data Analytics Library. It assumes you have a working version of the Intel® MPI Library installed, and it demonstrates how to use software products that is powered by the [oneAPI Data Analytics Library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and found in the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
+This sample demonstrates how to train and predict with a distributed linear regression model using the Python API package Daal4py powered by the Intel® oneAPI Data Analytics Library (oneDAL).
 
-| Optimized for | Description
-| :--- | :---
-| OS | 64-bit Linux: Ubuntu 18.04 or higher, 64-bit Windows 10, macOS 10.14 or higher
-| Hardware | Intel Atom® Processors; Intel® Core™ Processor Family; Intel® Xeon® Processor Family; Intel® Xeon® Scalable processor family
-| Software | Intel® AI Analytics Toolkit
-| What you will learn | distributed daal4py Linear Regression programming model for Intel CPU
-| Time to complete | 5 minutes
+| Area | Description
+|:--- |:---
+| What you will learn | How to use distributed Daal4py Linear Regression programming model for Intel CPUs
+| Time to complete | 5 minutes
+| Category | Concepts and Functionality
 
 ## Purpose
 
-daal4py is a simplified API to Intel® oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning users. Built to help provide an abstraction to Intel® oneDAL for direct usage or integration into one's own framework.
+Daal4py is a simplified API to oneDAL that allows for fast usage of the framework suited for Data Scientists or Machine Learning developers. The sample is intended to provide abstraction to Intel® oneDAL for direct usage or integration your development framework.
 
-In this sample, you will run a distributed Linear Regression model with oneDAL daal4py library memory objects. You will also learn how to train a model and save the information to a file.
+In this sample, you will run a distributed Linear Regression model with oneDAL Daal4py library memory objects. You will also learn how to train a model and save the information to a file.
+
+## Prerequisites
+
+| Optimized for | Description
+|:--- |:---
+| OS | Ubuntu* 18.04 or higher
+| Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
+| Software | Intel® AI Analytics Toolkit (AI Kit)
 
 ## Key Implementation Details
-This distributed linear regression sample code is implemented for the CPU using the Python language. The example assumes you have daal4py and scikit-learn installed inside a conda environment, similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the [Intel® AI Analytics Toolkit](https://software.intel.com/en-us/oneapi/ai-kit).
 
-## License
-Code samples are licensed under the MIT license. See
-[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+The sample demonstrates how to use software products that are powered by [Intel® oneAPI Data Analytics Library (oneDAL)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html) and the [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/en-us/oneapi/ai-kit).
+
+The sample assumes you have a working version of the Intel® MPI Library, Daal4py, and scikit-learn installed inside a conda environment (similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the AI Kit.)
 
-Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt)
+## Set Environment Variables
 
-## Running Samples on the Intel&reg; DevCloud
-If you are running this sample on the DevCloud, see [Running Samples on the Intel&reg; DevCloud](#run-samples-on-devcloud)
+When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
-## Building daal4py for CPU
+## Build the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.
+You can refer to the *[Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html)* for post-installation steps and scripts.
 
-You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts.
+The Intel® oneAPI Data Analytics Library is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.
 
 > **Note**: If you have not already done so, set up your CLI
-> environment by sourcing the `setvars` script located in
-> the root of your oneAPI installation.
->
-> Linux Sudo: . /opt/intel/oneapi/setvars.sh
+> environment by sourcing the `setvars` script in the root of your oneAPI installation.
 >
-> Linux User: . ~/intel/oneapi/setvars.sh
+> Linux*:
+> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
+> - For private installations: ` . ~/intel/oneapi/setvars.sh`
+> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
 >
-> Windows: C:\Program Files(x86)\Intel\oneAPI\setvars.bat
->
->For more information on environment variables, see Use the setvars Script for [Linux or macOS](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html), or [Windows](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html).
+> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.
+
+### On Linux*
 
-### Activate conda environment With Root Access
+#### Activate Conda with Root Access
 
-Intel Python environment will be active by default. However, if you activated another environment, you can return with the following command:
+By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it. However, if you activated another environment, you can return with the following command.
 
-#### On a Linux* System
 ```
 source activate base
 ```
 
-### Activate conda environment Without Root Access (Optional)
+#### Activate Conda without Root Access (Optional)
 
-By default, the Intel® AI Analytics Toolkit is installed in the inteloneapi folder, which requires root privileges to manage it. If you would like to bypass using root access to manage your conda environment, then you can clone your desired conda environment using the following command:
+You can choose to activate Conda environment without root access. To bypass root access to manage your Conda environment, clone and activate your desired Conda environment using the following commands similar to the following.
 
-#### On a Linux* System
 ```
 conda create --name usr_intelpython --clone base
-```
-
-Then activate your conda environment with the following command:
-
-```
 source activate usr_intelpython
 ```
 
-### Install Jupyter Notebook
-```
-conda install jupyter nb_conda_kernels
-```
+#### Jupyter Notebook (Optional)
 
-#### View in Jupyter Notebook
+>**Note**: This sample cannot be launched from the Jupyter Notebook version; however, you can still view inside the notebook to follow the included write-up and description.
 
-_Note: This distributed execution cannot be launched from the jupyter notebook version, but you can still view inside the notebook to follow the included write-up and description._
+1. If you have not already done so, install Jupyter Notebook.
+```
+conda install jupyter nb_conda_kernels
+```
+2. Launch Jupyter Notebook.
+```
+jupyter notebook
+```
+3. Locate and select the Notebook.
+```
+IntelPython_daal4py_Distributed_LinearRegression.ipynb
+```
 
-Launch Jupyter Notebook in the directory housing the code example
+## Run the `Intel® Python Daal4py Distributed Linear Regression` Sample
 
-```
-jupyter notebook
-```
+### On Linux
 
-## Running the Sample<a name="running-the-sample"></a>
+When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell.
 
-### Running the Sample as a Python File
+1. Run the script with a command similar to the following command. (The number **4** is an example and indicates that the script will run on **4 processes**.)
 
-When using daal4py for distributed memory systems, the command needed to execute the program should be executed in a bash shell. To execute this example, run the following command, where the number **4** is chosen as an example and means that it will run on **4 processes**:
+```
+mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py
+```
 
-Run the Program
+When it completes, the script output will be in the included **/models** and **/results** directories.
 
-`mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py`
+>**Note**: This code samples focus on using Daal4py for distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the primary node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well.
 
-The output of the script will be saved in the included models and result directories.
+#### Troubleshooting
 
-_Note: This code samples focus on using daal4py to do distributed ML computations on chunks of data. The `mpirun` command above will only run on a single local node. To launch on a cluster, you will need to create a host file on the master node, among other steps. The **TensorFlow_Multinode_Training_with_Horovod** code sample explains this process well._
+If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility.
 
-### Using Visual Studio Code* (VS Code)
+### Build and Run the Sample on Intel® DevCloud (Optional)
 
-You can use VS Code extensions to set your environment, create launch configurations,
-and browse and download samples.
+>**Note**: For more information on using Intel® DevCloud, see the Intel® oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started/) page.
 
-The basic steps to build and run a sample using VS Code include:
-- Download a sample using the extension **Code Sample Browser for Intel oneAPI Toolkits**.
-- Configure the oneAPI environment with the extension **Environment Configurator for Intel oneAPI Toolkits**.
-- Open a Terminal in VS Code (**Terminal>New Terminal**).
-- Run the sample in the VS Code terminal using the instructions below.
+1. Open a terminal on a Linux* system.
+2. Log in to the Intel® DevCloud.
+```
+ssh devcloud
+```
+3. If the sample is not already available, download the samples from GitHub.
+```
+git clone https://github.com/oneapi-src/oneAPI-samples.git
+```
+4. Change to the sample directory.
+5. Configure the sample for the appropriate node.
 
-To learn more about the extensions and how to configure the oneAPI environment, see
-[Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).
+The following example is for a CPU node. (This is a single line script.)
+```
+qsub -I -l nodes=1:cpu:ppn=2 -d .
+```
+- `-I` (upper case I) requests an interactive session.
+- `-l nodes=1:cpu:ppn=2` (lower case L) assigns one full GPU node.
+- `-d .` makes the current folder as the working directory for the task.
 
-After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.
+>**Note**: For more information about the node properties, execute the `pbsnodes` command.
 
-### Running Samples on the Intel&reg; DevCloud (Optional)<a name="run-samples-on-devcloud"></a>
+6. Perform build steps you would on Linux.
+7. Run the sample.
 
-<!---Include the next paragraph ONLY if the sample runs in batch mode-->
-### Run in Batch Mode
-This sample runs in batch mode, so you must have a script for batch processing. Once you have a script set up, refer to [Running the Sample](#running-the-sample).
+> **Note**: To inspect job progress if you are using a script, use the qstat utility.
+> ```
+> watch -n 1 qstat -n -1
+> ```
+> The command displays the results every second. The job is complete when no new results display.
 
-### Request a Compute Node
-In order to run on the DevCloud, you need to request a compute node using node properties such as: `gpu`, `xeon`, `fpga_compile`, `fpga_runtime` and others. For more information about the node properties, execute the `pbsnodes` command.
-This node information must be provided when submitting a job to run your sample in batch mode using the qsub command. When you see the qsub command in the Run section of the [Hello World instructions](https://devcloud.intel.com/oneapi/get_started/aiAnalyticsToolkitSamples/), change the command to fit the node you are using. Nodes which are in bold indicate they are compatible with this sample:
+8. Review the output.
+9. Disconnect from Intel® DevCloud.
+```
+exit
+```
 
-<!---Mark each compatible Node in BOLD-->
-| Node | Command |
-| ----------------- | ------------------------------------------------------- |
-| GPU | qsub -l nodes=1:gpu:ppn=2 -d . hello-world.sh |
-| __CPU__ | __qsub -l nodes=1:xeon:ppn=2 -d . hello-world.sh__ |
-| FPGA Compile Time | qsub -l nodes=1:fpga\_compile:ppn=2 -d . hello-world.sh |
-| FPGA Runtime | qsub -l nodes=1:fpga\_runtime:ppn=2 -d . hello-world.sh |
+## Example Output
 
+>**Note**: The output displays similar numbers printed 4 times.
 
-##### Expected Printed Output (with similar numbers, printed 4 times):
 ```
-
-
 Here's our model:
 
-
-NumberOfBetas: 15
+NumberOfBetas: 15
 
 NumberOfResponses: 1
 
@@ -163,6 +174,11 @@ Here is one of our loaded model's features:
 6.67604639e-04 -9.01293646e-01 1.96091421e-01 -7.50083536e-03
 -3.11567377e-01 1.58333298e-02 -4.62941338e-01]]
 [CODE_SAMPLE_COMPLETED_SUCCESFULLY]
-
 ```
 
+## License
+
+Code samples are licensed under the MIT license. See
+[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
+
+Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
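
To show what the `mpirun -n 4 python ./IntelPython_daal4py_Distributed_LinearRegression.py` command above drives, here is a minimal sketch of distributed linear regression with daal4py. It is not the sample's code: the per-rank random data stands in for the real data chunks, the printed fields are illustrative, and it assumes a daal4py build with distributed (SPMD) support plus the Intel® MPI Library.

```python
# Run with: mpirun -n 4 python linreg_spmd_sketch.py   (hypothetical file name)
import numpy as np
import daal4py as d4p

d4p.daalinit()  # start the distributed engine; one process per MPI rank

# Each rank generates its own chunk of data (a stand-in for loading a real partition).
rng = np.random.default_rng(seed=d4p.my_procid())
X_local = rng.random((1000, 14))
true_beta = rng.random(14)
y_local = (X_local @ true_beta + 0.1 * rng.random(1000)).reshape(-1, 1)

# Train across all ranks; daal4py combines the partial results into one model.
train_result = d4p.linear_regression_training(distributed=True).compute(X_local, y_local)

# Predict locally with the combined model.
predictions = d4p.linear_regression_prediction().compute(X_local[:5], train_result.model).prediction

if d4p.my_procid() == 0:
    print("NumberOfBetas:", train_result.model.NumberOfBetas)
    print("First predictions:\n", predictions)

d4p.daalfini()  # shut down the distributed engine
```

Each rank computes a partial result on its own data chunk and daal4py reduces them into a single model, which is why the example output above shows similar numbers printed once per process.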
