
Optimize PyTorch* Models using Intel® Extension for PyTorch* sample readme update #1452

Merged: 2 commits, merged on Mar 24, 2023.

@@ -1,66 +1,72 @@
# `Optimize PyTorch* Models using Intel® Extension for PyTorch* (IPEX)` Sample

This notebook guides you through the process of extending your PyTorch* code with Intel® Extension for PyTorch* (IPEX) optimizations to achieve performance boosts on Intel® hardware.

| Area | Description
|:--- |:---
| What you will learn | How to apply IPEX optimizations to a PyTorch* workload step by step to gain a performance boost
| Time to complete | 30 minutes
| Category | Code Optimization

## Purpose

This sample notebook shows how to get started with Intel® Extension for PyTorch* (IPEX) for sample Computer Vision and NLP workloads.

The sample starts by loading two models from the PyTorch hub: **Faster-RCNN** (Faster R-CNN) and **distilbert** (DistilBERT). After loading the models, the sample applies sequential optimizations from IPEX and examines performance gains for each incremental change.

You can make code changes quickly on top of existing PyTorch code to obtain the performance speedups for model inference.
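
The change typically amounts to a few lines on top of existing PyTorch code. As a rough illustration, the minimal sketch below (not the sample's exact code; the model choice and dummy input are illustrative, and it assumes `torch`, `torchvision`, and `intel_extension_for_pytorch` are installed) applies `ipex.optimize()` to a torchvision detection model for CPU inference.

```
# Minimal sketch, not the sample's exact code: model choice and dummy input
# are illustrative only.
import torch
import torchvision
import intel_extension_for_pytorch as ipex

# Build a Faster R-CNN model; the sample also works with DistilBERT.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.eval()

# One-line IPEX optimization for FP32 inference; the notebook applies further
# IPEX optimizations incrementally on top of a baseline like this.
model = ipex.optimize(model)

with torch.no_grad():
    dummy_images = [torch.rand(3, 224, 224)]  # placeholder input
    predictions = model(dummy_images)
```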

## Prerequisites

| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 18.04 or newer
| Hardware | Intel® Xeon® Scalable processor family
| Software | Intel® AI Analytics Toolkit (AI Kit)

### For Local Development Environments

You will need to download and install the following toolkits, tools, and components to use the sample.

- **Intel® AI Analytics Toolkit (AI Kit)**

You can get the AI Kit from [Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#analytics-kit). <br> See [*Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux) for AI Kit installation information and post-installation steps and scripts. This sample assumes you have **Matplotlib** installed.


- **Jupyter Notebook**

Install using PIP: `pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.

- **Transformers - Hugging Face**

Install using PIP: `pip install transformers`

### For Intel® DevCloud

Most of the necessary tools and components are already installed in the environment, so you do not need to install additional components. See [Intel® DevCloud for oneAPI](https://devcloud.intel.com/oneapi/get_started/) for information. You will need to install the Hugging Face Transformers library using pip as shown above.

## Key Implementation Details

This sample tutorial contains one Jupyter Notebook and one Python script.

### Jupyter Notebook

| Notebook | Description
|:--- |:---
|`optimize_pytorch_models_with_ipex.ipynb` | Gain a performance boost during inference using IPEX.

### Python Script

| Script | Description
|:--- |:---
|`resnet50.py` | The script optimizes a Faster R-CNN model to be used with the IPEX Launch Script.


## Set Environment Variables

When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

## Run the `Optimize PyTorch* Models using Intel® Extension for PyTorch* (IPEX)` Sample

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
@@ -84,7 +90,16 @@

By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it.

#### Activate Conda without Root Access (Optional)

You can choose to activate the Conda environment without root access.

1. To bypass root access to manage your Conda environment, clone and activate your desired Conda environment using commands similar to the following.

```
conda create --name user_pytorch --clone pytorch
conda activate user_pytorch
```

#### Run the Notebook

@@ -97,14 +112,14 @@
```
optimize_pytorch_models_with_ipex.ipynb
```
4. Change the kernel to **PyTorch (AI Kit)**.
5. Run every cell in the Notebook in sequence.
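
Optionally, after selecting the kernel, you can run a quick check like the one below (an illustrative snippet, not part of the sample) to confirm that IPEX is importable before executing the rest of the cells.

```
# Optional sanity check inside the notebook kernel (illustrative, not part of the sample).
import torch
import intel_extension_for_pytorch as ipex  # fails here if IPEX is missing from the kernel

print("PyTorch version:", torch.__version__)
print("IPEX version:", ipex.__version__)
```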

#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.

### Run the Sample on Intel® DevCloud (Optional)

1. If you do not already have an account, request an Intel® DevCloud account at [*Create an Intel® DevCloud Account*](https://intelsoftwaresites.secure.force.com/DevCloud/oneapi).
2. On a Linux* system, open a terminal.
@@ -123,15 +138,13 @@

## Example Output

Users should be able to see some diagrams for performance comparison and analysis. The following image shows an example performance comparison for the inference speedup obtained by enabling IPEX optimizations.

![Performance Numbers](images/performance_numbers.png)


## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
@@ -311,6 +311,7 @@
},
{
"cell_type": "markdown",
"id": "5eea6ae7",
"metadata": {},
"source": [
"The training times for the 3 cases are printed out and shown in the figure above. Using BF16 should show significant reduction in training time. However, there is little to no change using AVX512 with BF16 and AMX with BF16 because the amount of computations required for one batch is too small with this dataset. "
@@ -348,15 +349,16 @@
"id": "b6ea2aeb",
"metadata": {},
"source": [
"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512. The expected behavior is that AMX with BF16 should have about a 1.5X improvement over FP32 and about the same performance as BF16 with AVX512. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. "
"This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512."
]
},
{
"cell_type": "markdown",
"id": "0da073a6",
"id": "7bf01080",
"metadata": {},
"source": [
"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training the ResNet50 model. There will be additional significant performance improvements if AMX INT8 is used in inference, which is covered in a related oneAPI sample."
"## Conclusion\n",
"This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training on the ResNet50 model. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). "
]
},
{
@@ -148,11 +148,9 @@

## Example Output

If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample prints the runtimes and charts of relative performance, using the FP32 model without any optimizations as the baseline.

The performance speedups using AMX BF16 on ResNet50 are approximate. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computation in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html).
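
For reference, the sketch below is an illustrative example of BF16 auto-mixed precision training with IPEX (it is not the sample's exact code; the model, batch size, and hyperparameters are placeholders). A larger batch gives AMX more work per step, which is where it tends to pull ahead of AVX-512 BF16.

```
# Illustrative sketch of BF16 auto-mixed precision training with IPEX;
# not the sample's exact code. Batch size and hyperparameters are arbitrary.
import torch
import torchvision
import intel_extension_for_pytorch as ipex

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
model.train()

# Prepare the model and optimizer for BF16 training.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# A larger batch (128 here, as an example) increases the work per step.
data = torch.rand(128, 3, 224, 224)
target = torch.randint(0, 1000, (128,))

optimizer.zero_grad()
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(data)
    loss = criterion(output, target)
loss.backward()
optimizer.step()
```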

## License

Binary file not shown.