oneapi-src · jimmytwei · Mar 24, 2023 · Mar 7, 2023 · Mar 24, 2023
diff --git a/...lPyTorch_TrainingOptimizations_AMX_BF16/IntelPyTorch_TrainingOptimizations_AMX_BF16.ipynb b/...lPyTorch_TrainingOptimizations_AMX_BF16/IntelPyTorch_TrainingOptimizations_AMX_BF16.ipynb
@@ -311,6 +311,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "5eea6ae7",
    "metadata": {},
    "source": [
     "The training times for the 3 cases are printed out and shown in the figure above. Using BF16 should show significant reduction in training time. However, there is little to no change using AVX512 with BF16 and AMX with BF16 because the amount of computations required for one batch is too small with this dataset.   "
@@ -348,15 +349,16 @@
    "id": "b6ea2aeb",
    "metadata": {},
    "source": [
-    "This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512. The expected behavior is that AMX with BF16 should have about a 1.5X improvement over FP32 and about the same performance as BF16 with AVX512. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset.  "
+    "This figure shows the relative performance speedup of AMX compared to FP32 and BF16 with AVX512."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "0da073a6",
+   "id": "7bf01080",
    "metadata": {},
    "source": [
-    "This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training the ResNet50 model. There will be additional significant performance improvements if AMX INT8 is used in inference, which is covered in a related oneAPI sample."
+    "## Conclusion\n",
+    "This code sample shows how to enable and disable AMX during runtime, as well as the performance improvements using AMX BF16 for training on the ResNet50 model. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). "
    ]
   },
   {
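
Review note on the Conclusion cell: "enable and disable AMX during runtime" can be sketched outside the notebook as follows. This is a minimal sketch, not the notebook's code. It assumes the ISA ceiling is set through oneDNN's `ONEDNN_MAX_CPU_ISA` environment variable with the values `AVX512_CORE_BF16` and `AVX512_CORE_AMX`; the diff itself does not show this. The helper name `isa_env` and the case labels are hypothetical.

```python
import os

# Assumed mapping from the two BF16 cases compared in the sample to oneDNN
# ISA caps. The variable name and values come from oneDNN's CPU dispatcher
# controls, not from this diff.
ISA_CAPS = {
    "avx512_bf16": "AVX512_CORE_BF16",  # caps dispatch below AMX
    "amx_bf16": "AVX512_CORE_AMX",      # allows AMX kernels
}


def isa_env(case, base_env=None):
    """Return a copy of the environment with ONEDNN_MAX_CPU_ISA set for `case`."""
    if case not in ISA_CAPS:
        raise ValueError(f"unknown case: {case!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["ONEDNN_MAX_CPU_ISA"] = ISA_CAPS[case]
    return env


if __name__ == "__main__":
    for case in ISA_CAPS:
        print(case, "->", isa_env(case, base_env={})["ONEDNN_MAX_CPU_ISA"])
```

The returned dictionary could be passed as `env=` to `subprocess.run` for a separate training process per case. Starting a fresh process per case is a design choice here: it avoids depending on when the library reads the variable.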

diff --git a/...eatures-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/README.md b/...eatures-and-Functionality/IntelPyTorch_TrainingOptimizations_AMX_BF16/README.md
@@ -148,11 +148,9 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
 
 ## Example Output
 
-If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample generates performance and analysis diagrams for comparison.
+If successful, the sample displays `[CODE_SAMPLE_COMPLETED_SUCCESSFULLY]`. Additionally, the sample prints the runtimes and displays charts of relative performance, using the FP32 model without any optimizations as the baseline.
 
-The following image shows approximate performance speed increases using AMX BF16 with auto-mixed precision during training. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset.  
-
-![comparison images](assets/amx_relative_speedup.png)
+The performance speedups using AMX BF16 are approximate on ResNet50. Performance will vary based on your hardware and software versions. To see more performance improvement between AVX-512 BF16 and AMX BF16, increase the amount of required computations in one batch. This can be done by increasing the batch size with CIFAR10 or using another dataset. For even more speedup, consider using the Intel® Extension for PyTorch* [Launch Script](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/launch_script.html). 
 
 ## License
 
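
Review note on the Example Output text: the "relative performance with the FP32 model ... as the baseline" can be computed from the printed runtimes roughly as below. This is a sketch under assumptions: the function name `relative_speedup` and the case labels are hypothetical, and the timings in the demo are placeholders, not measurements from the sample.

```python
def relative_speedup(times, baseline="FP32"):
    """Map each case to baseline_time / case_time, so the baseline is 1.0."""
    if any(t <= 0 for t in times.values()):
        raise ValueError("training times must be positive")
    base = times[baseline]
    return {case: base / t for case, t in times.items()}


if __name__ == "__main__":
    # Placeholder timings in seconds, made up for illustration only.
    example = {"FP32": 120.0, "AVX512_BF16": 100.0, "AMX_BF16": 80.0}
    for case, speedup in relative_speedup(example).items():
        print(f"{case}: {speedup:.2f}x")
```

A value above 1.0 means the case trained faster than the FP32 baseline; values near 1.0 for both BF16 cases would match the small-batch behavior described in the notebook.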

diff --git a/...ity/IntelPyTorch_TrainingOptimizations_AMX_BF16/assets/amx_relative_speedup.png b/...ity/IntelPyTorch_TrainingOptimizations_AMX_BF16/assets/amx_relative_speedup.png
diff --git a/...Functionality/IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision/README.md b/...Functionality/IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision/README.md
@@ -1,30 +1,30 @@
-# `TensorFlow (TF) Transformer with Intel® Advanced Matrix Extensions (Intel® AMX) bfloat16 Mixed Precision Learning` 
+# `TensorFlow* Transformer with Advanced Matrix Extensions bfloat16 Mixed Precision Learning` Sample 
 
-This sample code demonstrates optimizing a TensorFlow model with Intel® Advanced Matrix Extensions (Intel® AMX) using bfloat16 (Brain Floating Point) on  4th Gen Intel® Xeon® Scalable Processors (Sapphire Rapids).
+The `TensorFlow* Transformer with Advanced Matrix Extensions bfloat16 Mixed Precision Learning` sample code demonstrates optimizing a TensorFlow* model with Intel® Advanced Matrix Extensions (Intel® AMX) using bfloat16 (Brain Floating Point) on 4th Gen Intel® Xeon® processors (formerly Sapphire Rapids).
 
 | Area                  | Description
 |:---                   |:--
- What you will learn    | How to use AMX bfloat16 mixed precision learning on a TensorFlow model
+| What you will learn   | How to use Intel® AMX bfloat16 mixed precision learning on a TensorFlow* model
 | Time to complete      | 15 minutes
+| Category              | Getting Started
 
 > **Note**: The sample is based on the [*Text classification with Transformer*](https://keras.io/examples/nlp/text_classification_with_transformer/) Keras sample.
 
-
 ## Purpose
 
 In this sample, you will run a transformer classification model with bfloat16 mixed precision learning on Intel® AMX ISA and compare the performance against AVX512. You should notice that using Intel® AMX results in performance increases when compared to AVX512 while retaining the expected precision.
 
 ## Prerequisites
 
-This sample code work on **Sapphire Rapids** only.
+>**Note**: The code in the sample works on 4th Gen Intel® Xeon® processors (formerly Sapphire Rapids) only.
 
 | Optimized for             | Description
 |:---                       |:---
 | OS                        | Ubuntu* 20.04
-| Hardware                  | Sapphire Rapids
+| Hardware                  | 4th Gen Intel® Xeon® processors
 | Software                  | Intel® AI Analytics Toolkit (AI Kit)
 
-The sample assumes Intel® Optimization for TensorFlow is installed. (See the [Intel® Optimization for TensorFlow* Installation Guide](https://www.intel.com/content/www/us/en/developer/articles/guide/optimization-for-TensorFlow-installation-guide.html) for more information.)
+The sample assumes Intel® Optimization for TensorFlow* is installed. (See the [Intel® Optimization for TensorFlow* Installation Guide](https://www.intel.com/content/www/us/en/developer/articles/guide/optimization-for-TensorFlow-installation-guide.html) for more information.)
 
 ### For Local Development Environments
 
@@ -39,7 +39,7 @@ You will need to download and install the following toolkits, tools, and compone
   Install using PIP: `$pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.
 
 
-- **Intel® oneAPI Data Analytics Library**
+- **Intel® oneAPI Data Analytics Library (oneDAL)**
 
   You might need some parts of the [Intel® oneAPI Data Analytics Library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html).
 
@@ -51,8 +51,11 @@ The necessary tools and components are already installed in the environment. You
 
 ## Key Implementation Details
 
-The sample code is written in Python and targets Sapphire Rapids only.
+The sample code is written in Python and targets 4th Gen Intel® Xeon® processors (formerly Sapphire Rapids) only.
+
+## Set Environment Variables
 
+When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.
 
 ## Run the Sample
 
@@ -71,11 +74,9 @@ The sample code is written in Python and targets Sapphire Rapids only.
 #### Activate Conda
 
 1. Activate the Conda environment.
-
     ```
     conda activate tensorflow
     ```
-
    By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it.
 
    You can choose to activate the Conda environment without root access. To bypass root access to manage your Conda environment, clone and activate your desired Conda environment using commands similar to the following.
@@ -85,7 +86,7 @@ The sample code is written in Python and targets Sapphire Rapids only.
    conda activate usr_tensorflow
    ```
 
-#### Run the NoteBook
+#### Run Jupyter Notebook
 
 1. Launch Jupyter Notebook.
    ```
@@ -98,11 +99,9 @@ The sample code is written in Python and targets Sapphire Rapids only.
    ```
 4. Run every cell in the Notebook in sequence.
 
-
 #### Troubleshooting
 
-If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.
-
+If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility.
 
 ### Run the Sample on Intel® DevCloud
 
@@ -112,7 +111,7 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic
    ```
    ssh DevCloud
    ```
-   > **Note**: You can find information about configuring your Linux system and connecting to Intel DevCloud at Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started).
+   > **Note**: You can find information about configuring your Linux system and connecting to Intel DevCloud at Intel® DevCloud for oneAPI *[Get Started](https://devcloud.intel.com/oneapi/get_started)*.
 
 4. Locate and select the Notebook.
    ```
@@ -131,7 +130,7 @@ The following image shows a typical example of JIT Kernel Time breakdown file an
 
 ## Further Reading
 
-Explore [Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html) to find out how you can achieve performance gains for popular deep-learning and machine-learning frameworks through Intel optimizations.
+Explore *[Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html)* to find out how you can achieve performance gains for popular deep-learning and machine-learning frameworks through Intel optimizations.
 
 ## License