
Commit 48264e5

AMX bfloat16 mixed precision learning TensorFlow Transformer sample (#1317)
1 parent d5aea45 commit 48264e5

File tree

11 files changed: +832 −0 lines


AI-and-Analytics/Features-and-Functionality/IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision/IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision.ipynb

+536 lines (large diff not rendered by default)
@@ -0,0 +1,7 @@
Copyright Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,141 @@
# `TensorFlow (TF) Transformer with Intel® Advanced Matrix Extensions (Intel® AMX) bfloat16 Mixed Precision Learning`

This sample code demonstrates optimizing a TensorFlow model with Intel® Advanced Matrix Extensions (Intel® AMX) using bfloat16 (Brain Floating Point) on 4th Gen Intel® Xeon® Scalable Processors (Sapphire Rapids).

| Area                | Description
|:---                 |:---
| What you will learn | How to use Intel® AMX bfloat16 mixed precision learning on a TensorFlow model
| Time to complete    | 15 minutes

> **Note**: The sample is based on the [*Text classification with Transformer*](https://keras.io/examples/nlp/text_classification_with_transformer/) Keras sample.
## Purpose

In this sample, you will run a transformer classification model with bfloat16 mixed precision learning on the Intel® AMX ISA and compare its performance against AVX512. You should notice that using Intel® AMX delivers a performance increase over AVX512 while retaining the expected precision.
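Mixed precision learning with bfloat16 keeps the float32 weights while running the compute-heavy matrix operations in bfloat16, which Intel® AMX accelerates in hardware. Before running the sample, you can verify that the CPU exposes the AMX bfloat16 instructions. The following is a minimal sketch, not part of the sample, assuming a Linux kernel that reports the `amx_bf16` CPU flag (as Sapphire Rapids systems do):

```
# Minimal sketch (Linux only): check /proc/cpuinfo for the amx_bf16 flag.
def cpu_supports_amx_bf16():
    with open("/proc/cpuinfo") as f:
        return "amx_bf16" in f.read()

print("Intel AMX bfloat16 available:", cpu_supports_amx_bf16())
```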
## Prerequisites

This sample code works on **Sapphire Rapids** only.

| Optimized for | Description
|:---           |:---
| OS            | Ubuntu* 20.04
| Hardware      | Sapphire Rapids
| Software      | Intel® AI Analytics Toolkit (AI Kit)

The sample assumes Intel® Optimization for TensorFlow is installed. (See the [Intel® Optimization for TensorFlow* Installation Guide](https://www.intel.com/content/www/us/en/developer/articles/guide/optimization-for-TensorFlow-installation-guide.html) for more information.)
### For Local Development Environments

You will need to download and install the following toolkits, tools, and components to use the sample.

- **Intel® AI Analytics Toolkit (AI Kit)**

  You can get the AI Kit from [Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#analytics-kit). <br> See [*Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux) for AI Kit installation information and post-installation steps and scripts.

- **Jupyter Notebook**

  Install using pip: `pip install notebook`. <br> Alternatively, see [*Installing Jupyter*](https://jupyter.org/install) for detailed installation instructions.

- **Intel® oneAPI Data Analytics Library**

  You might need some parts of the [Intel® oneAPI Data Analytics Library](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onedal.html).
### For Intel® DevCloud

The necessary tools and components are already installed in the environment. You do not need to install additional components. See [Intel® DevCloud for oneAPI](https://devcloud.intel.com/oneapi/get_started/) for information.
## Key Implementation Details

The sample code is written in Python and targets Sapphire Rapids only.
## Run the Sample

### On Linux*

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
>
> Linux*:
> - For system-wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: `. ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html).
#### Activate Conda

1. Activate the Conda environment.
   ```
   conda activate tensorflow
   ```
   By default, the AI Kit is installed in the `/opt/intel/oneapi` folder and requires root privileges to manage it.

   You can choose to activate the Conda environment without root access. To bypass root access, clone and activate the Conda environment using commands similar to the following.
   ```
   conda create --name usr_tensorflow --clone tensorflow
   conda activate usr_tensorflow
   ```
#### Run the Notebook

1. Launch Jupyter Notebook.
   ```
   jupyter notebook --ip=0.0.0.0
   ```
2. Follow the instructions to open the URL with the token in your browser.
3. Locate and select the Notebook.
   ```
   IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision.ipynb
   ```
4. Run every cell in the Notebook in sequence.
#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.
### Run the Sample on Intel® DevCloud

1. If you do not already have an account, request an Intel® DevCloud account at [*Create an Intel® DevCloud Account*](https://intelsoftwaresites.secure.force.com/DevCloud/oneapi).
2. On a Linux* system, open a terminal.
3. SSH into Intel® DevCloud.
   ```
   ssh DevCloud
   ```
   > **Note**: You can find information about configuring your Linux system and connecting to Intel® DevCloud at Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started).
4. Locate and select the Notebook.
   ```
   IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision.ipynb
   ```
5. Run every cell in the Notebook in sequence.
## Example Output

You should see performance analysis diagrams, formatted as pie charts, showing the JIT kernel type time breakdown for both AVX512 and Intel® AMX.

The following image shows a typical example of the JIT kernel time breakdown analysis diagrams.

![jit pie chart](images/jit_breakdown_pie.png)
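The pie charts are built from the oneDNN verbose logs collected by the run scripts. As a rough illustration of that analysis (a hedged sketch, not the notebook's exact code: it assumes `ONEDNN_VERBOSE=1` `exec` lines are comma-separated with the kernel implementation named in one field and the elapsed milliseconds in the last field, which can vary between oneDNN versions):

```
# Sketch: aggregate oneDNN exec time per JIT kernel ISA and plot a pie chart.
from collections import defaultdict
import matplotlib.pyplot as plt

times = defaultdict(float)
with open("logs/dnn_logs.txt") as f:
    for line in f:
        if not line.startswith("onednn_verbose") or ",exec," not in line:
            continue
        fields = line.strip().split(",")
        # The implementation field names the ISA, e.g. brgemm:avx512_core_amx.
        impl = next((x for x in fields if "avx512" in x or "amx" in x), "other")
        try:
            times[impl] += float(fields[-1])  # elapsed time in milliseconds
        except ValueError:
            pass

plt.pie(list(times.values()), labels=list(times.keys()), autopct="%1.1f%%")
plt.title("JIT kernel time breakdown")
plt.savefig("jit_breakdown_pie.png")
```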
## Further Reading

Explore [Get Started with the Intel® AI Analytics Toolkit for Linux*](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-ai-linux/top.html) to find out how you can achieve performance gains for popular deep-learning and machine-learning frameworks through Intel optimizations.

## License

Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third-party program licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
@@ -0,0 +1,14 @@
#!/bin/bash

mkdir logs

wget https://raw.githubusercontent.com/IntelAI/models/master/benchmarks/common/platform_util.py

echo "########## Executing the run"

source /opt/intel/oneapi/setvars.sh
source activate tensorflow

ONEDNN_VERBOSE_TIMESTAMP=1 ONEDNN_VERBOSE=1 python ./text_classification_with_transformer.py > ./logs/dnn_logs.txt

echo "########## Done with the run"
@@ -0,0 +1,10 @@
#!/bin/bash

echo "########## Executing the run"

source /opt/intel/oneapi/setvars.sh
source activate tensorflow

ONEDNN_VERBOSE_TIMESTAMP=1 ONEDNN_VERBOSE=1 python ./text_classification_with_transformer.py > ./logs/dnn_logs_mixed.txt

echo "########## Done with the run"
@@ -0,0 +1,19 @@
--- text_classification_with_transformer.py	2022-09-20 02:24:42.814605146 -0700
+++ text_classification_with_transformer2.py	2022-09-20 02:24:48.489188611 -0700
@@ -27,6 +27,16 @@
 
 
 """
+## Bfloat16 mixed precision learning
+"""
+
+from tensorflow.keras import mixed_precision
+
+policy = mixed_precision.Policy('mixed_bfloat16')
+mixed_precision.set_global_policy(policy)
+
+
+"""
 ## Implement a Transformer block as a layer
 """
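This patch enables Keras mixed precision for the whole model: under the `mixed_bfloat16` global policy, layers compute in bfloat16 while keeping their variables in float32 for numerically stable weight updates. A quick way to see the effect (a small check, assuming TensorFlow 2.x with Keras mixed precision, as the sample uses):

```
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_bfloat16")
layer = tf.keras.layers.Dense(8)
print(layer.compute_dtype)   # bfloat16 -- used for activations and math
print(layer.variable_dtype)  # float32  -- used for the weights
```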
@@ -0,0 +1,53 @@
--- text_classification_with_transformer.py	2022-10-17 04:04:37.455448493 -0700
+++ text_classification_with_transformer2.py	2022-10-17 04:07:15.196716415 -0700
@@ -9,10 +9,38 @@
 ## Setup
 """
 
+from time import time
+import os
+
 import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras import layers
 
+from platform_util import PlatformUtil
+cpu_info = PlatformUtil("")
+
+numa_nodes = cpu_info.numa_nodes
+print("CPU count per socket:", cpu_info.cores_per_socket, "\nSocket count:", cpu_info.sockets, "\nNuma nodes:", numa_nodes)
+
+if numa_nodes > 0:
+    socket_number = 1
+    cpu_count = cpu_info.cores_per_socket
+    inter_thread = 1
+else:
+    # on a non-NUMA machine, use all the cores and do not use numactl
+    socket_number = -1
+    cpu_count = cpu_info.cores_per_socket * cpu_info.sockets
+    inter_thread = cpu_info.sockets
+
+# Intel OpenMP threads and other fine-tuning parameters
+os.environ['OMP_NUM_THREADS'] = str(cpu_count)
+os.environ['KMP_BLOCKTIME'] = str(inter_thread)
+os.environ['KMP_AFFINITY'] = "granularity=fine,verbose,compact,1,0"
+
+# Eigen threads
+tf.config.threading.set_intra_op_parallelism_threads(cpu_count)
+tf.config.threading.set_inter_op_parallelism_threads(inter_thread)
+
 
 """
 ## Implement a Transformer block as a layer
@@ -110,6 +138,11 @@
 model.compile(
     optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
 )
+
+start = time()
 history = model.fit(
     x_train, y_train, batch_size=32, epochs=2, validation_data=(x_val, y_val)
 )
+end = time()
+
+print("time: ", end - start)
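The patch above derives thread counts from the machine topology (via `platform_util.py` from the IntelAI models repo) and pins Intel OpenMP and TensorFlow threading accordingly. To confirm what TensorFlow actually ends up using, a small check (an illustration, not part of the sample) can be run before any op executes:

```
import tensorflow as tf

# A value of 0 means TensorFlow picks the thread count itself; these must be
# queried or set before the first op runs.
print("intra-op threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("inter-op threads:", tf.config.threading.get_inter_op_parallelism_threads())
```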
@@ -0,0 +1,14 @@
#!/bin/bash

echo "########## Executing the run"

source activate tensorflow

# enable verbose log
export DNNL_VERBOSE=2
# enable JIT Dump
export DNNL_JIT_DUMP=1

DNNL_MAX_CPU_ISA=AVX512_CORE_BF16 python ./text_classification_with_transformer.py cpu >> ./logs/log_cpu_bf16_avx512_bf16.csv 2>&1

echo "########## Done with the run"
@@ -0,0 +1,14 @@
#!/bin/bash

echo "########## Executing the run"

source activate tensorflow

# enable verbose log
export DNNL_VERBOSE=2
# enable JIT Dump
export DNNL_JIT_DUMP=1

DNNL_MAX_CPU_ISA=AVX512_CORE_AMX python ./text_classification_with_transformer.py cpu >> ./logs/log_cpu_bf16_avx512_amx.csv 2>&1

echo "########## Done with the run"
@@ -0,0 +1,24 @@
{
  "guid": "60A68888-6099-414E-999B-EDC7310A01EA",
  "name": "TensorFlow (TF) Transformer with Intel® Advanced Matrix Extensions (Intel® AMX) bfloat16 Mixed Precision Learning",
  "categories": ["Toolkit/oneAPI AI And Analytics/AI Getting Started Samples"],
  "description": "This sample code demonstrates optimizing a TensorFlow model with Intel® Advanced Matrix Extensions (Intel® AMX) using bfloat16 (Brain Floating Point) on Sapphire Rapids",
  "builder": ["cli"],
  "languages": [{"python":{}}],
  "os": ["linux"],
  "targetDevice": ["CPU"],
  "ciTests": {
    "linux": [
      {
        "env": [],
        "id": "Transformer_AMX_bfloat16_Mixed_Precision_Learning",
        "steps": [
          "conda activate tensorflow",
          "conda install -y jupyter",
          "jupyter nbconvert --execute IntelTensorFlow_Transformer_AMX_bfloat16_MixedPrecision.ipynb"
        ]
      }
    ]
  },
  "expertise": "Getting Started"
}
