beginner_source/bettertransformer_tutorial.rst (+3 -1)
@@ -18,7 +18,7 @@ been updated to use the core library modules to benefit from fastpath accelerati
Better Transformer offers two types of acceleration:
-* Native multihead attention implementation for CPU and GPU to improvee overall execution efficiency.
+* Native multihead attention (MHA) implementation for CPU and GPU to improve overall execution efficiency.
* Exploiting sparsity in NLP inference. Because of variable input lengths, input
tokens may contain a large number of padding tokens for which processing may be
skipped, delivering significant speedups.
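
The native MHA fastpath the corrected bullet refers to engages only under specific conditions: the module must be in inference mode with gradient collection disabled. A minimal sketch of those conditions, assuming a generic `nn.TransformerEncoder` stand-in rather than the pretrained model the tutorial actually benchmarks:

    import torch
    import torch.nn as nn

    # Illustrative stand-in model; the tutorial loads a pretrained
    # checkpoint instead. batch_first=True is among the preconditions
    # for the native MHA fastpath.
    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=6)

    x = torch.rand(32, 128, 512)  # (batch, sequence, embedding)

    model.eval()                  # inference mode: required for fastpath
    with torch.no_grad():         # gradients off: required for fastpath
        out = model(x)            # dispatches to the fused native MHA kernels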
@@ -124,6 +124,7 @@ Finally, we set the benchmark iteration count:
2.1 Run and benchmark inference on CPU with and without BT fastpath (native MHA only)
We run the model on CPU, and collect profile information:
+
* The first run uses traditional ("slow path") execution.
* The second run enables BT fastpath execution by putting the model in inference mode using `model.eval()` and disables gradient collection with `torch.no_grad()`.
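
The two profiled CPU runs listed above can be sketched as follows, continuing the stand-in model from the previous sketch; `ITERATIONS` mirrors the tutorial's benchmark iteration count, but the name here is illustrative:

    from torch.profiler import profile, ProfilerActivity

    ITERATIONS = 10  # illustrative; the tutorial sets its own count

    # First run: traditional ("slow path") execution. With gradients
    # enabled, the fastpath conditions are not met.
    model.train()
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        for _ in range(ITERATIONS):
            model(x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

    # Second run: BT fastpath, via eval mode plus disabled gradients.
    model.eval()
    with torch.no_grad():
        with profile(activities=[ProfilerActivity.CPU]) as prof:
            for _ in range(ITERATIONS):
                model(x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))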
@@ -167,6 +168,7 @@ We disable the BT sparsity:
We run the model on DEVICE, and collect profile information for native MHA execution on DEVICE:
+
* The first run uses traditional ("slow path") execution.
* The second run enables BT fastpath execution by putting the model in inference mode using `model.eval()`
and disables gradient collection with `torch.no_grad()`.
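
The DEVICE-side measurement follows the same pattern, with one addition: `enable_nested_tensor=False` on `nn.TransformerEncoder` is the switch that disables BT's sparsity (padding-skipping) path, so only native MHA execution is measured. A self-contained sketch, with illustrative sizes and names:

    import torch
    import torch.nn as nn
    from torch.profiler import profile, ProfilerActivity

    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

    layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    # enable_nested_tensor=False turns off the sparsity path, isolating
    # native MHA execution in the profile.
    model = nn.TransformerEncoder(
        layer, num_layers=6, enable_nested_tensor=False
    ).to(DEVICE)
    x = torch.rand(32, 128, 512, device=DEVICE)

    activities = [ProfilerActivity.CPU]
    if DEVICE == "cuda":
        activities.append(ProfilerActivity.CUDA)

    # First run: slow path (gradients enabled).
    with profile(activities=activities) as prof:
        model(x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

    # Second run: fastpath (eval mode, gradients disabled).
    model.eval()
    with torch.no_grad():
        with profile(activities=activities) as prof:
            model(x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))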