@@ -401,63 +401,11 @@ def trace_handler(p):
401
401
######################################################################
402
402
# We will get the following profile table for the eager model:
403
403
#
404
- # .. code-block:: shell
405
- #
406
- # ------------------------- ------------ ------------ ------------ ------------ ------------ ------------
407
- # Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
408
- # ------------------------- ------------ ------------ ------------ ------------ ------------ ------------
409
- # aten::addmm 33.36% 270.520ms 45.73% 370.814ms 1.024ms 362
410
- # aten::add 19.89% 161.276ms 19.89% 161.276ms 444.287us 363
411
- # aten::copy_ 14.97% 121.416ms 14.97% 121.416ms 248.803us 488
412
- # aten::mul 9.02% 73.151ms 9.02% 73.154ms 377.082us 194
413
- # aten::clamp_min 8.81% 71.444ms 8.81% 71.444ms 744.208us 96
414
- # aten::bmm 5.46% 44.258ms 5.46% 44.258ms 922.042us 48
415
- # ProfilerStep* 3.00% 24.362ms 100.00% 810.920ms 810.920ms 1
416
- # aten::div 2.85% 23.071ms 2.89% 23.447ms 976.958us 24
417
- # aten::_softmax 1.00% 8.087ms 1.00% 8.087ms 336.958us 24
418
- # aten::linear 0.32% 2.624ms 46.48% 376.888ms 1.041ms 362
419
- # aten::clone 0.23% 1.859ms 2.77% 22.430ms 228.878us 98
420
- # aten::t 0.14% 1.162ms 0.31% 2.502ms 6.912us 362
421
- # aten::view 0.14% 1.161ms 0.14% 1.161ms 1.366us 850
422
- # aten::transpose 0.12% 938.000us 0.17% 1.377ms 3.567us 386
423
- # aten::index_select 0.12% 933.000us 0.12% 952.000us 317.333us 3
424
- # aten::expand 0.11% 865.000us 0.12% 986.000us 2.153us 458
425
- # aten::matmul 0.10% 808.000us 8.31% 67.420ms 1.405ms 48
426
- # aten::cat 0.09% 701.000us 0.09% 703.000us 703.000us 1
427
- # aten::as_strided 0.08% 656.000us 0.08% 656.000us 0.681us 963
428
- # aten::relu 0.05% 420.000us 8.86% 71.864ms 748.583us 96
429
- # ------------------------- ------------ ------------ ------------ ------------ ------------ ------------
430
- # Self CPU time total: 810.920ms
404
+ # .. image:: ../_static/img/eager_prof.png
431
405
#
432
406
# Similarly, get the table for the inductor model:
433
407
#
434
- # .. code-block:: shell
435
- #
436
- # ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
437
- # Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls
438
- # ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
439
- # mkl::_mkl_linear 68.52% 230.662ms 68.79% 231.573ms 639.704us 362
440
- # aten::bmm 8.02% 26.991ms 8.02% 26.992ms 562.333us 48
441
- # ProfilerStep* 3.35% 11.292ms 100.00% 336.642ms 336.642ms 1
442
- # graph_0_cpp_fused_constant_pad_nd_embedding_0 0.27% 915.000us 0.27% 915.000us 915.000us 1
443
- # aten::empty 0.27% 911.000us 0.27% 911.000us 2.517us 362
444
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_151 0.27% 901.000us 0.27% 901.000us 901.000us 1
445
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_226 0.27% 899.000us 0.27% 899.000us 899.000us 1
446
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_361 0.27% 898.000us 0.27% 898.000us 898.000us 1
447
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_121 0.27% 895.000us 0.27% 895.000us 895.000us 1
448
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_31 0.27% 893.000us 0.27% 893.000us 893.000us 1
449
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_76 0.26% 892.000us 0.26% 892.000us 892.000us 1
450
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_256 0.26% 892.000us 0.26% 892.000us 892.000us 1
451
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_346 0.26% 892.000us 0.26% 892.000us 892.000us 1
452
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_241 0.26% 891.000us 0.26% 891.000us 891.000us 1
453
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_316 0.26% 891.000us 0.26% 891.000us 891.000us 1
454
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_91 0.26% 890.000us 0.26% 890.000us 890.000us 1
455
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_106 0.26% 890.000us 0.26% 890.000us 890.000us 1
456
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_211 0.26% 890.000us 0.26% 890.000us 890.000us 1
457
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_61 0.26% 889.000us 0.26% 889.000us 889.000us 1
458
- # graph_0_cpp_fused__mkl_linear_add_mul_relu_286 0.26% 889.000us 0.26% 889.000us 889.000us 1
459
- # ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
460
- # Self CPU time total: 336.642ms
408
+ # .. image:: ../_static/img/inductor_prof.png
461
409
#
462
410
# From the profiling table of the eager model, we can see the most time consumption ops are [aten::addmm, aten::add, aten::copy_, aten::mul, aten::clamp_min, aten::bmm].
463
411
# Comparing with the inductor model profiling table, we notice there are ``mkl::_mkl_linear`` and fused kernel called ``graph_0_cpp_fused_*``. They are the major
0 commit comments