Commit af6450b

[ET-VK] Tuning native layer norm local workgroup size to improve thread occupancy during reduce.
This diff tunes the local workgroup size of the native layer norm operation in the Vulkan backend of ExecuTorch to improve thread occupancy during the reduce phase.

Differential Revision: [D72581293](https://our.internmc.facebook.com/intern/diff/D72581293/)

ghstack-source-id: 276900533
Pull Request resolved: #9984
1 parent 0a68f5e commit af6450b

File tree

1 file changed: +12 −1 lines changed

backends/vulkan/runtime/graph/ops/impl/NativeLayerNorm.cpp

Lines changed: 12 additions & 1 deletion
@@ -84,7 +84,18 @@ void add_native_layer_norm_node(
   std::vector<int64_t> in_sizes = t_input->sizes();

   utils::uvec3 global_size = t_out->logical_limits();
-  utils::uvec3 local_size = graph.create_local_wg_size(global_size);
+  utils::uvec3 local_size;
+
+  // The shader uses a shared-memory scale factor > 1 when the dispatch is
+  // larger than the maximum workgroup size. Setting the workgroup size in
+  // the X axis to the maximum gives the best thread utilization.
+  if (global_size[0] > 64) {
+    local_size = {64, 1, 1};
+  } else {
+    // If the thread count in the X axis is at most the maximum workgroup
+    // size, let create_local_wg_size pick the best workgroup size.
+    local_size = graph.create_local_wg_size(global_size);
+  }

   std::string kernel_name("native_layer_norm");
   kernel_name.reserve(kShaderNameReserve);

0 commit comments
