Commit 38c4c77

[ET-VK] Tuning native layer norm local workgroup size to improve thread occupancy during reduce.
Pull Request resolved: #9984

This diff tunes the local workgroup size of the native layer norm operation in the Vulkan backend of ExecuTorch to improve thread occupancy during the reduce phase.

ghstack-source-id: 277933491
Differential Revision: [D72581293](https://our.internmc.facebook.com/intern/diff/D72581293/)
1 parent dd8fe36 commit 38c4c77

File tree

1 file changed: +12 −1 lines

backends/vulkan/runtime/graph/ops/impl/NativeLayerNorm.cpp

Lines changed: 12 additions & 1 deletion
```diff
@@ -84,7 +84,18 @@ void add_native_layer_norm_node(
   std::vector<int64_t> in_sizes = t_input->sizes();

   utils::uvec3 global_size = t_out->logical_limits();
-  utils::uvec3 local_size = graph.create_local_wg_size(global_size);
+  utils::uvec3 local_size;
+
+  // The shader uses a shared memory scale factor > 1 when the dispatch is
+  // larger than the maximum workgroup size. Setting the workgroup size in
+  // the X axis to the maximum WG size gives the best thread utilization.
+  if (global_size[0] > 64) {
+    local_size = {64, 1, 1};
+  } else {
+    // If the thread count in the X axis is smaller than or equal to the
+    // maximum WG size, let the function pick the best WG size.
+    local_size = graph.create_local_wg_size(global_size);
+  }

   std::string kernel_name("native_layer_norm");
   kernel_name.reserve(kShaderNameReserve);
```
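The selection logic above can be sketched as a standalone function. This is a minimal, hypothetical sketch: `uvec3`, `MAX_WG_SIZE_X`, and `default_local_wg_size` are stand-ins invented here for `utils::uvec3`, the hard-coded 64, and `graph.create_local_wg_size()`; the real ExecuTorch heuristic is more involved.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Stand-in for utils::uvec3.
using uvec3 = std::array<uint32_t, 3>;

// Mirrors the hard-coded 64 in the commit (assumed maximum WG size in X).
constexpr uint32_t MAX_WG_SIZE_X = 64;

// Placeholder for graph.create_local_wg_size(): assumed here to simply
// clamp the X axis to the maximum, purely for illustration.
uvec3 default_local_wg_size(const uvec3& global_size) {
  return {std::min(global_size[0], MAX_WG_SIZE_X), 1, 1};
}

uvec3 pick_local_wg_size(const uvec3& global_size) {
  if (global_size[0] > MAX_WG_SIZE_X) {
    // Large dispatch: fill the X axis so every thread in the workgroup
    // stays busy during the reduce phase.
    return {MAX_WG_SIZE_X, 1, 1};
  }
  // Small dispatch: defer to the default heuristic.
  return default_local_wg_size(global_size);
}
```

The design point of the change: for dispatches wider than the maximum workgroup size, a fixed `{64, 1, 1}` workgroup maximizes occupancy during the reduce, while smaller dispatches still fall back to the generic heuristic.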
