Description
Encountered with the following matmul configuration:
module attributes {dlti.target_system_spec = #dlti.target_system_spec<"CPU" : #dlti.target_device_spec<#dlti.dl_entry<"L1_cache_size_in_bytes", 49152 : ui32>, #dlti.dl_entry<"L2_cache_size_in_bytes", 2097152 : ui64>, #dlti.dl_entry<"L3_cache_size_in_bytes", 110100480 : ui64>, #dlti.dl_entry<"num_threads", 56 : i32>, #dlti.dl_entry<"max_vector_width", 512 : i64>>>} {
  func.func @entry(%arg0: tensor<128x512xbf16>, %arg1: tensor<512x1024xbf16>) -> tensor<128x1024xbf16> attributes {llvm.emit_c_interface} {
    %cst = arith.constant 0.000000e+00 : bf16
    %0 = tensor.empty() : tensor<128x1024xbf16>
    %1 = linalg.fill ins(%cst : bf16) outs(%0 : tensor<128x1024xbf16>) -> tensor<128x1024xbf16>
    %2 = linalg.matmul {KBlock = 32 : i32, KThreads = 1 : i32, MBlock = 32 : i32, MThreads = 4 : i32, NBlock = 128 : i32, NThreads = 14 : i32, cast = #linalg.type_fn<cast_signed>, innermostKBlock = 32 : i32, innermostMBlock = 32 : i32, innermostNBlock = 32 : i32} ins(%arg0, %arg1 : tensor<128x512xbf16>, tensor<512x1024xbf16>) outs(%1 : tensor<128x1024xbf16>) -> tensor<128x1024xbf16>
    return %2 : tensor<128x1024xbf16>
  }
}
After one-shot bufferization, we encounter the following allocation:
    %alloc_3 = memref.alloc(%6) {alignment = 64 : i64} : memref<32x?xf32>
which is then lowered further into a builtin.unrealized_conversion_cast that cannot be eliminated.