Open
Description
Using BRGEMM calls is tied to register allocation. Since the calls are inserted at the vector dialect level, we need some way to store register allocation hints alongside them. We could try to avoid the problem altogether by inlining the BRGEMM body into the inner loop. We need to evaluate whether this approach is feasible.
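To make the two options concrete, here is a minimal sketch in plain C++, not the actual vector dialect IR. It assumes the usual batch-reduce GEMM semantics (C += sum over b of A[b] * B[b]); all names (`brgemm_ref`, `tiles_with_call`, `tiles_inlined`) and the row-major tile layout are hypothetical and only illustrate the difference between calling a kernel from the inner loop and inlining its body there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types: a row-major tile and a batch of tiles.
using Matrix = std::vector<float>;
using Batch = std::vector<Matrix>;

// Reference batch-reduce GEMM: C += sum over b of A[b] * B[b].
// Each A[b] is M x K, each B[b] is K x N, C is M x N.
void brgemm_ref(const Batch& A, const Batch& B, Matrix& C,
                std::size_t M, std::size_t N, std::size_t K) {
  assert(A.size() == B.size());
  for (std::size_t b = 0; b < A.size(); ++b)
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < K; ++k)
          C[m * N + n] += A[b][m * K + k] * B[b][k * N + n];
}

// Option 1: the loop over tiles calls the kernel. The caller has to
// agree with the kernel on how values are passed, which is where the
// register allocation hints would come in.
void tiles_with_call(const std::vector<Batch>& A, const std::vector<Batch>& B,
                     std::vector<Matrix>& C,
                     std::size_t M, std::size_t N, std::size_t K) {
  for (std::size_t t = 0; t < C.size(); ++t)
    brgemm_ref(A[t], B[t], C[t], M, N, K);
}

// Option 2: the kernel body is inlined into the loop over tiles. The
// accumulator is an ordinary local value, so its placement is left to
// the regular register allocator and no call boundary is involved.
void tiles_inlined(const std::vector<Batch>& A, const std::vector<Batch>& B,
                   std::vector<Matrix>& C,
                   std::size_t M, std::size_t N, std::size_t K) {
  for (std::size_t t = 0; t < C.size(); ++t) {
    assert(A[t].size() == B[t].size());
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t n = 0; n < N; ++n) {
        float acc = C[t][m * N + n];
        for (std::size_t b = 0; b < A[t].size(); ++b)
          for (std::size_t k = 0; k < K; ++k)
            acc += A[t][b][m * K + k] * B[t][b][k * N + n];
        C[t][m * N + n] = acc;
      }
  }
}
```

Both variants compute the same result; the sketch says nothing about performance, which is part of what the feasibility evaluation would need to cover.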