With HLFIR lowering, the code runs in 7.4 seconds, compared with 6 seconds with FIR lowering.
There is some overhead due to extra temporaries at line 261:
256   subroutine NF2DPrecon(x,i1,i2) ! 2D NF Preconditioning matrix
257     integer :: i1 , i2
258     real(dpkind),dimension(i2)::x,t
259     integer :: i
260     do i = i1 , i2 , nx
261       if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
262       call trisolve(x,i,i+nx-1)
263     enddo
264     do i = i2-2*nx+1 , i1 , -nx
265       t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
266       call trisolve(t,i,i+nx-1)
267       x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
268     enddo
269   end subroutine NF2DPrecon !=========================================
ArrayValueCopy has special handling for array slices of the form (i:j) and (j+1:k), which allows it to disambiguate x(i:i+nx-1) from x(i-nx:i-1): the second slice ends at i-1, immediately before the first begins at i, so the two cannot overlap and no temporary is needed. We can probably do the same in the optimized bufferization pass, or implement something more generic. For example, we could try to use the affine dialect utilities to detect store-load conflicts from the iteration space constraints derived from the slice configurations and the mapping of the iteration indices to memory locations (based on the designator indexing).
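To make the disjointness argument concrete, here is a minimal sketch of a symbolic bounds check for unit-stride slices of the same array. It is only an illustration of the idea, not the ArrayValueCopy implementation or anything in flang or MLIR: the names `Lin`, `lin`, `sub`, `provably_positive` and `slices_disjoint` are hypothetical, and bounds are assumed to be linear expressions in symbols such as `i` and `nx`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """Linear expression: const + sum(coeff * symbol). Hypothetical helper type."""
    const: int = 0
    terms: tuple = ()  # sorted (symbol, coeff) pairs with nonzero coeff


def lin(const=0, **coeffs):
    """Build a Lin, e.g. lin(-1, i=1, nx=1) means i + nx - 1."""
    nonzero = sorted((s, c) for s, c in coeffs.items() if c != 0)
    return Lin(const, tuple(nonzero))


def sub(a, b):
    """Return a - b, dropping symbols whose coefficients cancel to zero."""
    coeffs = dict(a.terms)
    for sym, c in b.terms:
        coeffs[sym] = coeffs.get(sym, 0) - c
    return lin(a.const - b.const, **coeffs)


def provably_positive(e):
    """True only if e reduced to a positive constant (no symbols left)."""
    return not e.terms and e.const > 0


def slices_disjoint(lb1, ub1, lb2, ub2):
    """Unit-stride slices (lb1:ub1) and (lb2:ub2) of the same array.

    True means provably disjoint; False means unknown, not 'overlapping'.
    """
    return provably_positive(sub(lb2, ub1)) or provably_positive(sub(lb1, ub2))


# Line 261: x(i:i+nx-1) is stored, x(i-nx:i-1) is loaded.
store = (lin(0, i=1), lin(-1, i=1, nx=1))
load = (lin(0, i=1, nx=-1), lin(-1, i=1))
print(slices_disjoint(*store, *load))   # True: lb1 - ub2 == 1

# Line 267: x(i:i+nx-1) against itself cannot be proven disjoint.
print(slices_disjoint(*store, *store))  # False
```

The check is deliberately conservative: it only succeeds when the difference between one slice's lower bound and the other's upper bound reduces to a positive constant, which is exactly the (i:j) / (j+1:k) shape. Anything involving the sign of `nx` or non-unit strides would need the more generic, constraint-based analysis mentioned above.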