
[flang][hlfir] Polyhedron/nf 23% performance regression #65426

Closed
@vzakhari

Description


The nf benchmark runs in 7.4 seconds with HLFIR lowering vs. 6 seconds with FIR lowering.

Part of the overhead comes from extra array temporaries created for the assignment at line 261:

256	subroutine NF2DPrecon(x,i1,i2)       ! 2D NF Preconditioning matrix
257	integer :: i1 , i2
258	real(dpkind),dimension(i2)::x,t
259	integer :: i
260	do i = i1 , i2 , nx
261	   if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
262	   call trisolve(x,i,i+nx-1)
263	enddo
264	do i = i2-2*nx+1 , i1 , -nx
265	   t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
266	   call trisolve(t,i,i+nx-1)
267	   x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
268	enddo
269	end subroutine NF2DPrecon            !=========================================
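The statement at line 261 reads x(i-nx:i-1) and writes x(i:i+nx-1). Because these sections do not overlap, evaluating the whole right-hand side into a temporary first should give the same result as a plain element-by-element loop. A minimal Python sketch of that argument, assuming Fortran arrays are modeled as 0-based lists (all function names are hypothetical), plus an overlapping counter-example where the temporary does matter:

```python
# Model of the statement at line 261 (0-based here; Fortran is 1-based):
#   x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)


def assign_with_temp(x, au2, i, nx):
    # Fortran semantics: evaluate the whole right-hand side into a temporary,
    # then copy the temporary into the left-hand side section.
    tmp = [x[i + k] - au2[i - nx + k] * x[i - nx + k] for k in range(nx)]
    for k in range(nx):
        x[i + k] = tmp[k]


def assign_in_place(x, au2, i, nx):
    # Temporary-free loop: each element is read and written in one step.
    # Safe here because the read section x(i-nx:i-1) ends before the
    # written section x(i:i+nx-1) begins, so no store clobbers a later load.
    for k in range(nx):
        x[i + k] = x[i + k] - au2[i - nx + k] * x[i - nx + k]


def shift_with_temp(x, i, nx):
    # Counter-example with overlapping sections: x(i:i+nx-1) = x(i-1:i+nx-2)
    tmp = [x[i - 1 + k] for k in range(nx)]
    for k in range(nx):
        x[i + k] = tmp[k]


def shift_in_place(x, i, nx):
    # Same statement without a temporary: each store feeds the next load.
    for k in range(nx):
        x[i + k] = x[i - 1 + k]


if __name__ == "__main__":
    x0, au2 = list(range(1, 10)), [2] * 9
    a, b = list(x0), list(x0)
    assign_with_temp(a, au2, 3, 3)
    assign_in_place(b, au2, 3, 3)
    print(a == b)  # True: disjoint sections, the temporary is redundant

    y0 = [0, 1, 2, 3, 4, 5]
    c, d = list(y0), list(y0)
    shift_with_temp(c, 2, 3)
    shift_in_place(d, 2, 3)
    print(c, d)  # [0, 1, 1, 2, 3, 5] [0, 1, 1, 1, 1, 5]
```

The counter-example is why a compiler must keep the temporary unless it can prove the sections are disjoint (or identical).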

ArrayValueCopy has special handling for array slices of the form (i:j) and (j+1:k), which allows it to disambiguate x(i:i+nx-1) from x(i-nx:i-1): the read section ends at i-1, immediately before the written section starts at i, so no temporary is needed. We can probably do the same in the optimized bufferization pass, or implement something more generic. For example, we could use the affine dialect utilities to detect store-load conflicts, based on iteration-space constraints derived from the slice configurations and on the mapping of iteration indices to memory locations (derived from the designator indexing).
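A minimal sketch of the (i:j) vs. (j+1:k) idea, generalized to constant offsets: treat each unit-stride section bound as an affine expression over symbolic values (here `i` and `nx`), and consider two sections of the same array safe for a temporary-free element-wise assignment if they are identical, or if one section's upper bound is provably below the other's lower bound, i.e. their difference simplifies to a positive constant. `Affine`, `provably_before`, and `no_temp_needed` are hypothetical names for illustration only; this is neither the ArrayValueCopy code nor the MLIR affine dialect API, and it assumes unit strides and that both designators refer to the same array.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Affine:
    """Affine expression sum(coeff * symbol) + const over symbolic values."""
    terms: Tuple[Tuple[str, int], ...]
    const: int

    @staticmethod
    def make(terms: Dict[str, int], const: int = 0) -> "Affine":
        # Canonical form: sorted terms with zero coefficients dropped,
        # so equal expressions compare equal.
        kept = tuple(sorted((s, c) for s, c in terms.items() if c != 0))
        return Affine(kept, const)

    def __sub__(self, other: "Affine") -> "Affine":
        diff = dict(self.terms)
        for s, c in other.terms:
            diff[s] = diff.get(s, 0) - c
        return Affine.make(diff, self.const - other.const)

    def as_constant(self) -> Optional[int]:
        # Value of the expression if no symbols remain, else None (unknown).
        return None if self.terms else self.const


def provably_before(ub: Affine, lb: Affine) -> bool:
    # ub < lb for all symbol values if lb - ub is a positive constant.
    d = (lb - ub).as_constant()
    return d is not None and d > 0


def no_temp_needed(lhs: Tuple[Affine, Affine], rhs: Tuple[Affine, Affine]) -> bool:
    """lhs/rhs are (lower, upper) bounds of unit-stride sections of one array.
    True if identical (element k is read before it is written) or provably
    disjoint; False means 'unknown', so the temporary must be kept."""
    if lhs == rhs:
        return True
    (l_lb, l_ub), (r_lb, r_ub) = lhs, rhs
    return provably_before(r_ub, l_lb) or provably_before(l_ub, r_lb)


if __name__ == "__main__":
    # Line 261: x(i:i+nx-1) written, x(i-nx:i-1) read.
    lhs = (Affine.make({"i": 1}), Affine.make({"i": 1, "nx": 1}, -1))
    rhs = (Affine.make({"i": 1, "nx": -1}), Affine.make({"i": 1}, -1))
    print(no_temp_needed(lhs, rhs))   # True: i - (i - 1) = 1 > 0
    # Overlapping x(i-1:i+nx-2): the difference still depends on nx.
    rhs2 = (Affine.make({"i": 1}, -1), Affine.make({"i": 1, "nx": 1}, -2))
    print(no_temp_needed(lhs, rhs2))  # False
```

The check is deliberately conservative: any difference that still contains a symbol is treated as a possible conflict. A more general version (for example, one built on the affine dialect's dependence analysis, as suggested above) could also reason about strides and known symbol ranges.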
