Skip to content

Early exit optimization of Fortran array expressions #129812

Open
@ivan-pi

Description

@ivan-pi

Consider a function for checking if an array is sorted:

!
! Check an array of integers is sorted in ascending order
!
logical function is_sorted_scalar(n,a) result(is_sorted)
    integer, intent(in) :: n
    integer, intent(in) :: a(n)
    integer :: i
    !$omp simd simdlen(8) early_exit
    do i = 2, n
        if (a(i) < a(i-1)) then
            is_sorted = .false.
            return
        end if
    end do
    is_sorted = .true.
end function

logical function is_sorted_all(n,a) result(is_sorted)
    integer, intent(in) :: n
    integer, intent(in) :: a(n)
    is_sorted = all(a(2:n) >= a(1:n-1))
end function

program benchmark

    implicit none
    integer, allocatable :: a(:)

    integer :: i, n

    external :: is_sorted_scalar
    external :: is_sorted_all

    logical :: is_sorted_scalar
    logical :: is_sorted_all

    character(len=32) :: str
    integer :: tmp

    tmp = 0
    n = 20000

    if (command_argument_count() > 0) then
        call get_command_argument(1,str)
        read(str,*) tmp
        if (tmp > 0) n = tmp
    end if
    print *, "n = ",  n

    allocate(a(n))

    ! Fill ascending numbers
    do i = 1, n
        a(i) = i
    end do

    ! Introduce an unsorted value
    a(100) = 1001
    !a(101) = 1000

    call measure(100000,a,is_sorted_scalar,"scalar")
    call measure(100000,a,is_sorted_all,   "all")

contains

    impure subroutine measure(nreps,a,func,name)
        integer, intent(in) :: nreps
        integer, intent(in) :: a(:)
        logical :: func
        character(len=*), intent(in) :: name
        integer(8) :: t1, t2, rate
        real(kind(1.0d0)) :: elapsed
        logical :: res

        character(len=12) :: str

        integer :: k
        call system_clock(t1)
        do k = 1, nreps
            res = func(size(a),a)
        end do
        call system_clock(t2,rate)

        elapsed = (t2 - t1)/real(rate,kind(elapsed))

        str = adjustl(name)
        print '(A12,F12.4,L2)', str, elapsed/nreps*1.e6, res

        ! Time is in microseconds

    end subroutine

end program

It appears like the is_sorted_all function in flang generates a temporary array for the a(2:n) >= a(1:n-1) expression, and then performs the all reduction. This is fast due to vectorization, but it missed the chance of early exit.

The effect is noticeable in the runtime:

~/fortran/is_sorted$ make FC=flang-new FFLAGS="-O2 -march=native" standalone
flang-new -O2 -march=native -o standalone standalone.f90
~/fortran/is_sorted$ ./standalone 
 n =  20000
scalar            0.0673 F
all               1.7358 F
~/fortran/is_sorted$ make clean
rm -rf *.o benchmark standalone
~/fortran/is_sorted$ make FC=gfortran FFLAGS="-O2 -march=native" standalone
gfortran -O2 -march=native -o standalone standalone.f90
~/fortran/is_sorted$ ./standalone 
 n =        20000
scalar            0.0389 F
all               0.0390 F

It would be nice if early exit vectorization were also supported (https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690). With x86 SIMD extensions this still has to be done manually it seems: http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html

Compiler Explorer link: https://godbolt.org/z/c3GK5do6T

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions