Skip to content

cortex_m::asm::delay is wrong on anything more complex than an M4 #236

Closed
@cbiffle

Description

@cbiffle

The contract for delay currently states:

Blocks the program for at least n instruction cycles

The current implementation is:

            llvm_asm!("1:
                  nop
                  subs $0, $$1
                  bne.n 1b"
                 : "+r"(_n / 4 + 1)
                 :
                 : "cpsr"
                 : "volatile");

This is approximately an assembly gloss of the code

let mut i = n / 4 + 1;
loop {
    i -= 1;
    if i == 0 { break }
}

Why the divide-by-four? Well, the code appears to be assuming that the minimum amount of time that the instruction sequence could take is four cycles, and so it is taking that into account to minimize the amount of over-delaying. That cycle count is approximately accurate on M3/M4 (it's an underestimate in practice, even in zero-wait-state RAM), but is wrong on M7 or any other part with any combination of nop-folding, branch prediction, and multiple issue. As a result, on such CPUs, this function delays for substantially fewer cycles than advertised -- by roughly a factor of 2, depending on cache and memory system configuration, branch alignment, etc. (It also appears to be wrong on M33, but M33 is not superscalar, so it's probably just an instruction timing issue.)

As CPU core complexity grows, any hand-tuned cycle delay loops need to be specialized for the particular microarchitecture being used, much like in the 486/Pentium era. Alternatively, you could use a conservative implementation that, when asked to loop for at least n cycles, repeats the loop n times -- that should be safe for now, but I wouldn't recommend it, particularly because we've got the zero-overhead-loop extension to ARMv8-M on the horizon.

Shout out to @labbott for noticing this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions