Skip to content

enable_icache is wrong on M7 without inline-asm feature #232

Closed
@cbiffle

Description

@cbiffle

This has been a fun one to track down.

tl;dr: if enabling instruction and/or data caches on your M7 is causing weird crashes, turn on the cortex_m inline-asm feature and make sure you're compiling at --release. Oh, and you're on nightly now. Sorry.

Details

I've got an STM32H7B3 here, and shortly after enabling the icache I would sometimes get hard faults that looked suspiciously like it trying to execute random data. (I'm doing this early enough in boot that I don't have configurable faults broken out, but they were either BF or MMF.)

The key here is sometimes -- some builds worked okay. Until I tried turning on dcache. Then things went squirrely.

It came down to: how many instructions did the compiler decide to insert between turning on the cache and the dsb / isb sequence? If the answer is not zero and contains a branch or memory access, weird things will happen.

The current implementation of enable_icache reads as follows:

    #[inline]
    pub fn enable_icache(&mut self) {
        // Don't do anything if I-cache is already enabled
        if Self::icache_enabled() {
            return;
        }

        // NOTE(unsafe): No races as all CBP registers are write-only and stateless
        let mut cbp = unsafe { CBP::new() };

        // Invalidate I-cache
        cbp.iciallu();

        // Enable I-cache
        // NOTE(unsafe): We have synchronised access by &mut self
        unsafe { self.ccr.modify(|r| r | SCB_CCR_IC_MASK) };

        crate::asm::dsb();
        crate::asm::isb();
    }

The tail end of that function is almost right.

  • The modify will end in a str.
  • The barriers will follow.

However,

  • There's no use of compiler_fence or equivalent to prevent instructions from being rearranged above the barriers. (Perhaps we can assume this won't happen because they're extern "C"? I'm fairly certain I just saw it happen but I didn't save the listing.)
  • dsb and isb default to being function calls

This means the actual dynamic instruction trace here is

str r1, [r0, #0]
bl __dsb
dsb sy
bx lr
bl __isb
isb sy
bx lr

In other words, we've got one branch in the pipeline before the write even takes effect, and no fewer than three before the isb causes a prefetch/pipeline flush. These instructions often execute incorrectly. (On a reasonably fast M7, "often" becomes "always" in my testing.)

Enabling the inline-asm feature fixes this by changing the dynamic trace to be what you'd expect:

str r1, [r0, #0]
dsb sy
isb sy

This is probably an issue with any existing code attempting to use isb to synchronize with memory map or cache changes. To do this correctly, one might consider making enable_icache (and enable_dcache though it crashes less reliably) contingent on the inline-asm feature, but that only produces the right instruction sequence at --release.

These routines probably need to be written in assembly to be correct on stable toolchains (and on debug builds on nightly toolchains, for that matter, to bind the compiler's hands). I haven't thought about the equivalent disable cases in detail, but they and the clean/invalidate routines probably bear review as well.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions