`enable_icache` is wrong on M7 without `inline-asm` feature

This has been a fun one to track down.

**tl;dr:** if enabling instruction and/or data caches on your M7 is causing weird crashes, turn on the `cortex_m` `inline-asm` feature and make sure you're compiling at `--release`. Oh, and you're on nightly now. Sorry.

# Details

I've got an STM32H7B3 here, and shortly after enabling the icache I would _sometimes_ get hard faults that looked suspiciously like it trying to execute random data. (I'm doing this early enough in boot that I don't have configurable faults broken out, but they were either BF or MMF.)

The key here is _sometimes_ -- some builds worked okay. Until I tried turning on dcache. Then things went squirrely.

It came down to: _how many instructions did the compiler decide to insert between turning on the cache and the `dsb` / `isb` sequence?_ If the answer is not zero and contains a branch or memory access, _weird things_ will happen.

The current implementation of `enable_icache` reads as follows:
```rust
    #[inline]
    pub fn enable_icache(&mut self) {
        // Don't do anything if I-cache is already enabled
        if Self::icache_enabled() {
            return;
        }

        // NOTE(unsafe): No races as all CBP registers are write-only and stateless
        let mut cbp = unsafe { CBP::new() };

        // Invalidate I-cache
        cbp.iciallu();

        // Enable I-cache
        // NOTE(unsafe): We have synchronised access by &mut self
        unsafe { self.ccr.modify(|r| r | SCB_CCR_IC_MASK) };

        crate::asm::dsb();
        crate::asm::isb();
    }
```

The tail end of that function is _almost_ right.

- The `modify` will end in a `str`.
- The barriers will follow.

However,

- There's no use of `compiler_fence` or equivalent to prevent instructions from being rearranged above the barriers. (Perhaps we can assume this won't happen because they're `extern "C"`? I'm _fairly certain_ I just saw it happen but I didn't save the listing.)
- `dsb` and `isb` _default to being function calls_

This means the actual dynamic instruction trace here is

```
str r1, [r0, #0]
bl __dsb
dsb sy
bx lr
bl __isb
isb sy
bx lr
```

In other words, we've got one branch in the pipeline before the write even takes effect, and no fewer than three before the `isb` causes a prefetch/pipeline flush. These instructions often execute incorrectly. (On a reasonably fast M7, "often" becomes "always" in my testing.)

Enabling the `inline-asm` feature fixes this by changing the dynamic trace to be what you'd expect:

```
str r1, [r0, #0]
dsb sy
isb sy
```

**This is probably an issue with _any_ existing code attempting to use `isb` to synchronize with memory map or cache changes.** To do this correctly, one might consider making `enable_icache` (and `enable_dcache` though it crashes less reliably) contingent on the `inline-asm` feature, but that only produces the right instruction sequence at `--release`.

These routines probably need to be written in assembly to be correct on stable toolchains (and on debug builds on nightly toolchains, for that matter, to bind the compiler's hands). I haven't thought about the equivalent `disable` cases in detail, but they and the clean/invalidate routines probably bear review as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`enable_icache` is wrong on M7 without `inline-asm` feature #232

Details

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

enable_icache is wrong on M7 without inline-asm feature #232

Description

Details

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`enable_icache` is wrong on M7 without `inline-asm` feature #232