Skip to content

Micros is not ISR-safe #1680

Closed
Closed
@matthijskooijman

Description

@matthijskooijman

Describe the bug

The micros()/getCurrentMicros() implementation cannot currently be used inside an ISR, even though that is an obvious thing to do (quickly grab and store a timestamp in an ISR and then do more processing in the mainloop). When the timing is right (once every ms), the overflow detection in the code does not work properly, and the value returned will be exactly 1000 too low.

The current implementation has two problems:

  • It is not re-entrant: It clears COUNTFLAG at startup and then uses that flag to detect if an overflow happened while reading the timer values. If then another ISR would happen between the overflow (flag set) and reading the flag, and that ISR would call micros(), the flag would be cleared again and the original micros() call would return the wrong value.
  • It is sometimes wrong when called with interrupts disabled (e.g. with PRIMASK set, or inside an ISR that has priority=0 like tye systick ISR): When COUNTFLAG was set, the code would call HAL_GetTick() again, assuming that it would return an updated value.
    However, when interrupts are disabled, the systick overflow interrupt cannot run and HAL_GetTick() just returns the same value again, and micros() returns the wrong value.

To Reproduce

Run the following sketch:

MicrosBackwardsTest.ino.txt

I've tested this on a Nucleo F401 with core 2.2.0, but I think it will be broken on any STM32 (but if the CPU clock is not 84Mhz, the timer period in the sketch might need to be adapted, though it might also break as-is).

This sketch shows both problems above (you can select which by modifying the if(true) in the code).

For the reentrant test, the output is:

Start
OK: 1547
BACKWARDS: 1997 -> 1001
OK: 4538
BACKWARDS: 10998 -> 10003
OK: 941
BACKWARDS: 12997 -> 12002
OK: 4511
BACKWARDS: 21999 -> 21003
OK: 449
BACKWARDS: 22998 -> 22003
OK: 897
BACKWARDS: 24997 -> 24002
OK: 5116
BACKWARDS: 37997 -> 37002
OK: 5557
BACKWARDS: 48998 -> 48002
OK: 5045
BACKWARDS: 58998 -> 58003
OK: 445
BACKWARDS: 59998 -> 59003
OK: 21
BACKWARDS: 60998 -> 60002
OK: 2551
BACKWARDS: 69999 -> 69003
OK: 1443
BACKWARDS: 72998 -> 72002
OK: 5470
BACKWARDS: 83998 -> 83003
OK: 5040
BACKWARDS: 93999 -> 93003
OK: 945

Which shows that the micros run in one loop is sometimes less than the micros in the previous loop. How often this occurs is quite unpredictable, since this needs 3 things to happen at the same time, and I picked the timer interval to be prime and jump around a bit.

For the run-from-ISR test, the output is:

Start
SysTick LOAD (should be 83999): 83999
OK: 795
BACKWARDS: 794999 -> 794000
OK: 943
BACKWARDS: 1737999 -> 1737000
OK: 2830
BACKWARDS: 4565999 -> 4565000
OK: 0
BACKWARDS: 4566999 -> 4566000
OK: 942
BACKWARDS: 5508999 -> 5508000
OK: 943
BACKWARDS: 6451999 -> 6451000
OK: 943

Which shows two subsequent micros() calls are in the reverse order, which happens about once every second (which I think means the window for this problem is about 84 cycles, or 1 us).

Prior to writing this testcase, I have also seen this bug occur in practice (in a machine firmware where it would very rarely occur at a moment that it confused our scheduler, causing it to run one task continuously for about 20s, confusing some sensor we were reading, showing up as invalid sensor readings for 20s every couple of hours, but also sometimes not at all for a few days...).

Expected behavior
When calling micros() twice, the second call should always return equal or more than the first call.

Solutions

My original idea for fixing this was to, rather than using (and importantly, clearing) COUNTFLAG as is done now, instead use the systick ISR pending flag to detect an overflow, and to disable (just) the systick ISR to ensure a consistent read and ensure the pending flag is set. However, it turns out that (unlike how I expected from other platforms), disabling the systick ISR also prevents the pending flag from being set (maybe because systick is really an exception in the core, not an interrupt in the NVIC).

To still be able to use the pending flag, we resorted to using PRIMASK to disable all interrupts instead. See 3devo@efd2c7b for this implementation. However, this has a downside: Disabling all interrupts adds latency to all interrupts and is best avoided. Using SWD I measured that interrupts are disabled for around 67 cycles (bit more or less depending on how you count the enable/disable instructions), which is quite a bit more than I had hoped.

This lead to me thinking about a different approach: Rather than disabling interrupts and checking the flags, just get the ticks value twice, and if it is different, you know that an overflow occurred, and you can use the second ticks value and just have to reload the timer value (where you can assume that no second overflow will happen so soon after, which is safe unless there are ISRs that take >1ms, but even then the returned value might be technically wrong, it will still be a valid time that lies between the moment micros() is called and when it returns, so no backwards time there).

Here's what I wrote for that:

    uint32_t m = HAL_GetTick();
    __IO uint32_t v = SysTick->VAL;
    uint32_t m2 = HAL_GetTick();
    if (m != m2) {
      v = SysTick->VAL;
      m = m2;
    }
    const uint32_t tms = SysTick->LOAD + 1;
    return (m * 1000 + ((tms - v) * 1000) / tms);

This makes more changes than strictly needed (it was based on my PRIMASK-based approach, which needed to move the tms load and tms - VAL subtraction down to minimize the critical area), but the same approach can also be put into the existing code.

Looking around for existing issues, I found that the "query ticks twice and compare" method was actually used in the past already (introduced in 2fa0645 to fix exactly this issue: #4, micros running backwards) but it was replaced with the COUNTFLAG approach in #202. That PR says to improve accuracy of micros(), but looking closely I think it really only makes micros() a bit faster by removing one subroutine call to HAL_GetTicks() (which I think is an optimization better done using inlining or LTO), but it also unintentionally made micros ISR-unsafe.

Given the above, I think the "check ticks twice" approach would be the best one and should be restored.

(Though note that the original implementation had a bug: It did m1 > m0 which works almost always, except when the ticks value overflows from 2³² to 0, so a correct implementation should use m1 != m0)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions