Skip to content

why does small malloc fail when freeheap =76k+? #5346

Closed
@philbowles

Description

@philbowles

Hardware:

Core Installation version: 1.6.0
Computer OS: ?Windows 10? ?Mac OSX? ?Ubuntu?

image

Description:

I have an app specifically designed to find the "safe" lower heap limit which will allow any app to continue to run. Above 80k , it works exactly as designed, forcing the "heap grabbing" routine to back off once the lower limit is exceeded, and allowing it to continue as the heap is freed and returns back above a (higher) safe value.

The UI

image

The TL;DR is that Q is std::queue of asynchronous SSE writes which update the UI in "real time" (or at least as fast as the TCP TX rate will allow - currently can't seem to get past 20/s) once the TX rate is exceeded they get Q'd and the matching TCP ACK then pops them off the Q, freeing some heap, allowing more sends...rinse and repeat. The "Visualiser rate" is the rate per second that the Visualiser "LED" is flashed. Each "flash" is an SSE event and there is a rotary encoder hooked up to the Visualiser rate so I can just turn up / down the rate at which the Q fills / drains, and so by adjusting that and the cut-in and cut-out values I can force or avoid a low heap crash. The "Go" button (and the center button of the encoder on GPIO32) stop and start the visualiser so I can gauge the rate of Q fill/drain. The aim of the app is to find the exact lowest-safe-heap breaking point so that "production" code can be tuned to avoid it by judicious choice of of cut-out and cut-in values.

It all works wonderfully until it gets below about 80k, then I'm getting some strange results I don't understand which seems "plain wrong"

Crash Log

On a run with a lower bound "Heap Cut-out" of 50000 the following crash occured:
H= free heap, MB=Max free heap block

00:21:53.684 -> AARD:1: <---- AK 39 bytes Q=1334 H=77268 MB=67952 LOK=0
00:21:53.730 -> AARD:1: <---- AK 39 bytes Q=1337 H=77016 MB=67952 LOK=0
00:21:53.777 -> AARD:1: <---- AK 39 bytes Q=1340 H=76760 MB=67952 LOK=0
00:21:53.871 -> AARD:1: <---- AK 39 bytes Q=1346 H=76224 MB=67952 LOK=0
00:21:53.916 -> abort() was called at PC 0x40169a3f on core 1
00:21:53.916 -> 
00:21:53.916 -> ELF file SHA256: 0000000000000000
00:21:53.916 -> 
00:21:53.916 -> Backtrace: 0x400887c4:0x3ffb1d90 0x40088a41:0x3ffb1db0 0x40169a3f:0x3ffb1dd0 0x40169a86:0x3ffb1df0 0x4016915f:0x3ffb1e10 0x40169496:0x3ffb1e30 0x401691fd:0x3ffb1e50 0x400e5733:0x3ffb1e70 0x400e7946:0x3ffb1eb0 0x400db636:0x3ffb1ed0 0x400e01be:0x3ffb1ef0 0x400e0ee7:0x3ffb1f10 0x400e0f8a:0x3ffb1f30 0x400e0fa6:0x3ffb1f50 0x400e0ffe:0x3ffb1f90 0x400f0eed:0x3ffb1fb0 0x40089a52:0x3ffb1fd0
00:21:53.963 -> 

The failing line is a malloc(1436) call in WiFiUDP, but previous examples have been in my own code, ArduionOTA, MDNS...however in every instance the abort is called on a malloc requesting some small value well below what appears to be available.

Stack Trace


Decoding stack results
0x400887c4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 156
0x40088a41: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c line 171
0x40169a3f: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x40169a86: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x4016915f: __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_throw.cc line 87
0x40169496: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x401691fd: operator new[](unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_opv.cc line 32
0x400e5733: WiFiUDP::parsePacket() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\WiFi\src\WiFiUdp.cpp line 210
0x400e7946: ArduinoOTAClass::handle() at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\libraries\ArduinoOTA\src\ArduinoOTA.cpp line 379
0x400db636: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at C:\Users\phil\Documents\Arduino\libraries\h4plugins\src\H4P_WiFi.cpp line 189
0x400e01be: std::function ::operator()() const at c:\users\phil\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-97-gc752ad5-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 2271
0x400e0ee7: task::operator()() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 154
0x400e0f8a: H4::next() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 270
0x400e0fa6: H4::loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 358
0x400e0ffe: loop() at C:\Users\phil\Documents\Arduino\libraries\H4\src\H4.cpp line 315
0x400f0eed: loopTask(void*) at C:\Users\phil\AppData\Local\Arduino15\packages\esp32\hardware\esp32\1.0.6\cores\esp32\main.cpp line 23
0x40089a52: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

Questions

  1. Why does malloc(1436) fail when ESP.getFreeHeap() shows 76224 and ESP.getMaxAllocHeap() shows 67952? Most of the fails in my own code are of only 30 or 40 bytes, but still the failing line is always a malloc while apparently huge amounts of free heap are available
  2. I can hear someone saying "heap corruption rather than exhaustion" but why then when the lower limit is in the 80s or 90s does it run all day without missing a beat? What possible things could be happening in the background that only screw-up below 77k ? Why 77k?
  3. If the ESP heap calls are not the best predictor of impending heap failure, what other metric is? What is the safest and most reliable way of knowing when you are running dangerously low on memory?

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions