Description
After upgrading our Azure CI machines to the new Azure Cobalt ARM64 processors, we started seeing frequent compiler crashes when building a large Swift project. After some investigation, the culprit appears to be a lifecycle violation in libdispatch's Windows pipe handling code.
The crashing line: https://github.com/apple/swift-corelibs-libdispatch/blob/e85f6a0d5c9ea1f32f5013c3fa34e4fc146cd0eb/src/event/event_windows.c#L240
And the stack trace:
[Inline Frame] dispatch.dll!_dispatch_muxnote_dispose(dispatch_muxnote_s * dmn) Line 240 C
[Inline Frame] dispatch.dll!_dispatch_muxnote_release(dispatch_muxnote_s * dmn) Line 265 C
[Inline Frame] dispatch.dll!_dispatch_event_merge_pipe_handle_read(dispatch_muxnote_s * dmn, unsigned long dwBytesAvailable) Line 669 C
dispatch.dll!_dispatch_event_loop_drain(unsigned int flags) Line 915 C
dispatch.dll!_dispatch_mgr_invoke() Line 5419 C
dispatch.dll!_dispatch_mgr_thread(dispatch_lane_s * dq, dispatch_invoke_context_s * dic, <unnamed-tag> flags) Line 5447 C
[Inline Frame] dispatch.dll!_dispatch_continuation_pop_inline(dispatch_object_t dou, dispatch_invoke_context_s * dic, <unnamed-tag> flags, dispatch_queue_class_t dqu) Line 2496 C
dispatch.dll!_dispatch_root_queue_drain(dispatch_queue_global_s * dq, unsigned int pri, <unnamed-tag> flags) Line 6114 C
dispatch.dll!_dispatch_worker_thread(void * context) Line 6250 C
dispatch.dll!_dispatch_worker_thread_thunk(void * lpParameter) Line 6272 C
[External Code]
I suspect this is not a Cobalt/ARM64-specific issue. It is more likely a long-standing bug that has become much easier to hit on this line of CPUs due to some scheduling or timing change.
The interesting section is here:
https://github.com/apple/swift-corelibs-libdispatch/blob/e85f6a0d5c9ea1f32f5013c3fa34e4fc146cd0eb/src/event/event_windows.c#L667-L669
The event set here is used to synchronize with the pipe monitoring thread, which itself calls _dispatch_muxnote_retain. Perhaps a change in timing affected the typical order of operations here, although I haven't been able to prove this yet.
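To make the suspected failure mode concrete, here is a minimal sketch of the retain/release pairing that a hand-off between a monitoring thread and the event loop depends on. It is not libdispatch's code: `note_t`, `note_init`, `note_retain`, `note_release`, and `handoff_once` are hypothetical names, and the hand-off runs sequentially so the outcome is deterministic. It only illustrates why a release on the draining side that is not matched by a retain on the monitoring side would dispose of the object while its owner still expects it to be alive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a muxnote; only the reference count matters here. */
typedef struct {
    atomic_int refcount; /* number of outstanding references */
    bool disposed;       /* set when the last reference is dropped */
} note_t;

static void note_init(note_t *n) {
    atomic_init(&n->refcount, 1); /* the owner holds the initial reference */
    n->disposed = false;
}

/* Takes a reference and returns the new count. */
static int note_retain(note_t *n) {
    int prev = atomic_fetch_add(&n->refcount, 1);
    assert(prev > 0); /* retaining an already disposed object is a bug */
    return prev + 1;
}

/* Drops a reference and returns the new count; 0 means the object was disposed. */
static int note_release(note_t *n) {
    int now = atomic_fetch_sub(&n->refcount, 1) - 1;
    if (now == 0) {
        n->disposed = true; /* real code would free resources here */
    }
    return now;
}

/*
 * One monitor-to-event-loop hand-off, run sequentially for determinism.
 * monitor_retains models whether the producing side took its own reference
 * before handing the object to the consuming side.
 */
static int handoff_once(note_t *n, bool monitor_retains) {
    if (monitor_retains) {
        note_retain(n);     /* producer side: keep the object alive in flight */
    }
    return note_release(n); /* consumer side: drop the in-flight reference */
}
```

With a balanced hand-off, the owner's reference survives (`handoff_once(&n, true)` returns 1). If the release runs without a matching retain, the count reaches 0 and the object is disposed even though the owner has not released it. Whether a timing change can actually produce that ordering in the real code is still unproven.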
I'm trying to reproduce the crash under LIBDISPATCH_LOG to get some more information.