
[libc] add a simple TTAS spin lock #98846


Merged: 4 commits merged into llvm:main on Jul 16, 2024

Conversation

SchrodingerZhu
Contributor

@SchrodingerZhu SchrodingerZhu commented Jul 15, 2024

This is to be used in pthread_spinlock.

@SchrodingerZhu SchrodingerZhu requested a review from jhuber6 July 15, 2024 00:04
@llvmbot llvmbot added the libc label Jul 15, 2024
@llvmbot
Member

llvmbot commented Jul 15, 2024

@llvm/pr-subscribers-libc

Author: Schrodinger ZHU Yifan (SchrodingerZhu)

Changes

This is to be used in pthread_spinlock.


Full diff: https://github.com/llvm/llvm-project/pull/98846.diff

2 Files Affected:

  • (modified) libc/src/__support/threads/CMakeLists.txt (+9)
  • (added) libc/src/__support/threads/spin_lock.h (+56)
diff --git a/libc/src/__support/threads/CMakeLists.txt b/libc/src/__support/threads/CMakeLists.txt
index 9ea0b59befe7a..d2e46b8e2574e 100644
--- a/libc/src/__support/threads/CMakeLists.txt
+++ b/libc/src/__support/threads/CMakeLists.txt
@@ -10,6 +10,15 @@ add_header_library(
     sleep.h
 )
 
+add_header_library(
+  spin_lock
+  HDRS
+    spin_lock.h
+  DEPENDS
+    .sleep
+    libc.src.__support.CPP.atomic
+)
+
 if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/${LIBC_TARGET_OS})
   add_subdirectory(${LIBC_TARGET_OS})
 endif()
diff --git a/libc/src/__support/threads/spin_lock.h b/libc/src/__support/threads/spin_lock.h
new file mode 100644
index 0000000000000..8fd6fb960f3ce
--- /dev/null
+++ b/libc/src/__support/threads/spin_lock.h
@@ -0,0 +1,56 @@
+//===-- TTAS Spin Lock ----------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_THREADS_SPIN_LOCK_H
+#define LLVM_LIBC_SRC___SUPPORT_THREADS_SPIN_LOCK_H
+
+#include "src/__support/CPP/atomic.h"
+#include "src/__support/macros/attributes.h"
+#include "src/__support/threads/sleep.h"
+namespace LIBC_NAMESPACE_DECL {
+class SpinLock {
+  cpp::Atomic<bool> flag;
+
+public:
+  LIBC_INLINE constexpr SpinLock() : flag{false} {}
+  LIBC_INLINE bool try_lock() {
+    return !flag.exchange(true, cpp::MemoryOrder::ACQUIRE);
+  }
+  LIBC_INLINE void lock() {
+    // clang-format off
+    // this compiles to the following on armv9a and x86_64:
+    //         mov     w8, #1            |          .LBB0_1:
+    // .LBB0_1:                          |                  mov     al, 1
+    //         swpab   w8, w9, [x0]      |                  xchg    byte ptr [rdi], al
+    //         tbnz    w9, #0, .LBB0_3   |                  test    al, 1
+    //         b       .LBB0_4           |                  jne     .LBB0_3
+    // .LBB0_2:                          |                  jmp     .LBB0_4
+    //         isb                       |         .LBB0_2:
+    // .LBB0_3:                          |                  pause
+    //         ldrb    w9, [x0]          |         .LBB0_3:       
+    //         tbnz    w9, #0, .LBB0_2   |                  movzx   eax, byte ptr [rdi]
+    //         b       .LBB0_1           |                  test    al, 1
+    // .LBB0_4:                          |                  jne     .LBB0_2
+    //         ret                       |                  jmp     .LBB0_1
+    //                                   |          .LBB0_4:
+    //                                   |                  ret
+    // clang-format on
+    // Notice that inside the busy loop .LBB0_2 and .LBB0_3, only instructions
+    // with load semantics are used. swpab/xchg is only issued in the outer loop
+    // .LBB0_1. This is useful to avoid extra writer traffic. The cache
+    // coherence guarantees "write propagation", so even if the inner loop only
+    // reads with relaxed ordering, the thread will eventually see the write.
+    while (!try_lock())
+      while (flag.load(cpp::MemoryOrder::RELAXED))
+        sleep_briefly();
+  }
+  LIBC_INLINE void unlock() { flag.store(false, cpp::MemoryOrder::RELEASE); }
+};
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SRC___SUPPORT_THREADS_SPIN_LOCK_H
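
The comment block above is the gist of TTAS: the atomic exchange is confined to the outer loop, and the inner wait loop only issues plain loads, relying on cache coherence to observe the eventual release. For reference, a minimal usage sketch of the new class (hypothetical example, not part of the PR; the guarded counter and the function names are made up):

#include "src/__support/threads/spin_lock.h"

namespace LIBC_NAMESPACE_DECL {

// Hypothetical shared state guarded by the new lock.
static SpinLock counter_lock;
static int counter = 0;

void increment_counter() {
  counter_lock.lock();   // TTAS acquire: exchange in the outer loop, read-only waiting inside
  ++counter;             // critical section
  counter_lock.unlock(); // plain store with RELEASE ordering
}

bool try_increment_counter() {
  // Non-blocking variant: give up immediately if the lock is already held.
  if (!counter_lock.try_lock())
    return false;
  ++counter;
  counter_lock.unlock();
  return true;
}

} // namespace LIBC_NAMESPACE_DECL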

Contributor

@jhuber6 jhuber6 left a comment

I've found that using fetch_or and fetch_and works better on the GPU targets. That is,

cpp::Atomic<int32_t> lock{};

bool has_lock = !lock.fetch_or(1, cpp::MemoryOrder::ACQUIRE);
...
lock.fetch_and(0, cpp::MemoryOrder::RELEASE);

Any clue how this performs on non-GPU targets?
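
For comparison, a self-contained sketch of that fetch_or/fetch_and approach written against the same cpp::Atomic API the PR uses (the class name is made up and this is only illustrative, not a proposed implementation). Unlike the TTAS lock above, the busy wait here re-issues the read-modify-write on every iteration:

#include "src/__support/CPP/atomic.h"
#include "src/__support/macros/attributes.h"
#include "src/__support/threads/sleep.h"

namespace LIBC_NAMESPACE_DECL {
// Hypothetical test-and-set lock built on fetch_or/fetch_and.
class FetchOrSpinLock {
  cpp::Atomic<int32_t> flag;

public:
  LIBC_INLINE constexpr FetchOrSpinLock() : flag{0} {}
  LIBC_INLINE bool try_lock() {
    // fetch_or returns the previous value; 0 means the lock was free.
    return !flag.fetch_or(1, cpp::MemoryOrder::ACQUIRE);
  }
  LIBC_INLINE void lock() {
    // Spin on the read-modify-write itself; there is no read-only inner loop.
    while (!try_lock())
      sleep_briefly();
  }
  LIBC_INLINE void unlock() { flag.fetch_and(0, cpp::MemoryOrder::RELEASE); }
};
} // namespace LIBC_NAMESPACE_DECL

The repeated RMW in the wait loop generates more write traffic on the lock's cache line than the TTAS pattern, which is the trade-off the CTR discussion below touches on.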

@SchrodingerZhu
Contributor Author

@jhuber6 I think this technique is called CTR, as in https://arxiv.org/pdf/2102.03863, section 2.1. But the TTAS semantics should be more general. Frankly speaking, I am not sure how CTR performs on modern hardware when used to implement TTAS-style locks.

@SchrodingerZhu
Contributor Author

So according to the discussion in that paper, CTR is considered more performant in 1-to-1 communication:

in our case with the 1-to-1 communication protocol used on the Grant field in Hemlock, busy-waiting via read-modify-write atomic operations provides a performance benefit. Because of the simple communication pattern, back-off in the busy-waiting loop is not useful.

@SchrodingerZhu SchrodingerZhu requested a review from jhuber6 July 16, 2024 03:15
@SchrodingerZhu
Contributor Author

@jhuber6 Updated. Please take a look.

Contributor

@jhuber6 jhuber6 left a comment

Spin locks in general don't work on the GPU due to the lack of a fair scheduler: you need to make sure there are enough resources that any thread can make progress even if every other thread is suspended and never comes back. I'm not too concerned about the GPU implementation for this, since I don't think it'll be used in anything. My questions were mostly about whether those bitwise operations work better, since in my experience they're considerably faster for the type of 1-to-1 waiting that's done on GPUs. However, looking at the generated assembly for x64, that's definitely not the case there. I think I'm fine with either this or the one you had before.

Co-authored-by: Joseph Huber <[email protected]>
@SchrodingerZhu
Contributor Author

Yeah, spinlocks are quite useless on GPUs: warp/wavefront divergence plus other scheduling issues will easily turn a spinlock into a deadlock.

And for many architectures there is no forward-progress guarantee for cross-block/CTA synchronization, so spinlock variants cannot be used across blocks/CTAs either.

I will use the updated version anyway, just to leave the option open for the future.

@SchrodingerZhu SchrodingerZhu merged commit 408a351 into llvm:main Jul 16, 2024
4 of 5 checks passed
@SchrodingerZhu SchrodingerZhu deleted the libc/spin_lock branch July 16, 2024 04:15