BytesMut no longer uses atomics; for smol there is a PR

RalfJung · RalfJung · commit 8d6c4e6961c4 · 2023-08-13T21:21:16.000+02:00
diff --git a/resources/deliberate-ub.md b/resources/deliberate-ub.md
@@ -16,16 +16,17 @@ We should evaluate whether there truly is some use-case here that is not current
   The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.<br>
   Furthermore, `compare_exchange` will compare padding bytes, which leads to UB.
   It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes.
-* `bytes` does [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
+* `bytes` used to do [a non-atomic plain load that races](https://github.com/tokio-rs/bytes/blob/dea868a4b0eec28877e9013702c0ae12dbc40c4b/src/bytes.rs#L2508),
   because relaxed loads are claimed to cost too much performance (also see [this LLVM issue](https://github.com/llvm/llvm-project/issues/37064)).
   (Note that LLVM's handling of data races is not enough here, data races still return garbage data -- so this is UB in the generated LLVM IR. Also see [this thread](https://internals.rust-lang.org/t/unordered-as-a-solution-to-bit-wise-reasoning-for-atomic-accesses/11079) on using "unordered".)
   The best fix is probably to improve LLVM optimizations on relaxed loads, so that they are not considered too slow for this use-case.
   However, it is not clear if all the desired optimizations are even legal for relaxed loads -- there is a significant semantic difference between relaxed loads and non-atomic loads, so seeing *some* performance difference is expected.
   Code that works with concurrency can't just expect not to have to pay a cost for that.
+  And anyway current `bytes` does not seem to have any atomics in `ByteMut` any more, so possibly this is resolved.
 * `smol-rs/event-listener` needs an SC fence, but on x86 they prefer a `lock cmpxchg` over an `mfence`.
   To achieve this, they do an [SC `compare_exchange` on a local variable](https://github.com/smol-rs/event-listener/blob/0ea464102e74219aab2932f9eff14418a13268d4/src/notify.rs#L574-L577).
   That is UB (both in Rust and LLVM) since such an operation has no synchronization effect; the compiler could easily see that this variable is never accessed from outside this function and hence optimize it away entirely.
-  The best fix is probably an inline assembly block.
+  The best fix is probably an [inline assembly block](https://github.com/smol-rs/event-listener/pull/71).
 
 ### Cases related to aliasing