Retry once after getting a deadlock when attempting to decrement a semaphore

rosa · rosa · commit d72fe4607f8f · 2024-01-02T20:52:30.000+01:00
This tries to address a tricky deadlock we've seen about once every couple of days,
where three jobs that compete for the semaphore are enqueued at the same time.
One of them wins at creating the semaphore, and the other two transactions acquire
a shared lock over the just created semaphore row, by key. Then, they try to upgrade
that lock to an exclusive lock to perform an UPDATE (attempting to decrement the
semaphore), leading to a deadlock because each one of them is waiting for the other one
to release the shared lock.
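
To make the cycle concrete, here is a toy model of the lock upgrade in plain Ruby. It is only a sketch: `ToyRowLock` and the `:trx1`/`:trx2` symbols are made up for illustration, and this is not how InnoDB implements its lock manager. It only assumes that shared locks are compatible with each other and that an exclusive lock needs every other holder to let go first.

```ruby
# Toy model of a single row's locks. Hypothetical, for illustration only.
class ToyRowLock
  def initialize
    @shared_holders = [] # transactions currently holding a shared (S) lock
    @waiting_for_x = []  # transactions blocked while asking for an exclusive (X) lock
  end

  def acquire_shared(trx)
    @shared_holders << trx unless @shared_holders.include?(trx)
  end

  # Returns :granted, :waiting or :deadlock.
  def request_exclusive(trx)
    blockers = @shared_holders - [trx]
    return :granted if blockers.empty?

    @waiting_for_x << trx unless @waiting_for_x.include?(trx)

    # We hold S and wait for X, and everyone blocking us is doing the same:
    # nobody can make progress, so this is a deadlock.
    if @shared_holders.include?(trx) && blockers.all? { |b| @waiting_for_x.include?(b) }
      :deadlock
    else
      :waiting
    end
  end
end

row = ToyRowLock.new
row.acquire_shared(:trx1)
row.acquire_shared(:trx2)

first  = row.request_exclusive(:trx1) # trx2 still holds S, so trx1 waits
second = row.request_exclusive(:trx2) # trx1 holds S and is waiting too
```

In this model `first` is `:waiting` and `second` is `:deadlock`, which mirrors the two transactions in the InnoDB output below, each holding `S` and waiting for `X` on the same record.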

From `SHOW ENGINE INNODB STATUS`:

```
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-12-27 07:57:28 140410341029440
*** (1) TRANSACTION:
TRANSACTION 1972990032, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012240, OS thread handle 140409154041408, query id 7398762432 bigip-vip-new.rw-ash-int.37signals.com 10.20.0.24 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.002702' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
 1: len 8; hex 80000000004224c4; asc      B$ ;;

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990032 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
 1: len 8; hex 80000000004224c4; asc      B$ ;;

*** (2) TRANSACTION:
TRANSACTION 1972990013, ACTIVE 1 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 3012575, OS thread handle 140275687212608, query id 7398762530 bigip-vip.sc-chi-int.37signals.com 10.10.0.37 haystack_app updating
UPDATE `solid_queue_semaphores` SET value = value - 1, expires_at = '2023-12-27 08:12:28.007153' WHERE (value > 0) AND `solid_queue_semaphores`.`key` = 'RR::ProcessJob/C/64961261'

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock mode S
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
 1: len 8; hex 80000000004224c4; asc      B$ ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 14 page no 426 n bits 304 index index_solid_queue_semaphores_on_key of table `haystack_solidqueue_production`.`solid_queue_semaphores` trx id 1972990013 lock_mode X locks rec but not gap waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 30; hex 526563656970743a3a526563697069656e743a3a50726f63657373696e67; asc RR::Process; (total 50 bytes);
 1: len 8; hex 80000000004224c4; asc      B$ ;;

*** WE ROLL BACK TRANSACTION (2)
```

With this change, the transaction that gets killed because of the deadlock
will try to wait again, but this time without holding a shared lock: it won't
try to create the semaphore, because we know the semaphore already exists.
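
The retry-once behaviour can be sketched in isolation like this. It is a simplified stand-in, not the code in this commit: `FakeDeadlocked` replaces `ActiveRecord::Deadlocked` so the snippet runs without Rails, `RetryOnceDecrement` is a hypothetical name, and the block passed to it plays the role of the `UPDATE` that decrements the semaphore. The real retry also re-reads the semaphore before decrementing, which is left out here.

```ruby
# Hypothetical stand-in for ActiveRecord::Deadlocked.
class FakeDeadlocked < StandardError; end

class RetryOnceDecrement
  MAX_RETRIES = 1

  attr_reader :attempts

  def initialize(&decrement)
    @decrement = decrement # stands in for the UPDATE on the semaphore row
    @retries = 0
    @attempts = 0
  end

  def call
    @attempts += 1
    @decrement.call(@attempts)
  rescue FakeDeadlocked
    # Give up and re-raise once the single retry has been used.
    raise unless @retries < MAX_RETRIES

    @retries += 1
    call
  end
end

# The first attempt deadlocks, the retry succeeds.
recovered = RetryOnceDecrement.new { |attempt| attempt == 1 ? raise(FakeDeadlocked) : true }
result = recovered.call

# Every attempt deadlocks, so the second error propagates to the caller.
always_failing = RetryOnceDecrement.new { |_attempt| raise FakeDeadlocked }
error = begin
  always_failing.call
  nil
rescue FakeDeadlocked => e
  e
end
```

Capping the retries at one keeps a persistent deadlock visible to the caller instead of looping on it.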

One problem that could happen here is something deleting the semaphore while
we're retrying. However, that should be ok, as we only delete semaphores as part
of periodic maintenance, and only expired ones. This retry is only necessary when
the semaphore has just been created, so we can assume it won't expire and be
deleted out from under us at that very moment.
diff --git a/app/models/solid_queue/semaphore.rb b/app/models/solid_queue/semaphore.rb
@@ -1,65 +1,86 @@
 # frozen_string_literal: true
 
-class SolidQueue::Semaphore < SolidQueue::Record
-  scope :available, -> { where("value > 0") }
-  scope :expired, -> { where(expires_at: ...Time.current) }
+module SolidQueue
+  class Semaphore < Record
+    scope :available, -> { where("value > 0") }
+    scope :expired, -> { where(expires_at: ...Time.current) }
 
-  class << self
-    def wait(job)
-      Proxy.new(job, self).wait
-    end
+    class << self
+      def wait(job)
+        Proxy.new(job).wait
+      end
 
-    def signal(job)
-      Proxy.new(job, self).signal
+      def signal(job)
+        Proxy.new(job).signal
+      end
     end
-  end
 
-  class Proxy
-    def initialize(job, proxied_class)
-      @job = job
-      @proxied_class = proxied_class
-    end
+    class Proxy
+      def initialize(job)
+        @job = job
+        @retries = 0
+      end
 
-    def wait
-      if semaphore = proxied_class.find_by(key: key)
-        semaphore.value > 0 && attempt_decrement
-      else
-        attempt_creation
+      def wait
+        if semaphore = Semaphore.find_by(key: key)
+          semaphore.value > 0 && attempt_decrement
+        else
+          attempt_creation
+        end
       end
-    end
 
-    def signal
-      attempt_increment
-    end
+      def signal
+        attempt_increment
+      end
 
-    private
-      attr_reader :job, :proxied_class
+      private
+        attr_accessor :job, :retries
 
-      def attempt_creation
-        proxied_class.create!(key: key, value: limit - 1, expires_at: expires_at)
-        true
-      rescue ActiveRecord::RecordNotUnique
-        attempt_decrement
-      end
+        def attempt_creation
+          Semaphore.create!(key: key, value: limit - 1, expires_at: expires_at)
+          true
+        rescue ActiveRecord::RecordNotUnique
+          attempt_decrement
+        end
 
-      def attempt_decrement
-        proxied_class.available.where(key: key).update_all([ "value = value - 1, expires_at = ?", expires_at ]) > 0
-      end
+        def attempt_decrement
+          Semaphore.available.where(key: key).update_all([ "value = value - 1, expires_at = ?", expires_at ]) > 0
+        rescue ActiveRecord::Deadlocked
+          if retriable? then attempt_retry
+          else
+            raise
+          end
+        end
 
-      def attempt_increment
-        proxied_class.where(key: key, value: ...limit).update_all([ "value = value + 1, expires_at = ?", expires_at ]) > 0
-      end
+        def attempt_increment
+          Semaphore.where(key: key, value: ...limit).update_all([ "value = value + 1, expires_at = ?", expires_at ]) > 0
+        end
 
-      def key
-        job.concurrency_key
-      end
+        def attempt_retry
+          self.retries += 1
 
-      def expires_at
-        job.concurrency_duration.from_now
-      end
+          if semaphore = Semaphore.find_by(key: key)
+            semaphore.value > 0 && attempt_decrement
+          end
+        end
 
-      def limit
-        job.concurrency_limit
-      end
+        MAX_RETRIES = 1
+
+        def retriable?
+          retries < MAX_RETRIES
+        end
+
+        def key
+          job.concurrency_key
+        end
+
+        def expires_at
+          job.concurrency_duration.from_now
+        end
+
+        def limit
+          job.concurrency_limit
+        end
+    end
   end
 end
diff --git a/test/integration/concurrency_controls_test.rb b/test/integration/concurrency_controls_test.rb
@@ -42,14 +42,14 @@ class ConcurrencyControlsTest < ActiveSupport::TestCase
     UpdateResultJob.set(wait: 0.2.seconds).perform_later(@result, name: "000", pause: 0.1.seconds)
 
     ("A".."F").each_with_index do |name, i|
-      SequentialUpdateResultJob.set(wait: (0.2 + i * 0.01).seconds).perform_later(@result, name: name, pause: 0.2.seconds)
+      SequentialUpdateResultJob.set(wait: (0.2 + i * 0.01).seconds).perform_later(@result, name: name, pause: 0.3.seconds)
     end
 
     ("G".."K").each_with_index do |name, i|
-      SequentialUpdateResultJob.set(wait: (0.4 + i * 0.01).seconds).perform_later(@result, name: name)
+      SequentialUpdateResultJob.set(wait: (0.3 + i * 0.01).seconds).perform_later(@result, name: name)
     end
 
-    wait_for_jobs_to_finish_for(4.seconds)
+    wait_for_jobs_to_finish_for(5.seconds)
     assert_no_pending_jobs
 
     assert_stored_sequence @result, ("A".."K").to_a