Rewrite std::sync::TaskPool to be load balancing and panic-resistant #18941

reem · 2014-11-14T02:57:40Z

The previous implementation was very likely to cause panics during
unwinding through this process:

- child panics, drops its receiver
- taskpool comes back around and sends another job over to that child
- the child receiver has hung up, so the taskpool panics on send
- during unwinding, the taskpool attempts to send a quit message to
  the child, causing a panic during unwinding
- panic during unwinding causes a process abort

This meant that TaskPool upgraded any child panic to a full process
abort. This came up in Iron when it caused crashes in long-running
servers.

This implementation uses a single channel to communicate between
spawned tasks and the TaskPool, which significantly reduces the complexity
of the implementation and cuts down on allocation. The TaskPool uses
the channel as a single-producer-multiple-consumer queue.
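As a rough illustration of the single-channel design, here is a minimal sketch in current Rust rather than the 2014 API this PR targets. The names `Job`, `spawn_pool`, and `sum_of_squares` are hypothetical, and wrapping the receiver in `Arc<Mutex<...>>` is one assumed way to let several workers share it; it is not taken from the patch.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// A job is any boxed closure that can be sent to another thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Spawns `workers` threads that all pull jobs from one shared receiver.
/// Returns the single producer handle and the workers' join handles.
fn spawn_pool(workers: usize) -> (Sender<Job>, Vec<thread::JoinHandle<()>>) {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    let handles = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while the job runs.
                let next = match rx.lock() {
                    Ok(guard) => guard.recv(),
                    Err(_) => return, // mutex poisoned
                };
                match next {
                    Ok(job) => job(),
                    Err(_) => return, // sender dropped and queue drained
                }
            })
        })
        .collect();
    (tx, handles)
}

/// Runs `jobs` squaring jobs on `workers` threads and returns the sum of results.
fn sum_of_squares(workers: usize, jobs: u64) -> u64 {
    let (tx, handles) = spawn_pool(workers);
    let (result_tx, result_rx) = channel::<u64>();
    for i in 1..=jobs {
        let result_tx = result_tx.clone();
        let job: Job = Box::new(move || {
            let _ = result_tx.send(i * i);
        });
        if tx.send(job).is_err() {
            break; // every worker has exited
        }
    }
    // Dropping the producer lets the workers exit once the queue is empty.
    drop(tx);
    drop(result_tx);
    for handle in handles {
        let _ = handle.join();
    }
    result_rx.iter().sum()
}
```

Because whichever worker is idle takes the next job, the load balances itself without the pool tracking a `next_index`.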

Additionally, through the use of send_opt and recv_opt instead of
send and recv, this TaskPool is robust in the face of child panics
before, during, and after the TaskPool itself is dropped.
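To make the difference concrete: in current Rust, `Sender::send` and `Receiver::recv` return a `Result`, which fills the role this description gives to `send_opt` and `recv_opt`. The sketch below uses that modern API, not the 2014 one, and the helper names `panicking_worker` and `send_to_dead_worker` are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// A worker that owns the receiver and then panics, dropping it while unwinding.
fn panicking_worker(rx: Receiver<u32>) {
    let _rx = rx;
    panic!("simulated child panic");
}

/// Tries to send `value` to a worker that has already panicked.
/// Returns `true` if the send succeeded, `false` if the receiver was gone.
fn send_to_dead_worker(value: u32) -> bool {
    let (tx, rx) = channel::<u32>();
    let worker = thread::spawn(move || panicking_worker(rx));
    // join() reports the child's panic as Err; by then the receiver is dropped.
    let child_panicked = worker.join().is_err();
    // The send fails with an Err value instead of panicking in the caller.
    let sent = tx.send(value).is_ok();
    child_panicked && sent
}
```

The caller sees the hang-up as an ordinary error value and can decide what to do, so a dead child no longer escalates into a second panic.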

This TaskPool uses an additional "monitor" task to start new child
tasks to replace those that panic.

Because the TaskPool no longer uses an init_fn_factory, this is a

[breaking-change]

Otherwise, the API has not changed.

If you used init_fn_factory in your code, and this change breaks for
you, you can instead use an AtomicUint counter and a channel to
move information into child tasks.
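One possible reading of that suggestion, sketched in current Rust, where `AtomicUint` has since been renamed `AtomicUsize`. The function `assign_ids` is hypothetical, and here the channel only carries the ids back out so they can be inspected; the same pattern can carry per-task data in the other direction.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

/// Spawns `n` threads; each claims a unique id from a shared counter
/// and reports it over a channel. Returns the ids in sorted order.
fn assign_ids(n: usize) -> Vec<usize> {
    let counter = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = channel::<usize>();
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let tx = tx.clone();
            thread::spawn(move || {
                // fetch_add returns the previous value, so ids never repeat.
                let id = counter.fetch_add(1, Ordering::SeqCst);
                let _ = tx.send(id);
            })
        })
        .collect();
    // Drop the original sender so the receiving iterator can finish.
    drop(tx);
    for handle in handles {
        let _ = handle.join();
    }
    let mut ids: Vec<usize> = rx.iter().collect();
    ids.sort();
    ids
}
```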

reem · 2014-11-14T02:59:41Z

This already exists as a library here: https://github.com/reem/rust-resistant-taskpool

r? @huonw

huonw · 2014-11-14T03:39:28Z

src/libstd/sync/task_pool.rs

-    channels: Vec<Sender<Msg<T>>>,
-    next_index: uint,
+///
+/// Spawns n + 1 tasks and respawns tasks on subtask panics.


Maybe

Spawns `n` worker tasks and one additional supervisor to replenish the pool after any panics.

huonw · 2014-11-14T03:49:57Z

I like it, although I'd like a second opinion, e.g. @alexcrichton.

alexcrichton · 2014-11-14T06:29:08Z

This looks great, thanks @reem! Would it be possible to not have a monitoring task at all? Could each Sentinel detect failure via task::failing() and if so it spawns a new thread with another handle to the mutex?

reem · 2014-11-14T06:33:55Z

Great idea! I was trying to factor the monitor out, but didn't consider having the Sentinel be the one to actually spawn a new task.

reem · 2014-11-14T06:49:18Z

I refactored all of the task re-spawning logic into Sentinel, and it really simplifies the code along with eliminating the monitor task.
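A minimal sketch of the sentinel idea in current Rust, for readers following along. It assumes a drop guard with an `active` flag, as in the `cancel` method quoted in the review; the rest (`Job`, `SharedJobs`, `spawn_worker`, `boom`, `jobs_completed_after_panic`) is hypothetical scaffolding, not the code from the patch.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;
type SharedJobs = Arc<Mutex<Receiver<Job>>>;

/// Guard held by each worker. If it is dropped while still active,
/// which happens when a job panics, it spawns a replacement worker.
struct Sentinel {
    jobs: SharedJobs,
    active: bool,
}

impl Sentinel {
    // Cancel and destroy this sentinel on a clean shutdown.
    fn cancel(mut self) {
        self.active = false;
    }
}

impl Drop for Sentinel {
    fn drop(&mut self) {
        if self.active {
            spawn_worker(Arc::clone(&self.jobs));
        }
    }
}

fn spawn_worker(jobs: SharedJobs) {
    thread::spawn(move || {
        let sentinel = Sentinel { jobs: Arc::clone(&jobs), active: true };
        loop {
            // The lock is released before the job runs, so a panicking
            // job does not poison the mutex.
            let next = match jobs.lock() {
                Ok(guard) => guard.recv(),
                Err(_) => break,
            };
            match next {
                Ok(job) => job(),
                Err(_) => break, // the pool dropped its sender
            }
        }
        sentinel.cancel();
    });
}

/// A job that always panics.
fn boom() {
    panic!("simulated job panic");
}

/// Sends one panicking job, then `n` normal jobs, to a one-worker pool
/// and returns how many of the normal jobs completed.
fn jobs_completed_after_panic(n: usize) -> usize {
    let (tx, rx) = channel::<Job>();
    spawn_worker(Arc::new(Mutex::new(rx)));
    let (done_tx, done_rx) = channel::<()>();
    let _ = tx.send(Box::new(boom));
    for _ in 0..n {
        let done_tx = done_tx.clone();
        let job: Job = Box::new(move || {
            let _ = done_tx.send(());
        });
        let _ = tx.send(job);
    }
    drop(tx);
    drop(done_tx);
    done_rx.iter().count()
}
```

With a single worker, `jobs_completed_after_panic(5)` still completes all five normal jobs, since the sentinel of the panicked worker starts a replacement that drains the rest of the queue.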

alexcrichton · 2014-11-14T06:50:15Z

src/libstd/sync/task_pool.rs

+
+    // Cancel and destroy this sentinel.
+    fn cancel(mut self) {
+        self.active = false;


Oh nice idea, I like this better than task::failing()

alexcrichton · 2014-11-14T06:53:27Z

r=me with @huonw's comment


barosl · 2014-11-14T08:25:03Z

Cool thing! This fixes #18836. I'm closing it now.

huonw reviewed Nov 14, 2014

reem force-pushed the better-task-pool branch from 7efda9c to b691914 Compare November 14, 2014 05:12

reem force-pushed the better-task-pool branch from b691914 to 1252288 Compare November 14, 2014 06:46

alexcrichton reviewed Nov 14, 2014

reem force-pushed the better-task-pool branch from 1252288 to d602e7a Compare November 14, 2014 06:55

reem force-pushed the better-task-pool branch from d602e7a to 93c4942 Compare November 14, 2014 06:57

barosl mentioned this pull request Nov 14, 2014

TaskPool tries to reuse the panicked task, thus dies #18836

Closed

bors merged commit 93c4942 into rust-lang:master Nov 16, 2014

reem deleted the better-task-pool branch November 16, 2014 22:59
