-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Rewrite std::sync::TaskPool to be load balancing and panic-resistant #18941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This already exists as a library here: https://github.com/reem/rust-resistant-taskpool r? @huonw |
channels: Vec<Sender<Msg<T>>>, | ||
next_index: uint, | ||
/// | ||
/// Spawns n + 1 tasks and respawns tasks on subtask panics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
Spawns `n` worker tasks and one additional supervisor to replenish the pool after any panics.
I like it, although I'd like a second opinion, e.g. @alexcrichton. |
7efda9c
to
b691914
Compare
This looks great, thanks @reem! Would it be possible to not have a monitoring task at all? Could each |
Great idea! I was trying to factor the monitor out, but didn't consider having the Sentinel be the one to actually spawn a new task. |
b691914
to
1252288
Compare
I refactored all of task-re-spawning logic into Sentinel, and it really simplifies the code along with eliminating the monitor task. |
|
||
// Cancel and destroy this sentinel. | ||
fn cancel(mut self) { | ||
self.active = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice idea, I like this better than task::failing()
r=me with @huonw's comment |
1252288
to
d602e7a
Compare
The previous implementation was very likely to cause panics during unwinding through this process: - child panics, drops its receiver - taskpool comes back around and sends another job over to that child - the child receiver has hung up, so the taskpool panics on send - during unwinding, the taskpool attempts to send a quit message to the child, causing a panic during unwinding - panic during unwinding causes a process abort This meant that TaskPool upgraded any child panic to a full process abort. This came up in Iron when it caused crashes in long-running servers. This implementation uses a single channel to communicate between spawned tasks and the TaskPool, which significantly reduces the complexity of the implementation and cuts down on allocation. The TaskPool uses the channel as a single-producer-multiple-consumer queue. Additionally, through the use of send_opt and recv_opt instead of send and recv, this TaskPool is robust on the face of child panics, both before, during, and after the TaskPool itself is dropped. Due to the TaskPool no longer using an `init_fn_factory`, this is a [breaking-change] otherwise, the API has not changed. If you used `init_fn_factory` in your code, and this change breaks for you, you can instead use an `AtomicUint` counter and a channel to move information into child tasks.
d602e7a
to
93c4942
Compare
Cool thing! This fixes #18836. I'm closing it now. |
The previous implementation was very likely to cause panics during
unwinding through this process:
the child, causing a panic during unwinding
This meant that TaskPool upgraded any child panic to a full process
abort. This came up in Iron when it caused crashes in long-running
servers.
This implementation uses a single channel to communicate between
spawned tasks and the TaskPool, which significantly reduces the complexity
of the implementation and cuts down on allocation. The TaskPool uses
the channel as a single-producer-multiple-consumer queue.
Additionally, through the use of send_opt and recv_opt instead of
send and recv, this TaskPool is robust on the face of child panics,
both before, during, and after the TaskPool itself is dropped.
This TaskPool uses an additional "monitor" task start new child
tasks to replace those that panic.
Due to the TaskPool no longer using an
init_fn_factory
, this is a[breaking-change]
otherwise, the API has not changed.
If you used
init_fn_factory
in your code, and this change breaks foryou, you can instead use an
AtomicUint
counter and a channel tomove information into child tasks.