Skip to content

when_all/split: operation state might get prematurely destroyed when child completes synchronously inside its stop callback #300

Open
@jiixyj

Description

@jiixyj

I may have stumbled across a nasty lifetime issue in the handling of stop callbacks in the when_all/split algorithms. But this would apply to any algorithm using a inplace_stop_source inside its operation state.

when_all has a inplace_stop_source inside its operation state, and then a stop callback, like this:

struct when_all_opstate {
    // ...
    inplace_stop_source stop_src{};
    // ...
    optional<stop_callback> on_stop{nullopt};
    // ...
};

In on_stop, a stop callback is registered, which calls stop_src.request_stop() when there is a stop request on the receiver's stop token. All child senders of the when_all are registered on the stop_src. This propagates the stop request from the receiver to all children of the when_all sender.

Now, what I observed is the following chain of events:

  • receiver's stop token triggers
  • on_stop callback calls stop_src.stop_requested()
  • the stop_src now iterates over its list of registered stop callbacks (those are the ones from the children) (*)
  • the last child calls set_stopped synchronously from inside its stop callback. In my case it's from deregistering a sleep timer from an self-written epoll-based "io context", something like this:
                struct stop_cb {
                    opstate* op;
                    void operator()() noexcept
                    {
                        if (op->loop_->deregister_deadline(op)) {
                            op->set_stopped();
                        } else {
                            // lost the race, main loop will call `set_value()`
                        }
                    }
                };
  • now, arrive() of the when_all opstate gets called
  • complete() of the when_all opstate gets called
  • ex::set_stopped is called, satisfying the receiver contract of the when_all sender
  • the when_all opstate is synchronously (!) destroyed from inside the set_stopped callback by some follow up work.
    -> UB, since we are still iterating over the list of registered stop callbacks of stop_src! (the line marked with "*" above)

I've managed to "hack around" it by doing some checking of thread id's and deferring the completion to the stop callback if I detect that the completion is called synchronously from inside a stop callback. So something like this:

template <typename Opstate>
struct on_stop_request_with_thread_id {
public:
    void operator()() const noexcept
    {
        id_->store(std::this_thread::get_id(), std::memory_order::relaxed);
        op_->request_stop();
        if (id_->load(std::memory_order::relaxed) == std::thread::id{}) {
            op_->deferred_complete();
            return;
        }
        id_->store(std::thread::id{}, std::memory_order::relaxed);
    }

public: // NOLINT enable aggregate initialization
    Opstate* op_;
    std::atomic<std::thread::id>* id_;
};

export template <typename Opstate, typename Token>
struct stop_callback_with_thread_id {
public:
    bool should_defer_completion()
    {
        if (id_.load(std::memory_order::relaxed) != std::this_thread::get_id()) {
            return false;
        }

        id_.store(std::thread::id{}, std::memory_order::relaxed);
        return true;
    }

    void reset() { on_stop_.reset(); }

    void emplace(Token token, Opstate* op)
    {
        on_stop_.emplace(std::move(token), on_stop_request_with_thread_id<Opstate>{op, &id_});
    }

private:
    using stop_callback_t = ex::stop_callback_for_t<Token, on_stop_request_with_thread_id<Opstate>>;

    std::atomic<std::thread::id> id_{};
    std::optional<stop_callback_t> on_stop_ = {};
};

...and then using stop_callback_with_thread_id instead of optional<stop_callback> inside when_all's opstate, and having a complete() like this:

            void complete(Rcvr& rcvr) noexcept
            {
                // If we are completing synchronously from inside a stop
                // callback, defer completion.
                if (on_stop.should_defer_completion()) {
                    this->deferred_rcvr = &rcvr;
                    return;
                }

                complete_impl(rcvr);
            }

This all feels very hacky to me, though.

I haven't deeply investigated split yet, but I think the solution could be a bit simpler there, by using a stop callback like this instead of on-stop-request:

template <typename SharedState>
struct split_on_stop_request {
    void operator()() const noexcept
    {
        shared_state_->inc_ref();
        shared_state_->stop_src.request_stop();
        shared_state_->dec_ref();
    }

    SharedState* shared_state_;
};

...i.e. just wrapping the request_stop() between inc_ref/dec_ref to ensure the opstate object stays alive long enough.

I do wonder if there is a more elegant way to solve this issue. I don't think synchronous completions from stop callbacks should be outlawed -- it seems "natural" to me to do the set_stopped right inside the stop callback if possible. Or maybe synchronous destruction from inside the set_stopped completion of when_all is the problem? I've thought that you have to assume the lifetime of the opstate may end when calling the completion, though.

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs-proposed-resolutionThis issue does not yet have a proposed resolution but needs one

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions