Open
Description
Issue by brycelelbach
Monday Sep 27, 2021 at 21:45 GMT
Originally opened as NVIDIA/stdexec#195
This should be split into multiple issues:
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#example-async-inclusive-scan If you want this to be a valid implementation of inclusive_scan, we need to fix http://eel.is/c++draft/algorithms.parallel.exceptions to stop special-casing whether the user throws bad_alloc or we do; under the current rules, an implementation is required to keep a duplicate copy of all its execution machinery just to propagate a different exception type, so that the algorithm's own failure to allocate can be distinguished from user-thrown std::bad_allocs. (For example, we do that today by replacing vector's allocator in our implementation: https://github.com/microsoft/STL/blob/dc888f7d9fb7a4db8d3441f9e9bac2e0c6ecc4db/stl/inc/execution#L282-L313 .) Considering [algorithms.parallel.exceptions] was supposedly written to make things easier for implementers, I would prefer not to change the example here, and instead fix [algorithms.parallel.exceptions] to say that exceptions occurring on the calling thread(s) may be propagated to the calling thread(s), and exceptions emitted on threads created by the implementation go directly to terminate.
- I also observe that the example implementation uses the terrible "barrier" inclusive_scan algorithm; is something like the single-pass decoupled look-back approach (https://research.nvidia.com/sites/default/files/pubs/2016-03_Single-pass-Parallel-Prefix/nvr-2016-002.pdf) implementable in this universe? (MSVC's inclusive_scan, and presumably Thrust's, use that algorithm.)
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#example-async-dynamically-sized-read It would be neat if what happens on error were described here. While that may distract from the async concepts being described, how to bail out correctly on failure is an important consideration for async things. (Indeed, cancellation being difficult is one of the primary criticisms against the ASIO design proposed for standardization right now.) Simply replacing the asserts with what a real implementation would do instead may be sufficient?
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#design-transitions RE: "running user code anywhere but where they defined it to run must be considered a bug" there are always iterator copies etc. that likely happen when setting up the parallel algorithm calls. Be very careful about specifying what "user code" means here. The thing that made [algorithms.parallel.exceptions] so frustrating to implement is that things people don't expect to throw, like copying an iterator, can nonetheless throw.
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#design-fpg I am so happy to see forward progress guarantees being discussed 😄
- RE: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#design-sender-adaptor-ensure_started & co.: I'm concerned that this creates new avenues for creating "detached" work, which seems like an ever-ticking [basic.start.term]/6 timebomb. I note that the compromise that added thread::detach() also added the Xxx_at_thread_exit functions as a hypothetical way to restore join-like semantics to detached work, but there appears to be no such help here. Of note, transferring work off an execution agent does not mean that the execution agent is actually dead, but [basic.start.term]/6 (and real implementations) require that the execution agent actually be dead.
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#design-dispatch I suspect I am just drowned out here, but I really hate the escalating war we have declared on ADL and the problems we have created for ourselves as a result. I note that the reason tag_invoke died in LEWGI in San Diego was that it adds an additional layer of std::forward calls needed for everything; it is unfortunate, in my view, that that concern remains unaddressed.
- Quasi-related to this paper: we talk about "execution context" in lots of places, and SG1 people seem to know exactly what that means; I still do not, and I suspect I am not alone. I note that what this paper thinks it is and what the networking TS thinks it is are not the same.
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#spec-execution.op_state.start Are there any requirements on connect() relating to when the work starts? I'm asking because we would want to use this to implement the "prepare for parallelism / commit to parallelism / on failure fall back to serial" behavior, but if connect() can start the work, we're toast. Or is that always going to be under the control of the standard library for purposes of the existing parallel algorithms? Example: https://github.com/microsoft/STL/blob/dc888f7d9fb7a4db8d3441f9e9bac2e0c6ecc4db/stl/inc/execution#L1189-L1233 -- everything up to line 1206 is partitioning the work etc., which may require allocating memory or threads or similar; if any of that fails, we throw _Parallelism_resources_exhausted and fall back to serial. Of note, that design only works if none of the for_each invocations have run yet.
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#spec-execution.senders.adaptors.bulk It would be good to note here that "shape" is "number of partitions". Perhaps it should be renamed partition_count, and it can be renamed back to shape should the hypothetical non-integral shapes ever present themselves.
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2300r1.html#spec-execution.senders.consumers.sync_wait If we want sync_wait to be usable to implement the existing parallel algorithms library, it needs to "block with forward progress delegation" in several circumstances.