-
Notifications
You must be signed in to change notification settings - Fork 13.3k
bench: Improve the spectralnorm shootout benchmark #17989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// Executes a closure in parallel over the given mutable slice. The closure `f` | ||
// is run in parallel and yielded the starting index within `v` as well as a | ||
// sub-slice of `v`. | ||
fn parallel<T: Send + Sync>(v: &mut [T], mut f: |uint, &mut [T]|: Sync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this function is safe in general, because f
could mutate something it captured. However, it should be possible to encode the correct bound with unboxed closures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I experimented with a similar rewrite that used unboxed closures, and they worked well enough, i.e. no ICEs etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear, you're right! I thought the Sync
found was enough, but I think this needs to take an instance of Fn
to ensure there's no &mut
captures. I've updated with the unboxed closure equivalent of Fn
, thanks!
This improves the spectralnorm shootout benchmark through a few vectors after looking at the leading C implementation: * The simd-based f64x2 is now used to parallelize a few computations * RWLock usage has been removed. A custom `parallel` function was added as a form of stack-based fork-join parallelism. I found that the contention on the locks was high as well as hindering other optimizations. This does, however, introduce one `unsafe` block into the benchmarks, which previously had none. In terms of timings, the before and after numbers are: ``` $ time ./shootout-spectralnorm-before ./shootout-spectralnorm-before 2.07s user 0.71s system 324% cpu 0.857 total $ time ./shootout-spectralnorm-before 5500 ./shootout-spectralnorm-before 5500 11.88s user 1.13s system 459% cpu 2.830 total $ time ./shootout-spectralnorm-after ./shootout-spectralnorm-after 0.58s user 0.01s system 280% cpu 0.210 tota $ time ./shootout-spectralnorm-after 5500 ./shootout-spectralnorm-after 5500 3.55s user 0.01s system 455% cpu 0.783 total ```
783f76d
to
f7b5470
Compare
Great! I'd r+ if I can ;) |
See also #18085 |
This improves the spectralnorm shootout benchmark through a few vectors after looking at the leading C implementation: * The simd-based f64x2 is now used to parallelize a few computations * RWLock usage has been removed. A custom `parallel` function was added as a form of stack-based fork-join parallelism. I found that the contention on the locks was high as well as hindering other optimizations. This does, however, introduce one `unsafe` block into the benchmarks, which previously had none. In terms of timings, the before and after numbers are: ``` $ time ./shootout-spectralnorm-before ./shootout-spectralnorm-before 2.07s user 0.71s system 324% cpu 0.857 total $ time ./shootout-spectralnorm-before 5500 ./shootout-spectralnorm-before 5500 11.88s user 1.13s system 459% cpu 2.830 total $ time ./shootout-spectralnorm-after ./shootout-spectralnorm-after 0.58s user 0.01s system 280% cpu 0.210 tota $ time ./shootout-spectralnorm-after 5500 ./shootout-spectralnorm-after 5500 3.55s user 0.01s system 455% cpu 0.783 total ```
…ykril Provide an option to hide deprecated items from completion Fixes rust-lang#17989. I wonder if this should be instead done in the editor, that will do it in a language-agnostic way. Can't hurt to do it in rust-analyzer, I guess.
This improves the spectralnorm shootout benchmark through a few vectors after
looking at the leading C implementation:
parallel
function was added as aform of stack-based fork-join parallelism. I found that the contention on the
locks was high as well as hindering other optimizations.
This does, however, introduce one
unsafe
block into the benchmarks, whichpreviously had none.
In terms of timings, the before and after numbers are: