bench: Improve the spectralnorm shootout benchmark #17989

alexcrichton · 2014-10-13T00:34:07Z

This improves the spectralnorm shootout benchmark in a few ways after
looking at the leading C implementation:

* The simd-based f64x2 is now used to parallelize a few computations.
* RWLock usage has been removed. A custom `parallel` function was added as a
  form of stack-based fork-join parallelism. I found that contention on the
  locks was high and that the locks hindered other optimizations.

This does, however, introduce one unsafe block into the benchmarks, which
previously had none.

In terms of timings, the before and after numbers are:

$ time ./shootout-spectralnorm-before
./shootout-spectralnorm-before  2.07s user 0.71s system 324% cpu 0.857 total
$ time ./shootout-spectralnorm-before 5500
./shootout-spectralnorm-before 5500  11.88s user 1.13s system 459% cpu 2.830 total
$ time ./shootout-spectralnorm-after
./shootout-spectralnorm-after  0.58s user 0.01s system 280% cpu 0.210 total
$ time ./shootout-spectralnorm-after 5500
./shootout-spectralnorm-after 5500  3.55s user 0.01s system 455% cpu 0.783 total

rust-highfive · 2014-10-13T00:34:13Z

Warning

These commits modify unsafe code. Please review it carefully!

gereeter · 2014-10-13T02:42:44Z

src/test/bench/shootout-spectralnorm.rs

+// Executes a closure in parallel over the given mutable slice. The closure `f`
+// is run in parallel and yielded the starting index within `v` as well as a
+// sub-slice of `v`.
+fn parallel<T: Send + Sync>(v: &mut [T], mut f: |uint, &mut [T]|: Sync) {


I don't think that this function is safe in general, because f could mutate something it captured. However, it should be possible to encode the correct bound with unboxed closures.

huonw · 2014-10-13

FWIW, I experimented with a similar rewrite that used unboxed closures, and they worked well enough, i.e. no ICEs etc.

alexcrichton · 2014-10-13

Oh dear, you're right! I thought the Sync bound was enough, but I think this needs to take an instance of Fn to ensure there's no &mut captures. I've updated with the unboxed closure equivalent of Fn, thanks!
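
For readers on a current toolchain, the point about the closure bound can be sketched roughly as follows. This is not the code from the PR (which targets 2014-era Rust and uses an `unsafe` block); it is a minimal analogue that assumes `std::thread::scope` (Rust 1.63+), and the `workers` parameter and chunking scheme are hypothetical choices made for illustration. The relevant part is the `Fn(usize, &mut [T]) + Sync` bound: because every worker shares `&f`, a closure that mutates something it captured would only implement `FnMut` and would be rejected at the call site.

```rust
use std::thread;

// Hypothetical modern analogue of the `parallel` helper discussed above.
// `F: Fn + Sync` means `&F` can be shared across threads and the closure
// cannot mutate its captures through a plain `&mut` borrow.
fn parallel<T, F>(v: &mut [T], workers: usize, f: F)
where
    T: Send,
    F: Fn(usize, &mut [T]) + Sync,
{
    if v.is_empty() {
        return;
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_len = (v.len() + workers - 1) / workers;
    let f = &f;
    // Fork: one scoped thread per disjoint sub-slice.
    // Join: `thread::scope` returns only after every spawned thread finishes.
    thread::scope(|s| {
        for (i, chunk) in v.chunks_mut(chunk_len).enumerate() {
            // Pass the starting index of this chunk within `v`.
            s.spawn(move || f(i * chunk_len, chunk));
        }
    });
}

fn main() {
    let mut data = vec![0usize; 10];
    parallel(&mut data, 4, |start, chunk| {
        for (offset, slot) in chunk.iter_mut().enumerate() {
            *slot = (start + offset) * 2;
        }
    });
    println!("{:?}", data);
}
```

Changing the closure above to increment a captured `let mut counter` would make it `FnMut` only, and the call to `parallel` would fail to compile, which is the property the review comment asks for.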

TeXitoi · 2014-10-16T12:25:08Z

Great! I'd r+ if I can ;)

kud1ing · 2014-10-16T12:28:03Z

See also #18085

gereeter reviewed Oct 13, 2014

alexcrichton force-pushed the spectralnorm branch from 783f76d to f7b5470 Compare October 13, 2014 15:53

TeXitoi mentioned this pull request Oct 16, 2014

Implement / optimize the shootout benchmarks #18085

Closed

15 tasks

bors closed this Oct 17, 2014

bors merged commit f7b5470 into rust-lang:master Oct 17, 2014

alexcrichton deleted the spectralnorm branch November 20, 2014 03:19
