rand: inform the optimiser that indexing is never out-of-bounds. #16965

huonw · 2014-09-03T13:26:47Z

rand: inform the optimiser that indexing is never out-of-bounds.

This uses a bitwise mask to ensure that there's no bounds checking for
the array accesses when generating the next random number. This isn't
costless, but the single instruction is nothing compared to the branch.

A debug_assert for "bounds check" is preserved to ensure that
refactoring doesn't accidentally break it (i.e. create values of cnt
that are out of bounds with the masking causing it to silently wrap-
around).

Before:

test test::rand_isaac ... bench: 990 ns/iter (+/- 24) = 808 MB/s
test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s

After:

test test::rand_isaac ... bench: 877 ns/iter (+/- 134) = 912 MB/s
test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s

(It also removes the unsafe code in Isaac64Rng.next_u64, with a gain
in performance; today is a good day.)

huonw · 2014-09-03T13:27:28Z

(Since this is somewhat crypto-related, I've been liberal with comments.)

dotdash · 2014-09-03T14:41:25Z

I see:

Before:

rand_isaac             ... bench:       715 ns/iter (+/- 61) = 1118 MB/s
rand_isaac64           ... bench:       356 ns/iter (+/- 15) = 2247 MB/s

After:

rand_isaac             ... bench:       616 ns/iter (+/- 35) = 1298 MB/s
rand_isaac64           ... bench:       362 ns/iter (+/- 6) = 2209 MB/s

And not modifying the field type, but just using self.rsl[self.cnt as u8 as uint] in both next_u*() methods gives:

rand_isaac             ... bench:       616 ns/iter (+/- 1) = 1298 MB/s
rand_isaac64           ... bench:       348 ns/iter (+/- 4) = 2298 MB/s

dotdash · 2014-09-03T14:42:01Z

i.e. the diff for the last bench is just:

diff --git a/src/librand/isaac.rs b/src/librand/isaac.rs
index 0f7cda4..d80999e 100644
--- a/src/librand/isaac.rs
+++ b/src/librand/isaac.rs
@@ -185,7 +185,7 @@ impl Rng for IsaacRng {
             self.isaac();
         }
         self.cnt -= 1;
-        self.rsl[self.cnt as uint]
+        self.rsl[self.cnt as u8 as uint]
     }
 }

@@ -416,7 +416,7 @@ impl Rng for Isaac64Rng {
             self.isaac64();
         }
         self.cnt -= 1;
-        unsafe { *self.rsl.unsafe_get(self.cnt) }
+        self.rsl[self.cnt as u8 as uint]
     }
 }

huonw · 2014-09-04T09:14:49Z

Hm, that's interesting. That may be a good approach, although it fails to generalise if the RNG state size is increased. I wonder if just self.cnt & (RAND_SIZE - 1) works... /me tries

This uses a bitwise mask to ensure that there's no bounds checking for the array accesses when generating the next random number. This isn't costless, but the single instruction is nothing compared to the branch. A `debug_assert` for "bounds check" is preserved to ensure that refactoring doesn't accidentally break it (i.e. create values of `cnt` that are out of bounds with the masking causing it to silently wrap- around). Before: test test::rand_isaac ... bench: 990 ns/iter (+/- 24) = 808 MB/s test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s After: test test::rand_isaac ... bench: 877 ns/iter (+/- 134) = 912 MB/s test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s (It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain* in performance; today is a good day.)

huonw · 2014-09-08T09:43:23Z

Thanks for the suggestion @dotdash, I've switched to a 'safer' version (i.e. less chance for mistakes to be silently ignored) which is, AFAICT, equally as fast, even in a tight loop.

huonw · 2014-09-08T09:44:39Z

r?

rand: inform the optimiser that indexing is never out-of-bounds. This uses a bitwise mask to ensure that there's no bounds checking for the array accesses when generating the next random number. This isn't costless, but the single instruction is nothing compared to the branch. A `debug_assert` for "bounds check" is preserved to ensure that refactoring doesn't accidentally break it (i.e. create values of `cnt` that are out of bounds with the masking causing it to silently wrap- around). Before: test test::rand_isaac ... bench: 990 ns/iter (+/- 24) = 808 MB/s test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s After: test test::rand_isaac ... bench: 877 ns/iter (+/- 134) = 912 MB/s test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s (It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain* in performance; today is a good day.)

fix: use lldb when debugging with C++ extension on MacOS See rust-lang/rust-analyzer#16901 (comment) This PR resolves the issue of being unable to debug using the C++ extension on macOS. By using special configurations for the `MIMode` on macOS, it enables the C++ extension to connect to lldb when debugging (without affecting other platforms).

huonw force-pushed the isaac-oob-- branch from 247e298 to cc6a487 Compare September 8, 2014 09:40

huonw changed the title ~~rand: remove bounds checks in Isaac*Rng.next_* via good type choice.~~ rand: inform the optimiser that indexing is never out-of-bounds. Sep 8, 2014

bors closed this Sep 9, 2014

bors merged commit cc6a487 into rust-lang:master Sep 9, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rand: inform the optimiser that indexing is never out-of-bounds. #16965

rand: inform the optimiser that indexing is never out-of-bounds. #16965

huonw commented Sep 3, 2014

huonw commented Sep 3, 2014

dotdash commented Sep 3, 2014

dotdash commented Sep 3, 2014

huonw commented Sep 4, 2014

huonw commented Sep 8, 2014

huonw commented Sep 8, 2014

rand: inform the optimiser that indexing is never out-of-bounds. #16965

rand: inform the optimiser that indexing is never out-of-bounds. #16965

Conversation

huonw commented Sep 3, 2014

huonw commented Sep 3, 2014

dotdash commented Sep 3, 2014

dotdash commented Sep 3, 2014

huonw commented Sep 4, 2014

huonw commented Sep 8, 2014

huonw commented Sep 8, 2014