rand: inform the optimiser that indexing is never out-of-bounds. #16965
(Since this is somewhat crypto-related, I've been liberal with comments.)
I see: Before:
After:
And not modifying the field type, but just using
i.e. the diff for the last bench is just:

diff --git a/src/librand/isaac.rs b/src/librand/isaac.rs
index 0f7cda4..d80999e 100644
--- a/src/librand/isaac.rs
+++ b/src/librand/isaac.rs
@@ -185,7 +185,7 @@ impl Rng for IsaacRng {
             self.isaac();
         }
         self.cnt -= 1;
-        self.rsl[self.cnt as uint]
+        self.rsl[self.cnt as u8 as uint]
     }
 }
@@ -416,7 +416,7 @@ impl Rng for Isaac64Rng {
             self.isaac64();
         }
         self.cnt -= 1;
-        unsafe { *self.rsl.unsafe_get(self.cnt) }
+        self.rsl[self.cnt as u8 as uint]
     }
 }
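For readers on current Rust, the cast in the diff above can be sketched as follows. This is only an illustration under assumptions: `Buffer`, `filled` and `next` are hypothetical names, `usize` stands in for the old `uint`, and the real generator refills its state instead of returning `None`.

```rust
// Hypothetical stand-in for the generator's output buffer; not the real IsaacRng.
const RAND_SIZE: usize = 256;

struct Buffer {
    rsl: [u32; RAND_SIZE],
    cnt: u32,
}

impl Buffer {
    /// Builds a full buffer whose slot `i` holds the value `i`.
    fn filled() -> Buffer {
        let mut rsl = [0u32; RAND_SIZE];
        for (i, slot) in rsl.iter_mut().enumerate() {
            *slot = i as u32;
        }
        Buffer { rsl, cnt: RAND_SIZE as u32 }
    }

    /// Pops the next value, or returns None once the buffer is drained.
    /// (The real generator would refill here instead.)
    fn next(&mut self) -> Option<u32> {
        if self.cnt == 0 {
            return None;
        }
        self.cnt -= 1;
        // A u8 never exceeds 255, so this index is always below RAND_SIZE
        // and the compiler has no reason to keep the bounds check.
        Some(self.rsl[self.cnt as u8 as usize])
    }
}
```

Whether the bounds check actually disappears depends on the compiler version and optimisation level, so it is worth confirming with a benchmark or by reading the generated assembly.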
Hm, that's interesting. That may be a good approach, although it fails to generalise if the RNG state size is increased. I wonder if just
This uses a bitwise mask to ensure that there's no bounds checking for the array accesses when generating the next random number. This isn't costless, but the single instruction is nothing compared to the branch.

A `debug_assert` for "bounds check" is preserved to ensure that refactoring doesn't accidentally break it (i.e. create values of `cnt` that are out of bounds with the masking causing it to silently wrap around).

Before:
test test::rand_isaac   ... bench: 990 ns/iter (+/- 24) = 808 MB/s
test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s

After:
test test::rand_isaac   ... bench: 877 ns/iter (+/- 134) = 912 MB/s
test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s

(It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain* in performance; today is a good day.)
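A minimal sketch of the mask plus `debug_assert` pattern described above, written in current Rust. It is not the code from this PR: `Pool`, `refill` and `round` are hypothetical names, and the refill step is a predictable placeholder in place of the ISAAC rounds. It also assumes the state size stays a power of two, since the mask only works in that case.

```rust
// Hypothetical names throughout; the refill step is a placeholder, not ISAAC.
const RAND_SIZE_LEN: usize = 8;
const RAND_SIZE: usize = 1 << RAND_SIZE_LEN; // must stay a power of two

struct Pool {
    rsl: [u64; RAND_SIZE],
    cnt: usize,
    round: u64,
}

impl Pool {
    fn new() -> Pool {
        Pool { rsl: [0; RAND_SIZE], cnt: 0, round: 0 }
    }

    /// Placeholder refill: writes predictable values so the sketch is testable.
    fn refill(&mut self) {
        let base = self.round.wrapping_mul(RAND_SIZE as u64);
        for (i, slot) in self.rsl.iter_mut().enumerate() {
            *slot = base.wrapping_add(i as u64);
        }
        self.round = self.round.wrapping_add(1);
        self.cnt = RAND_SIZE;
    }

    fn next_u64(&mut self) -> u64 {
        if self.cnt == 0 {
            self.refill();
        }
        self.cnt -= 1;
        // Checked in debug builds only: catches a refactor that lets `cnt`
        // escape the valid range, which the mask below would otherwise hide.
        debug_assert!(self.cnt < RAND_SIZE);
        // The masked index is always below RAND_SIZE, so the optimiser
        // should be able to drop the bounds check without any unsafe code.
        self.rsl[self.cnt & (RAND_SIZE - 1)]
    }
}
```

Unlike a cast to `u8`, the mask is derived from `RAND_SIZE`, so it keeps working if the state size is later changed to another power of two.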
Thanks for the suggestion @dotdash, I've switched to a 'safer' version (i.e. less chance for mistakes to be silently ignored) which is, AFAICT, just as fast, even in a tight loop.
r?