Skip to content

rand: inform the optimiser that indexing is never out-of-bounds. #16965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 9, 2014

Conversation

huonw
Copy link
Member

@huonw huonw commented Sep 3, 2014

rand: inform the optimiser that indexing is never out-of-bounds.

This uses a bitwise mask to ensure that there's no bounds checking for
the array accesses when generating the next random number. This isn't
costless, but the single instruction is nothing compared to the branch.

A debug_assert for "bounds check" is preserved to ensure that
refactoring doesn't accidentally break it (i.e. create values of cnt
that are out of bounds with the masking causing it to silently wrap-
around).

Before:

test test::rand_isaac ... bench: 990 ns/iter (+/- 24) = 808 MB/s
test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s

After:

test test::rand_isaac ... bench: 877 ns/iter (+/- 134) = 912 MB/s
test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s

(It also removes the unsafe code in Isaac64Rng.next_u64, with a gain
in performance; today is a good day.)

@huonw
Copy link
Member Author

huonw commented Sep 3, 2014

(Since this is somewhat crypto-related, I've been liberal with comments.)

@dotdash
Copy link
Contributor

dotdash commented Sep 3, 2014

I see:

Before:

rand_isaac             ... bench:       715 ns/iter (+/- 61) = 1118 MB/s
rand_isaac64           ... bench:       356 ns/iter (+/- 15) = 2247 MB/s

After:

rand_isaac             ... bench:       616 ns/iter (+/- 35) = 1298 MB/s
rand_isaac64           ... bench:       362 ns/iter (+/- 6) = 2209 MB/s

And not modifying the field type, but just using self.rsl[self.cnt as u8 as uint] in both next_u*() methods gives:

rand_isaac             ... bench:       616 ns/iter (+/- 1) = 1298 MB/s
rand_isaac64           ... bench:       348 ns/iter (+/- 4) = 2298 MB/s

@dotdash
Copy link
Contributor

dotdash commented Sep 3, 2014

i.e. the diff for the last bench is just:

diff --git a/src/librand/isaac.rs b/src/librand/isaac.rs
index 0f7cda4..d80999e 100644
--- a/src/librand/isaac.rs
+++ b/src/librand/isaac.rs
@@ -185,7 +185,7 @@ impl Rng for IsaacRng {
             self.isaac();
         }
         self.cnt -= 1;
-        self.rsl[self.cnt as uint]
+        self.rsl[self.cnt as u8 as uint]
     }
 }

@@ -416,7 +416,7 @@ impl Rng for Isaac64Rng {
             self.isaac64();
         }
         self.cnt -= 1;
-        unsafe { *self.rsl.unsafe_get(self.cnt) }
+        self.rsl[self.cnt as u8 as uint]
     }
 }

@huonw
Copy link
Member Author

huonw commented Sep 4, 2014

Hm, that's interesting. That may be a good approach, although it fails to generalise if the RNG state size is increased. I wonder if just self.cnt & (RAND_SIZE - 1) works... /me tries

This uses a bitwise mask to ensure that there's no bounds checking for
the array accesses when generating the next random number. This isn't
costless, but the single instruction is nothing compared to the branch.

A `debug_assert` for "bounds check" is preserved to ensure that
refactoring doesn't accidentally break it (i.e. create values of `cnt`
that are out of bounds with the masking causing it to silently wrap-
around).

Before:

    test test::rand_isaac   ... bench: 990 ns/iter (+/- 24) = 808 MB/s
    test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s

After:

    test test::rand_isaac   ... bench: 877 ns/iter (+/- 134) = 912 MB/s
    test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s

(It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain*
in performance; today is a good day.)
@huonw huonw changed the title rand: remove bounds checks in Isaac*Rng.next_* via good type choice. rand: inform the optimiser that indexing is never out-of-bounds. Sep 8, 2014
@huonw
Copy link
Member Author

huonw commented Sep 8, 2014

Thanks for the suggestion @dotdash, I've switched to a 'safer' version (i.e. less chance for mistakes to be silently ignored) which is, AFAICT, equally as fast, even in a tight loop.

@huonw
Copy link
Member Author

huonw commented Sep 8, 2014

r?

bors added a commit that referenced this pull request Sep 9, 2014
rand: inform the optimiser that indexing is never out-of-bounds.

This uses a bitwise mask to ensure that there's no bounds checking for
the array accesses when generating the next random number. This isn't
costless, but the single instruction is nothing compared to the branch.

A `debug_assert` for "bounds check" is preserved to ensure that
refactoring doesn't accidentally break it (i.e. create values of `cnt`
that are out of bounds with the masking causing it to silently wrap-
around).

Before:

test test::rand_isaac   ... bench: 990 ns/iter (+/- 24) = 808 MB/s
test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s

After:

test test::rand_isaac   ... bench: 877 ns/iter (+/- 134) = 912 MB/s
test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s

(It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain*
in performance; today is a good day.)
@bors bors closed this Sep 9, 2014
@bors bors merged commit cc6a487 into rust-lang:master Sep 9, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 31, 2024
fix: use lldb when debugging with C++ extension on MacOS

See rust-lang/rust-analyzer#16901 (comment)

This PR resolves the issue of being unable to debug using the C++ extension on macOS. By using special configurations for the `MIMode` on macOS, it enables the C++ extension to connect to lldb when debugging (without affecting other platforms).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants