You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
auto merge of #11280 : c-a/rust/inline_byteswap, r=brson
After writing some benchmarks for ebml::reader::vuint_at() I noticed that LLVM doesn't seem to inline the from_be32 function even though it only does a call to the bswap32 intrinsic in the x86_64 case. Marking the functions with #[inline(always)] fixes that and seems to me a reasonable thing to do. I got the following measurements in my vuint_at() benchmarks:
- Before
test ebml::bench::vuint_at_A_aligned ... bench: 1075 ns/iter (+/- 58)
test ebml::bench::vuint_at_A_unaligned ... bench: 1073 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_aligned ... bench: 1150 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_unaligned ... bench: 1151 ns/iter (+/- 6)
- Inline from_be32
test ebml::bench::vuint_at_A_aligned ... bench: 769 ns/iter (+/- 9)
test ebml::bench::vuint_at_A_unaligned ... bench: 795 ns/iter (+/- 6)
test ebml::bench::vuint_at_D_aligned ... bench: 758 ns/iter (+/- 8)
test ebml::bench::vuint_at_D_unaligned ... bench: 759 ns/iter (+/- 8)
- Using vuint_at_slow()
test ebml::bench::vuint_at_A_aligned ... bench: 646 ns/iter (+/- 7)
test ebml::bench::vuint_at_A_unaligned ... bench: 645 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_aligned ... bench: 907 ns/iter (+/- 4)
test ebml::bench::vuint_at_D_unaligned ... bench: 1085 ns/iter (+/- 16)
As expected inlining from_be32() gave a considerable speedup.
I also tried how the "slow" version fared against the optimized version and noticed that it's
actually a bit faster for small A class integers (using only two bytes) but slower for big D class integers (using four bytes)
0 commit comments