Description
I've been doing a lot of benchmarking recently (1, 2, 3), and I've seen a pretty dramatic drop in performance over the past couple of weeks. While some of it might be explained by upgrading from OS X Mavericks to Yosemite, I still saw a 40% drop in performance between 2014-11-13 and 2014-11-23. I haven't been able to dig into what's going on yet, but I did see that our current implementation of Writer for Vec<u8>:
impl Writer for Vec<u8> {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        self.push_all(buf);
        Ok(())
    }
}
#[bench]
fn bench_std_vec_writer(b: &mut test::Bencher) {
    let mut dst = Vec::with_capacity(BATCHES * SRC_LEN);
    let src = &[1, .. SRC_LEN];
    b.iter(|| {
        dst.clear();
        do_std_writes(&mut dst, src, BATCHES);
    })
}
Does not appear to be inlining well for some reason:
test writer::bench_std_vec_writer ... bench: 1000 | [----*****#*****--------] | 2000: 1248 ns/iter (+/- 588)
test writer::bench_std_vec_writer_inline_always ... bench: 900 | [----*#***--] | 2000: 1125 ns/iter (+/- 282)
test writer::bench_std_vec_writer_inline_never ... bench: 1000 | [----***#*****--------] | 2000: 1227 ns/iter (+/- 516)
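For readers following along, the `do_std_writes` helper is not shown in this report. The sketch below is a guess at its shape, written against today's `std::io::Write` rather than the 2014 `Writer` trait. The body of `do_std_writes` and the `run_iteration` wrapper are assumptions made for illustration, not the code that produced the numbers above.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the `do_std_writes` helper, which the report
/// does not show. Writes `src` into `dst` `batches` times via `Write`.
pub fn do_std_writes<W: Write>(dst: &mut W, src: &[u8], batches: usize) -> io::Result<()> {
    for _ in 0..batches {
        dst.write_all(src)?;
    }
    Ok(())
}

/// One benchmark iteration (assumed shape): reset the destination, then
/// run every batch and report how many bytes ended up in the vector.
pub fn run_iteration(dst: &mut Vec<u8>, src: &[u8], batches: usize) -> io::Result<usize> {
    // `clear` keeps the allocation, so later iterations do not reallocate.
    dst.clear();
    do_std_writes(dst, src, batches)?;
    Ok(dst.len())
}
```

Because the destination is created with enough capacity up front and only cleared between iterations, a loop like this should mostly measure the per-call cost of the write itself, which is where inlining matters.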
Rewriting it to this makes it roughly 8 times faster, going from 1248 ns/iter to 160 ns/iter (and yes, I realize I'm not updating the length of the Vec<u8>. Could that be a problem?):
struct VecWriter1<'a> {
    dst: &'a mut Vec<u8>,
}
impl<'a> MyWriter for VecWriter1<'a> {
    #[inline]
    fn my_write(&mut self, src: &[u8]) -> IoResult<()> {
        let src_len = src.len();
        self.dst.reserve(src_len);
        let dst = self.dst.as_mut_slice();
        unsafe {
            // we reserved enough room in `dst` to store `src`.
            // NOTE: this copies to the start of the buffer and leaves the
            // Vec's length unchanged, so every write lands at offset 0.
            ptr::copy_nonoverlapping_memory(
                dst.as_mut_ptr(),
                src.as_ptr(),
                src_len);
        }
        Ok(())
    }
}
with this performance:
test writer::bench_vec_writer_1 ... bench: 100 | [------*********#*****--------] | 200: 160 ns/iter (+/- 68)
test writer::bench_vec_writer_1_inline_always ... bench: 100 | [--------****#**--] | 300: 182 ns/iter (+/- 79)
test writer::bench_vec_writer_1_inline_never ... bench: 600 | [---****#**--] | 2000: 952 ns/iter (+/- 399)
Furthermore, commenting out the self.dst.reserve(src_len) call made it just as fast as BufWriter and as fast as directly using the unsafe ptr::copy_nonoverlapping_memory.
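On the question of the missing length update: as written, `VecWriter1` copies every write to the start of the buffer and never grows the vector, so it does less work than a real append and its timings are not directly comparable. Below is a minimal sketch of what an append that does update the length might look like in current Rust, using `ptr::copy_nonoverlapping` (the modern name, with source and destination arguments swapped relative to the 2014 `copy_nonoverlapping_memory`). The `MyWriter` trait here is a hypothetical reconstruction and `VecWriter` is an illustrative name. Neither is the code that was benchmarked.

```rust
use std::io;
use std::ptr;

/// Hypothetical trait mirroring the `MyWriter` trait used in the report.
pub trait MyWriter {
    fn my_write(&mut self, src: &[u8]) -> io::Result<()>;
}

/// Illustrative writer that appends to a borrowed `Vec<u8>`.
pub struct VecWriter<'a> {
    pub dst: &'a mut Vec<u8>,
}

impl<'a> MyWriter for VecWriter<'a> {
    #[inline]
    fn my_write(&mut self, src: &[u8]) -> io::Result<()> {
        let old_len = self.dst.len();
        let src_len = src.len();
        // Guarantee capacity for at least `old_len + src_len` bytes.
        self.dst.reserve(src_len);
        unsafe {
            // SAFETY: `reserve` made room for `src_len` bytes past `old_len`,
            // and `src` cannot alias the exclusively borrowed vector.
            ptr::copy_nonoverlapping(
                src.as_ptr(),
                self.dst.as_mut_ptr().add(old_len),
                src_len,
            );
            // The copied bytes are initialized, so the length may cover them.
            self.dst.set_len(old_len + src_len);
        }
        Ok(())
    }
}
```

This does by hand what `Vec::extend_from_slice` does in safe code. It is shown only to make clear which steps the faster variant skips.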