Dramatic slowdown in Rust performance from the serialization benchmarks #19281

I've been doing a lot of benchmarking recently ([1](http://erickt.github.io/blog/2014/11/11/benchmarks/), [2](http://erickt.github.io/blog/2014/11/13/benchmarks-2/), [3](http://erickt.github.io/blog/2014/11/22/benchmarking-is-confusing/)), and I've seen a pretty dramatic drop in performance over the past couple of weeks. While some of it might be explained by upgrading from OSX Mavericks to Yosemite, I still saw a 40% drop in performance between 2014-11-13 and 2014-11-23. I haven't been able to dig into what's going on yet, but I did see that our current implementation of `Writer` for `Vec<u8>`:

``` rust
impl Writer for Vec<u8> {
    #[inline]
    fn write(&mut self, buf: &[u8]) -> IoResult<()> {
        self.push_all(buf);
        Ok(())
    }
}

#[bench]
fn bench_std_vec_writer(b: &mut test::Bencher) {
    let mut dst = Vec::with_capacity(BATCHES * SRC_LEN);
    let src = &[1, .. SRC_LEN];

    b.iter(|| {
        dst.clear();

        do_std_writes(&mut dst, src, BATCHES);
    })
}
```

does not appear to be inlined well for some reason:

```
test writer::bench_std_vec_writer                           ... bench: 1000 | [----*****#*****--------]             | 2000:      1248 ns/iter (+/- 588)
test writer::bench_std_vec_writer_inline_always             ... bench: 900 |   [----*#***--]                        | 2000:      1125 ns/iter (+/- 282)
test writer::bench_std_vec_writer_inline_never              ... bench: 1000 |  [----***#*****--------]              | 2000:      1227 ns/iter (+/- 516)
```
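The `_inline_always` and `_inline_never` variants are not shown here. As a rough sketch of what such variants could look like, assuming they differ only in the inline attribute on the write method, here is a version written against current Rust (`std::io::Write` and `extend_from_slice` instead of the 2014 `Writer` and `push_all`). The wrapper types `InlineAlways` and `InlineNever` and the helper `do_writes` are hypothetical names, and `do_writes` is only a stand-in for `do_std_writes`, whose body is not shown.

```rust
use std::io::{self, Write};

// Hypothetical wrappers: the real benchmark variants are not shown in the
// issue, so the only assumption here is that they differ in the attribute.
struct InlineAlways(Vec<u8>);
struct InlineNever(Vec<u8>);

impl Write for InlineAlways {
    // Ask the compiler to inline this method at every call site.
    #[inline(always)]
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Write for InlineNever {
    // Forbid inlining, so every write pays for a real function call.
    #[inline(never)]
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Stand-in for `do_std_writes`: writes `src` into `dst` `batches` times.
fn do_writes<W: Write>(dst: &mut W, src: &[u8], batches: usize) -> io::Result<()> {
    for _ in 0..batches {
        dst.write_all(src)?;
    }
    Ok(())
}
```

Both wrappers produce identical output, so any timing difference between them would come from the call overhead and from optimizations that inlining enables or blocks.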

Rewriting it as follows makes it roughly 8 times faster (1248 ns/iter vs. 160 ns/iter). And yes, I realize I'm not updating the length of the `Vec<u8>`; could that be a problem?

``` rust
struct VecWriter1<'a> {
    dst: &'a mut Vec<u8>,
}

impl<'a> MyWriter for VecWriter1<'a> {
    #[inline]
    fn my_write(&mut self, src: &[u8]) -> IoResult<()> {
        let src_len = src.len();

        self.dst.reserve(src_len);

        let dst = self.dst.as_mut_slice();

        unsafe {
            // we reserved enough room in `dst` to store `src`.
            ptr::copy_nonoverlapping_memory(
                dst.as_mut_ptr(),
                src.as_ptr(),
                src_len);
        }

        Ok(())
    }
}
```

with this performance:

```
test writer::bench_vec_writer_1                             ... bench: 100 |         [------*********#*****--------] | 200:       160 ns/iter (+/- 68)
test writer::bench_vec_writer_1_inline_always               ... bench: 100 |     [--------****#**--]                 | 300:       182 ns/iter (+/- 79)
test writer::bench_vec_writer_1_inline_never                ... bench: 600 |   [---****#**--]                       | 2000:       952 ns/iter (+/- 399)
```

Furthermore, commenting out the `self.dst.reserve(src_len)` call made it just as fast as `BufWriter` and as using the unsafe `ptr::copy_nonoverlapping_memory` directly.
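
For comparison, `VecWriter1` copies to `dst.as_mut_ptr()`, which points at the start of the buffer, and never changes the length. A variant that appends after the existing contents and then updates the length might look like the sketch below. It is written against current Rust, where `ptr::copy_nonoverlapping` takes `(src, dst, count)`, the reverse of the old `copy_nonoverlapping_memory` argument order. The names `VecWriter` and `write_bytes` are hypothetical, and nothing here has been benchmarked.

```rust
use std::ptr;

// Hypothetical name; mirrors the shape of `VecWriter1` from the issue.
struct VecWriter<'a> {
    dst: &'a mut Vec<u8>,
}

impl<'a> VecWriter<'a> {
    #[inline]
    fn write_bytes(&mut self, src: &[u8]) {
        let src_len = src.len();
        let old_len = self.dst.len();

        // Guarantee room for `src_len` more bytes past the current length.
        self.dst.reserve(src_len);

        unsafe {
            // SAFETY: `reserve` ensured `old_len + src_len <= capacity`, and
            // `src` cannot alias the Vec's buffer while we hold `&mut` to it.
            ptr::copy_nonoverlapping(
                src.as_ptr(),
                self.dst.as_mut_ptr().add(old_len),
                src_len,
            );
            // The copied bytes are initialized now, so the length can cover them.
            self.dst.set_len(old_len + src_len);
        }
    }
}
```

Because the length grows with each write, clearing the `Vec` between benchmark iterations (as `bench_std_vec_writer` does with `dst.clear()`) keeps the buffer from growing without bound.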

