I've finally gotten around to doing some proper benchmarking of Rust versions for my crate:
http://chimper.org/rawloader-rustc-benchmarks/
As the graph on that page shows, performance generally improves across releases, but there are some large negative outliers. Most (possibly all) of them are very simple loops that decode packed formats. Since Rust 1.25, those loops have regressed by 30-40%. I've extracted a minimal test case that shows the issue:
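```rust
// Decodes 12-bit little-endian packed samples: every 3 input bytes hold 2 output samples.
// Assumes `width` is even, so every `chunks_mut(2)` chunk has exactly 2 elements.
fn decode_12le(buf: &[u8], width: usize, height: usize) -> Vec<u16> {
    let mut out: Vec<u16> = vec![0; width * height];
    for (row, line) in out.chunks_mut(width).enumerate() {
        // Each row of `width` samples occupies width*12/8 bytes of input.
        let inb = &buf[(row * width * 12 / 8)..];
        for (o, i) in line.chunks_mut(2).zip(inb.chunks(3)) {
            let g1: u16 = i[0] as u16;
            let g2: u16 = i[1] as u16;
            let g3: u16 = i[2] as u16;
            o[0] = ((g2 & 0x0f) << 8) | g1;
            o[1] = (g3 << 4) | (g2 >> 4);
        }
    }
    out
}

fn main() {
    let width = 5000;
    let height = 4000;
    // Zero-filled 5000x4000 image of 12-bit packed data (30,000,000 bytes).
    let buffer: Vec<u8> = vec![0; width * height * 12 / 8];
    // Decode the same image 100 times; the result is discarded.
    for _ in 0..100 {
        decode_12le(&buffer, width, height);
    }
}
```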
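To make the packed layout concrete: in the 12-bit little-endian format decoded above, every three bytes carry two samples. The sketch below pulls the inner loop body out into a hypothetical `unpack_pair` helper (not part of the crate) and adds a hypothetical inverse, `pack_pair`, so that a single three-byte group can be checked by hand.

```rust
/// Hypothetical helper mirroring the inner loop body of `decode_12le`:
/// unpacks two 12-bit samples from three packed bytes.
fn unpack_pair(i: [u8; 3]) -> [u16; 2] {
    let g1 = i[0] as u16;
    let g2 = i[1] as u16;
    let g3 = i[2] as u16;
    [
        // Low 8 bits from byte 0, high 4 bits from the low nibble of byte 1.
        ((g2 & 0x0f) << 8) | g1,
        // Low 4 bits from the high nibble of byte 1, high 8 bits from byte 2.
        (g3 << 4) | (g2 >> 4),
    ]
}

/// Hypothetical inverse: packs two 12-bit samples back into three bytes.
fn pack_pair(s: [u16; 2]) -> [u8; 3] {
    [
        (s[0] & 0xff) as u8,
        (((s[0] >> 8) & 0x0f) | ((s[1] & 0x0f) << 4)) as u8,
        ((s[1] >> 4) & 0xff) as u8,
    ]
}

fn main() {
    let bytes = [0xAB, 0xCD, 0xEF];
    let samples = unpack_pair(bytes);
    println!("{:03X} {:03X}", samples[0], samples[1]); // prints "DAB EFC"
    // Packing the samples again must reproduce the original bytes.
    assert_eq!(pack_pair(samples), bytes);
}
```

For the bytes `AB CD EF`, the first sample is `0xDAB` (byte 0 plus the low nibble of byte 1) and the second is `0xEFC` (the high nibble of byte 1 plus byte 2).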
Here's a test run on my machine:
```
$ rustc +1.24.0 -C opt-level=3 bench_decode.rs
$ time ./bench_decode
real 0m4.817s
user 0m3.581s
sys 0m1.236s
$ rustc +1.25.0 -C opt-level=3 bench_decode.rs
$ time ./bench_decode
real 0m6.263s
user 0m5.067s
sys 0m1.196s
```
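For reference, the relative slowdown from 1.24.0 to 1.25.0, computed directly from the timings above, is:

$$
\frac{6.263 - 4.817}{4.817} \approx 0.300 \;(\text{real}), \qquad
\frac{5.067 - 3.581}{3.581} \approx 0.415 \;(\text{user})
$$

The `sys` time is essentially unchanged (1.236 s vs. 1.196 s), which suggests that the extra cost lies in user-space computation, consistent with the decode loop rather than with allocation.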
Labels:
- Category: An issue highlighting optimization opportunities or PRs implementing such.
- Issue: Problems and improvements with respect to performance of generated code.
- Relevant to the library team, which will review and decide on the PR/issue.
- Performance or correctness regression from one stable version to another.