I noticed a significant performance regression with LTO enabled, starting with nightly 2018-01-27. Here is a simplified example in which the LTO build executes about 10x as many instructions and takes about 3x as long:
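```rust
// Returns a boxed closure that builds a Vec<T> of length `i`,
// filled with clones of `init`.
fn fill<T: Clone + 'static>(init: &'static T) -> Box<dyn Fn(&usize) -> Vec<T>> {
    Box::new(move |&i| {
        let mut vec = Vec::<T>::new();
        vec.resize(i, init.clone());
        vec
    })
}

fn main() {
    let zeroes = fill(&0);
    // Call the closure one million times, each call building a 1000-element Vec.
    for _ in 0..1_000_000 {
        zeroes(&1000);
    }
}
```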
Without LTO enabled:

```
$ perf stat -B -e cycles,instructions ./target/release/regr

 Performance counter stats for './target/release/regr':

       335,160,070      cycles
       539,078,439      instructions      #    1.61  insn per cycle

       0.112408607 seconds time elapsed
```
With LTO enabled (same command):

```
 Performance counter stats for './target/release/regr':

     1,121,648,955      cycles
     5,233,309,105      instructions      #    4.67  insn per cycle

       0.361163946 seconds time elapsed
```
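Working the ratios from the two perf runs: the LTO build retires about 9.7 times as many instructions but needs only about 3.3 times as many cycles, which is consistent with its IPC rising from 1.61 to 4.67 (the IPC ratio and the instructions/cycles ratios agree):

```latex
\frac{5{,}233{,}309{,}105}{539{,}078{,}439} \approx 9.71 \ \text{(instructions)}, \qquad
\frac{1{,}121{,}648{,}955}{335{,}160{,}070} \approx 3.35 \ \text{(cycles)}, \qquad
\frac{0.3612\,\text{s}}{0.1124\,\text{s}} \approx 3.21 \ \text{(wall time)}

\frac{4.67}{1.61} \approx 2.90 \approx \frac{9.71}{3.35}
```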
Tested variants (Bad = regression present, Good = no regression):

- Bad: `-C lto=fat`
- Good: `-C lto=thin`
- Good: `-C lto=fat -C codegen-units=1`
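To compare these variants where perf's hardware counters are not available, the repro's loop can also be timed in-process. This is a minimal sketch under stated assumptions, not the setup used for the numbers above: it assumes Rust 1.66 or later for `std::hint::black_box`, and `time_calls` is a hypothetical helper. `black_box` keeps the calls and their results from being optimized away, which may itself change what LTO is able to do, so the printed times are best read alongside the perf counters rather than instead of them. Build it once per flag set (for example via `RUSTFLAGS`) and compare the output.

```rust
use std::hint::black_box; // assumption: requires Rust 1.66+
use std::time::{Duration, Instant};

// Same shape as the reproduction: a boxed closure that builds a Vec<T>
// of length `i` filled with clones of a 'static value.
fn fill<T: Clone + 'static>(init: &'static T) -> Box<dyn Fn(&usize) -> Vec<T>> {
    Box::new(move |&i| {
        let mut vec = Vec::<T>::new();
        vec.resize(i, init.clone());
        vec
    })
}

/// Hypothetical helper: calls `f(&len)` `iters` times and returns the
/// elapsed wall-clock time. `black_box` hides the argument and the result
/// from the optimizer so the calls cannot be removed as dead code.
fn time_calls<T>(f: &dyn Fn(&usize) -> Vec<T>, len: usize, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(&len)));
    }
    start.elapsed()
}

fn main() {
    let zeroes = fill(&0);
    let elapsed = time_calls(&*zeroes, 1000, 1_000_000);
    println!("1_000_000 calls took {:?}", elapsed);
}
```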