Skip to content

Missing CSE caused by for loop on slice #119573

Open
llvm/llvm-project
#100102
@Kobzol

Description

@Kobzol

While examining a weird performance regression on Zulip, we noticed a peculiar thing caused by a simple for loop.

Consider this code:

#[inline(never)]
fn add_index(bools: &[bool]) -> usize {
    let mut count: usize = 0;
    for i in 0..bools.len() {
        if bools[i] {
            count += 1;
        }
    }
    count
}

#[inline(never)]
fn add_iter(bools: &[bool]) -> usize {
    let mut count: usize = 0;
    for b in bools {
        if *b {
            count += 1;
        }
    }
    count
}

These two functions do the same thing, but the first one uses slice indexing, while the other one uses a more natural for loop. One would expect that the second version will optimize better (or same).

However, when these functions are called multiple times, only the first function is eligible for CSE (common subexpression elimination):

pub fn foo1(bools: &[bool]) {
    // CSE applied
    println!("a: {}", add_index(bools));
    println!("b: {}", add_index(bools));
}

pub fn foo2(bools: &[bool]) {
    // CSE not applied
    println!("a: {}", add_iter(bools));
    println!("b: {}", add_iter(bools));
}

Link to Godbolt.

When examining these two functions, I noticed that the indexed version uses nocapture for its argument, while the for loop version doesn't:

define noundef i64 @add_index(ptr noalias nocapture noundef nonnull readonly align 1 %bools.0, i64 noundef %bools.1) unnamed_addr #0 !dbg !7 {

define noundef i64 @add_iter(ptr noalias noundef nonnull readonly align 1 %bools.0, i64 noundef %bools.1) unnamed_addr #1 !dbg !57 {

which might be the reason of why CSE isn't applied.

It seems weird that the canonical way of iterating over a slice produces worse code (or at least function attributes) than manual indexing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.C-bugCategory: This is a bug.E-needs-testCall for participation: An issue has been fixed and does not reproduce, but no test has been added.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions