Draft: [LV] Outer-loop vectorization in the default vectorizer codepath #128202

Draft
wants to merge 5 commits into
base: main

@iamlouk (Contributor) commented Feb 21, 2025

This is a draft MR to get feedback on whether the current maintainers would
consider an approach like this good enough to merge into LLVM. I would split
it into smaller pieces if the general direction does not conflict with current plans.
It implements outer-loop vectorization outside the VPlan-native path. Minimal
LoopAccessAnalysis support for non-innermost loops is added, relying on the
!llvm.loop.parallel_accesses metadata.

Unlike in the VPlan-native path, inner loops with non-invariant trip counts or
non-uniform inductions are supported, and the emitted code is of better quality
than what the current VPlan-native path produces.
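
To make the "non-invariant trip count" case concrete, here is a minimal sketch of the kind of loop nest meant above. The kernel `prefix_row_sums` is a hypothetical illustration and is not taken from this MR or its tests. `#pragma clang loop vectorize(assume_safety)` is an existing Clang pragma that marks the loop's memory accesses as parallel; whether this MR actually selects the outer loop for such input is an assumption, not something verified here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[i] is the sum of in[0..i]. The inner trip count
// (i + 1) depends on the outer induction variable, so it is not invariant
// with respect to the outer loop.
void prefix_row_sums(const float *in, float *out, std::size_t n) {
#if defined(__clang__)
// Assumption: the pragma is placed on the outer loop so that its accesses
// are marked as parallel; other compilers simply skip it.
#pragma clang loop vectorize(assume_safety)
#endif
  for (std::size_t i = 0; i < n; ++i) {
    float acc = 0.0f;
    // Each outer iteration runs a different number of inner iterations.
    for (std::size_t j = 0; j <= i; ++j)
      acc += in[j];
    out[i] = acc;
  }
}
```

When the outer loop is vectorized, each vector lane handles a different value of `i`, so lanes finish the inner loop at different times and the inner loop has to run under a mask until the last lane is done.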

An implementation very close to this one (#124432 required some changes but
also simplified this MR considerably) was successfully tested on the
llvm-test-suite and SPEC (~3000 loops, with outer-loop vectorization forced),
in combination with basic LAA MemoryDepChecker support for outer loops (not part of this MR).

As a real-world motivating example, consider this loop:
outer-loop vectorizing it more than doubles performance.

Some code for the VPWidenPHIRecipes is duplicated from #128187.
