
[LV] Add test cases for reverse accesses involving irregular types. nfc #135139


Merged: 2 commits merged into llvm:main on Apr 14, 2025

Conversation

@Mel-Chen Mel-Chen (Contributor) commented Apr 10, 2025

Add a test with an irregular type to ensure that vector load/store instructions are not generated.
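
For context, the scalar loop in the generated checks suggests the new test exercises a reverse loop over an i7 element type, roughly as sketched below. This is reconstructed from the scalar-loop CHECK lines; the exact names and metadata numbering in the committed test may differ.

    ; Sketch of the loop shape the new test covers (reconstructed from the
    ; scalar-loop CHECK lines; names and metadata IDs are illustrative).
    define void @vector_reverse_irregular_type(ptr noalias %A, ptr noalias %B) {
    entry:
      br label %for.body

    for.body:                                        ; counts down from 1023
      %dec.iv = phi i64 [ 1023, %entry ], [ %iv.next, %for.body ]
      %iv.next = add nsw i64 %dec.iv, -1
      %arrayidx.b = getelementptr inbounds i7, ptr %B, i64 %iv.next
      %l = load i7, ptr %arrayidx.b, align 1         ; i7 is the irregular type
      %add = add i7 %l, 1
      %arrayidx.a = getelementptr inbounds i7, ptr %A, i64 %iv.next
      store i7 %add, ptr %arrayidx.a, align 1
      %cmp = icmp ugt i64 %dec.iv, 1
      br i1 %cmp, label %for.body, label %exit, !llvm.loop !0

    exit:
      ret void
    }

    ; Loop hint as discussed in the review below: width 4 with scalable
    ; vectors enabled, i.e. a requested VF of vscale x 4.
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.width", i32 4}
    !2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}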

@llvmbot (Member) commented Apr 10, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Mel Chen (Mel-Chen)

Changes

Add a test with an irregular type to ensure vector types are not generated.


Patch is 44.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/135139.diff

1 File Affected:

  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+625)
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll
index f01aaa04606d9..31a774f8d8525 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll
@@ -429,6 +429,631 @@ exit:
   ret void
 }
 
+define void @vector_reverse_irregular_type(ptr noalias %A, ptr noalias %B) {
+; RV64-LABEL: define void @vector_reverse_irregular_type(
+; RV64-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) #[[ATTR0]] {
+; RV64-NEXT:  [[ENTRY:.*]]:
+; RV64-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; RV64:       [[VECTOR_PH]]:
+; RV64-NEXT:    br label %[[VECTOR_BODY:.*]]
+; RV64:       [[VECTOR_BODY]]:
+; RV64-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; RV64-NEXT:    [[OFFSET_IDX:%.*]] = sub i64 1023, [[INDEX]]
+; RV64-NEXT:    [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
+; RV64-NEXT:    [[TMP1:%.*]] = add i64 [[OFFSET_IDX]], -1
+; RV64-NEXT:    [[TMP2:%.*]] = add i64 [[OFFSET_IDX]], -2
+; RV64-NEXT:    [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -3
+; RV64-NEXT:    [[TMP4:%.*]] = add i64 [[OFFSET_IDX]], -4
+; RV64-NEXT:    [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], -5
+; RV64-NEXT:    [[TMP6:%.*]] = add i64 [[OFFSET_IDX]], -6
+; RV64-NEXT:    [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], -7
+; RV64-NEXT:    [[TMP8:%.*]] = add i64 [[OFFSET_IDX]], -8
+; RV64-NEXT:    [[TMP9:%.*]] = add i64 [[OFFSET_IDX]], -9
+; RV64-NEXT:    [[TMP10:%.*]] = add i64 [[OFFSET_IDX]], -10
+; RV64-NEXT:    [[TMP11:%.*]] = add i64 [[OFFSET_IDX]], -11
+; RV64-NEXT:    [[TMP12:%.*]] = add i64 [[OFFSET_IDX]], -12
+; RV64-NEXT:    [[TMP13:%.*]] = add i64 [[OFFSET_IDX]], -13
+; RV64-NEXT:    [[TMP14:%.*]] = add i64 [[OFFSET_IDX]], -14
+; RV64-NEXT:    [[TMP15:%.*]] = add i64 [[OFFSET_IDX]], -15
+; RV64-NEXT:    [[TMP16:%.*]] = add nsw i64 [[TMP0]], -1
+; RV64-NEXT:    [[TMP17:%.*]] = add nsw i64 [[TMP1]], -1
+; RV64-NEXT:    [[TMP18:%.*]] = add nsw i64 [[TMP2]], -1
+; RV64-NEXT:    [[TMP19:%.*]] = add nsw i64 [[TMP3]], -1
+; RV64-NEXT:    [[TMP20:%.*]] = add nsw i64 [[TMP4]], -1
+; RV64-NEXT:    [[TMP21:%.*]] = add nsw i64 [[TMP5]], -1
+; RV64-NEXT:    [[TMP22:%.*]] = add nsw i64 [[TMP6]], -1
+; RV64-NEXT:    [[TMP23:%.*]] = add nsw i64 [[TMP7]], -1
+; RV64-NEXT:    [[TMP24:%.*]] = add nsw i64 [[TMP8]], -1
+; RV64-NEXT:    [[TMP25:%.*]] = add nsw i64 [[TMP9]], -1
+; RV64-NEXT:    [[TMP26:%.*]] = add nsw i64 [[TMP10]], -1
+; RV64-NEXT:    [[TMP27:%.*]] = add nsw i64 [[TMP11]], -1
+; RV64-NEXT:    [[TMP28:%.*]] = add nsw i64 [[TMP12]], -1
+; RV64-NEXT:    [[TMP29:%.*]] = add nsw i64 [[TMP13]], -1
+; RV64-NEXT:    [[TMP30:%.*]] = add nsw i64 [[TMP14]], -1
+; RV64-NEXT:    [[TMP31:%.*]] = add nsw i64 [[TMP15]], -1
+; RV64-NEXT:    [[TMP32:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP16]]
+; RV64-NEXT:    [[TMP33:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP17]]
+; RV64-NEXT:    [[TMP34:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP18]]
+; RV64-NEXT:    [[TMP35:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP19]]
+; RV64-NEXT:    [[TMP36:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP20]]
+; RV64-NEXT:    [[TMP37:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP21]]
+; RV64-NEXT:    [[TMP38:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP22]]
+; RV64-NEXT:    [[TMP39:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP23]]
+; RV64-NEXT:    [[TMP40:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP24]]
+; RV64-NEXT:    [[TMP41:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP25]]
+; RV64-NEXT:    [[TMP42:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP26]]
+; RV64-NEXT:    [[TMP43:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP27]]
+; RV64-NEXT:    [[TMP44:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP28]]
+; RV64-NEXT:    [[TMP45:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP29]]
+; RV64-NEXT:    [[TMP46:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP30]]
+; RV64-NEXT:    [[TMP47:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP31]]
+; RV64-NEXT:    [[TMP48:%.*]] = load i7, ptr [[TMP32]], align 1
+; RV64-NEXT:    [[TMP49:%.*]] = load i7, ptr [[TMP33]], align 1
+; RV64-NEXT:    [[TMP50:%.*]] = load i7, ptr [[TMP34]], align 1
+; RV64-NEXT:    [[TMP51:%.*]] = load i7, ptr [[TMP35]], align 1
+; RV64-NEXT:    [[TMP52:%.*]] = load i7, ptr [[TMP36]], align 1
+; RV64-NEXT:    [[TMP53:%.*]] = load i7, ptr [[TMP37]], align 1
+; RV64-NEXT:    [[TMP54:%.*]] = load i7, ptr [[TMP38]], align 1
+; RV64-NEXT:    [[TMP55:%.*]] = load i7, ptr [[TMP39]], align 1
+; RV64-NEXT:    [[TMP56:%.*]] = load i7, ptr [[TMP40]], align 1
+; RV64-NEXT:    [[TMP57:%.*]] = load i7, ptr [[TMP41]], align 1
+; RV64-NEXT:    [[TMP58:%.*]] = load i7, ptr [[TMP42]], align 1
+; RV64-NEXT:    [[TMP59:%.*]] = load i7, ptr [[TMP43]], align 1
+; RV64-NEXT:    [[TMP60:%.*]] = load i7, ptr [[TMP44]], align 1
+; RV64-NEXT:    [[TMP61:%.*]] = load i7, ptr [[TMP45]], align 1
+; RV64-NEXT:    [[TMP62:%.*]] = load i7, ptr [[TMP46]], align 1
+; RV64-NEXT:    [[TMP63:%.*]] = load i7, ptr [[TMP47]], align 1
+; RV64-NEXT:    [[TMP64:%.*]] = insertelement <16 x i7> poison, i7 [[TMP48]], i32 0
+; RV64-NEXT:    [[TMP65:%.*]] = insertelement <16 x i7> [[TMP64]], i7 [[TMP49]], i32 1
+; RV64-NEXT:    [[TMP66:%.*]] = insertelement <16 x i7> [[TMP65]], i7 [[TMP50]], i32 2
+; RV64-NEXT:    [[TMP67:%.*]] = insertelement <16 x i7> [[TMP66]], i7 [[TMP51]], i32 3
+; RV64-NEXT:    [[TMP68:%.*]] = insertelement <16 x i7> [[TMP67]], i7 [[TMP52]], i32 4
+; RV64-NEXT:    [[TMP69:%.*]] = insertelement <16 x i7> [[TMP68]], i7 [[TMP53]], i32 5
+; RV64-NEXT:    [[TMP70:%.*]] = insertelement <16 x i7> [[TMP69]], i7 [[TMP54]], i32 6
+; RV64-NEXT:    [[TMP71:%.*]] = insertelement <16 x i7> [[TMP70]], i7 [[TMP55]], i32 7
+; RV64-NEXT:    [[TMP72:%.*]] = insertelement <16 x i7> [[TMP71]], i7 [[TMP56]], i32 8
+; RV64-NEXT:    [[TMP73:%.*]] = insertelement <16 x i7> [[TMP72]], i7 [[TMP57]], i32 9
+; RV64-NEXT:    [[TMP74:%.*]] = insertelement <16 x i7> [[TMP73]], i7 [[TMP58]], i32 10
+; RV64-NEXT:    [[TMP75:%.*]] = insertelement <16 x i7> [[TMP74]], i7 [[TMP59]], i32 11
+; RV64-NEXT:    [[TMP76:%.*]] = insertelement <16 x i7> [[TMP75]], i7 [[TMP60]], i32 12
+; RV64-NEXT:    [[TMP77:%.*]] = insertelement <16 x i7> [[TMP76]], i7 [[TMP61]], i32 13
+; RV64-NEXT:    [[TMP78:%.*]] = insertelement <16 x i7> [[TMP77]], i7 [[TMP62]], i32 14
+; RV64-NEXT:    [[TMP79:%.*]] = insertelement <16 x i7> [[TMP78]], i7 [[TMP63]], i32 15
+; RV64-NEXT:    [[TMP80:%.*]] = add <16 x i7> [[TMP79]], splat (i7 1)
+; RV64-NEXT:    [[TMP81:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP16]]
+; RV64-NEXT:    [[TMP82:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP17]]
+; RV64-NEXT:    [[TMP83:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP18]]
+; RV64-NEXT:    [[TMP84:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP19]]
+; RV64-NEXT:    [[TMP85:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP20]]
+; RV64-NEXT:    [[TMP86:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP21]]
+; RV64-NEXT:    [[TMP87:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP22]]
+; RV64-NEXT:    [[TMP88:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP23]]
+; RV64-NEXT:    [[TMP89:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP24]]
+; RV64-NEXT:    [[TMP90:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP25]]
+; RV64-NEXT:    [[TMP91:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP26]]
+; RV64-NEXT:    [[TMP92:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP27]]
+; RV64-NEXT:    [[TMP93:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP28]]
+; RV64-NEXT:    [[TMP94:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP29]]
+; RV64-NEXT:    [[TMP95:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP30]]
+; RV64-NEXT:    [[TMP96:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP31]]
+; RV64-NEXT:    [[TMP97:%.*]] = extractelement <16 x i7> [[TMP80]], i32 0
+; RV64-NEXT:    store i7 [[TMP97]], ptr [[TMP81]], align 1
+; RV64-NEXT:    [[TMP98:%.*]] = extractelement <16 x i7> [[TMP80]], i32 1
+; RV64-NEXT:    store i7 [[TMP98]], ptr [[TMP82]], align 1
+; RV64-NEXT:    [[TMP99:%.*]] = extractelement <16 x i7> [[TMP80]], i32 2
+; RV64-NEXT:    store i7 [[TMP99]], ptr [[TMP83]], align 1
+; RV64-NEXT:    [[TMP100:%.*]] = extractelement <16 x i7> [[TMP80]], i32 3
+; RV64-NEXT:    store i7 [[TMP100]], ptr [[TMP84]], align 1
+; RV64-NEXT:    [[TMP101:%.*]] = extractelement <16 x i7> [[TMP80]], i32 4
+; RV64-NEXT:    store i7 [[TMP101]], ptr [[TMP85]], align 1
+; RV64-NEXT:    [[TMP102:%.*]] = extractelement <16 x i7> [[TMP80]], i32 5
+; RV64-NEXT:    store i7 [[TMP102]], ptr [[TMP86]], align 1
+; RV64-NEXT:    [[TMP103:%.*]] = extractelement <16 x i7> [[TMP80]], i32 6
+; RV64-NEXT:    store i7 [[TMP103]], ptr [[TMP87]], align 1
+; RV64-NEXT:    [[TMP104:%.*]] = extractelement <16 x i7> [[TMP80]], i32 7
+; RV64-NEXT:    store i7 [[TMP104]], ptr [[TMP88]], align 1
+; RV64-NEXT:    [[TMP105:%.*]] = extractelement <16 x i7> [[TMP80]], i32 8
+; RV64-NEXT:    store i7 [[TMP105]], ptr [[TMP89]], align 1
+; RV64-NEXT:    [[TMP106:%.*]] = extractelement <16 x i7> [[TMP80]], i32 9
+; RV64-NEXT:    store i7 [[TMP106]], ptr [[TMP90]], align 1
+; RV64-NEXT:    [[TMP107:%.*]] = extractelement <16 x i7> [[TMP80]], i32 10
+; RV64-NEXT:    store i7 [[TMP107]], ptr [[TMP91]], align 1
+; RV64-NEXT:    [[TMP108:%.*]] = extractelement <16 x i7> [[TMP80]], i32 11
+; RV64-NEXT:    store i7 [[TMP108]], ptr [[TMP92]], align 1
+; RV64-NEXT:    [[TMP109:%.*]] = extractelement <16 x i7> [[TMP80]], i32 12
+; RV64-NEXT:    store i7 [[TMP109]], ptr [[TMP93]], align 1
+; RV64-NEXT:    [[TMP110:%.*]] = extractelement <16 x i7> [[TMP80]], i32 13
+; RV64-NEXT:    store i7 [[TMP110]], ptr [[TMP94]], align 1
+; RV64-NEXT:    [[TMP111:%.*]] = extractelement <16 x i7> [[TMP80]], i32 14
+; RV64-NEXT:    store i7 [[TMP111]], ptr [[TMP95]], align 1
+; RV64-NEXT:    [[TMP112:%.*]] = extractelement <16 x i7> [[TMP80]], i32 15
+; RV64-NEXT:    store i7 [[TMP112]], ptr [[TMP96]], align 1
+; RV64-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; RV64-NEXT:    [[TMP113:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1008
+; RV64-NEXT:    br i1 [[TMP113]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; RV64:       [[MIDDLE_BLOCK]]:
+; RV64-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; RV64:       [[SCALAR_PH]]:
+; RV64-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 15, %[[MIDDLE_BLOCK]] ], [ 1023, %[[ENTRY]] ]
+; RV64-NEXT:    br label %[[FOR_BODY:.*]]
+; RV64:       [[FOR_BODY]]:
+; RV64-NEXT:    [[DEC_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; RV64-NEXT:    [[IV_NEXT]] = add nsw i64 [[DEC_IV]], -1
+; RV64-NEXT:    [[ARRAYIDX_B:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[IV_NEXT]]
+; RV64-NEXT:    [[TMP114:%.*]] = load i7, ptr [[ARRAYIDX_B]], align 1
+; RV64-NEXT:    [[ADD:%.*]] = add i7 [[TMP114]], 1
+; RV64-NEXT:    [[ARRAYIDX_A:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[IV_NEXT]]
+; RV64-NEXT:    store i7 [[ADD]], ptr [[ARRAYIDX_A]], align 1
+; RV64-NEXT:    [[CMP:%.*]] = icmp ugt i64 [[DEC_IV]], 1
+; RV64-NEXT:    br i1 [[CMP]], label %[[FOR_BODY]], label %[[EXIT]], !llvm.loop [[LOOP7:![0-9]+]]
+; RV64:       [[EXIT]]:
+; RV64-NEXT:    ret void
+;
+; RV32-LABEL: define void @vector_reverse_irregular_type(
+; RV32-SAME: ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) #[[ATTR0]] {
+; RV32-NEXT:  [[ENTRY:.*]]:
+; RV32-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; RV32:       [[VECTOR_PH]]:
+; RV32-NEXT:    br label %[[VECTOR_BODY:.*]]
+; RV32:       [[VECTOR_BODY]]:
+; RV32-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; RV32-NEXT:    [[OFFSET_IDX:%.*]] = sub i64 1023, [[INDEX]]
+; RV32-NEXT:    [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
+; RV32-NEXT:    [[TMP1:%.*]] = add i64 [[OFFSET_IDX]], -1
+; RV32-NEXT:    [[TMP2:%.*]] = add i64 [[OFFSET_IDX]], -2
+; RV32-NEXT:    [[TMP3:%.*]] = add i64 [[OFFSET_IDX]], -3
+; RV32-NEXT:    [[TMP4:%.*]] = add i64 [[OFFSET_IDX]], -4
+; RV32-NEXT:    [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], -5
+; RV32-NEXT:    [[TMP6:%.*]] = add i64 [[OFFSET_IDX]], -6
+; RV32-NEXT:    [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], -7
+; RV32-NEXT:    [[TMP8:%.*]] = add i64 [[OFFSET_IDX]], -8
+; RV32-NEXT:    [[TMP9:%.*]] = add i64 [[OFFSET_IDX]], -9
+; RV32-NEXT:    [[TMP10:%.*]] = add i64 [[OFFSET_IDX]], -10
+; RV32-NEXT:    [[TMP11:%.*]] = add i64 [[OFFSET_IDX]], -11
+; RV32-NEXT:    [[TMP12:%.*]] = add i64 [[OFFSET_IDX]], -12
+; RV32-NEXT:    [[TMP13:%.*]] = add i64 [[OFFSET_IDX]], -13
+; RV32-NEXT:    [[TMP14:%.*]] = add i64 [[OFFSET_IDX]], -14
+; RV32-NEXT:    [[TMP15:%.*]] = add i64 [[OFFSET_IDX]], -15
+; RV32-NEXT:    [[TMP16:%.*]] = add nsw i64 [[TMP0]], -1
+; RV32-NEXT:    [[TMP17:%.*]] = add nsw i64 [[TMP1]], -1
+; RV32-NEXT:    [[TMP18:%.*]] = add nsw i64 [[TMP2]], -1
+; RV32-NEXT:    [[TMP19:%.*]] = add nsw i64 [[TMP3]], -1
+; RV32-NEXT:    [[TMP20:%.*]] = add nsw i64 [[TMP4]], -1
+; RV32-NEXT:    [[TMP21:%.*]] = add nsw i64 [[TMP5]], -1
+; RV32-NEXT:    [[TMP22:%.*]] = add nsw i64 [[TMP6]], -1
+; RV32-NEXT:    [[TMP23:%.*]] = add nsw i64 [[TMP7]], -1
+; RV32-NEXT:    [[TMP24:%.*]] = add nsw i64 [[TMP8]], -1
+; RV32-NEXT:    [[TMP25:%.*]] = add nsw i64 [[TMP9]], -1
+; RV32-NEXT:    [[TMP26:%.*]] = add nsw i64 [[TMP10]], -1
+; RV32-NEXT:    [[TMP27:%.*]] = add nsw i64 [[TMP11]], -1
+; RV32-NEXT:    [[TMP28:%.*]] = add nsw i64 [[TMP12]], -1
+; RV32-NEXT:    [[TMP29:%.*]] = add nsw i64 [[TMP13]], -1
+; RV32-NEXT:    [[TMP30:%.*]] = add nsw i64 [[TMP14]], -1
+; RV32-NEXT:    [[TMP31:%.*]] = add nsw i64 [[TMP15]], -1
+; RV32-NEXT:    [[TMP32:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP16]]
+; RV32-NEXT:    [[TMP33:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP17]]
+; RV32-NEXT:    [[TMP34:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP18]]
+; RV32-NEXT:    [[TMP35:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP19]]
+; RV32-NEXT:    [[TMP36:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP20]]
+; RV32-NEXT:    [[TMP37:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP21]]
+; RV32-NEXT:    [[TMP38:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP22]]
+; RV32-NEXT:    [[TMP39:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP23]]
+; RV32-NEXT:    [[TMP40:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP24]]
+; RV32-NEXT:    [[TMP41:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP25]]
+; RV32-NEXT:    [[TMP42:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP26]]
+; RV32-NEXT:    [[TMP43:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP27]]
+; RV32-NEXT:    [[TMP44:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP28]]
+; RV32-NEXT:    [[TMP45:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP29]]
+; RV32-NEXT:    [[TMP46:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP30]]
+; RV32-NEXT:    [[TMP47:%.*]] = getelementptr inbounds i7, ptr [[B]], i64 [[TMP31]]
+; RV32-NEXT:    [[TMP48:%.*]] = load i7, ptr [[TMP32]], align 1
+; RV32-NEXT:    [[TMP49:%.*]] = load i7, ptr [[TMP33]], align 1
+; RV32-NEXT:    [[TMP50:%.*]] = load i7, ptr [[TMP34]], align 1
+; RV32-NEXT:    [[TMP51:%.*]] = load i7, ptr [[TMP35]], align 1
+; RV32-NEXT:    [[TMP52:%.*]] = load i7, ptr [[TMP36]], align 1
+; RV32-NEXT:    [[TMP53:%.*]] = load i7, ptr [[TMP37]], align 1
+; RV32-NEXT:    [[TMP54:%.*]] = load i7, ptr [[TMP38]], align 1
+; RV32-NEXT:    [[TMP55:%.*]] = load i7, ptr [[TMP39]], align 1
+; RV32-NEXT:    [[TMP56:%.*]] = load i7, ptr [[TMP40]], align 1
+; RV32-NEXT:    [[TMP57:%.*]] = load i7, ptr [[TMP41]], align 1
+; RV32-NEXT:    [[TMP58:%.*]] = load i7, ptr [[TMP42]], align 1
+; RV32-NEXT:    [[TMP59:%.*]] = load i7, ptr [[TMP43]], align 1
+; RV32-NEXT:    [[TMP60:%.*]] = load i7, ptr [[TMP44]], align 1
+; RV32-NEXT:    [[TMP61:%.*]] = load i7, ptr [[TMP45]], align 1
+; RV32-NEXT:    [[TMP62:%.*]] = load i7, ptr [[TMP46]], align 1
+; RV32-NEXT:    [[TMP63:%.*]] = load i7, ptr [[TMP47]], align 1
+; RV32-NEXT:    [[TMP64:%.*]] = insertelement <16 x i7> poison, i7 [[TMP48]], i32 0
+; RV32-NEXT:    [[TMP65:%.*]] = insertelement <16 x i7> [[TMP64]], i7 [[TMP49]], i32 1
+; RV32-NEXT:    [[TMP66:%.*]] = insertelement <16 x i7> [[TMP65]], i7 [[TMP50]], i32 2
+; RV32-NEXT:    [[TMP67:%.*]] = insertelement <16 x i7> [[TMP66]], i7 [[TMP51]], i32 3
+; RV32-NEXT:    [[TMP68:%.*]] = insertelement <16 x i7> [[TMP67]], i7 [[TMP52]], i32 4
+; RV32-NEXT:    [[TMP69:%.*]] = insertelement <16 x i7> [[TMP68]], i7 [[TMP53]], i32 5
+; RV32-NEXT:    [[TMP70:%.*]] = insertelement <16 x i7> [[TMP69]], i7 [[TMP54]], i32 6
+; RV32-NEXT:    [[TMP71:%.*]] = insertelement <16 x i7> [[TMP70]], i7 [[TMP55]], i32 7
+; RV32-NEXT:    [[TMP72:%.*]] = insertelement <16 x i7> [[TMP71]], i7 [[TMP56]], i32 8
+; RV32-NEXT:    [[TMP73:%.*]] = insertelement <16 x i7> [[TMP72]], i7 [[TMP57]], i32 9
+; RV32-NEXT:    [[TMP74:%.*]] = insertelement <16 x i7> [[TMP73]], i7 [[TMP58]], i32 10
+; RV32-NEXT:    [[TMP75:%.*]] = insertelement <16 x i7> [[TMP74]], i7 [[TMP59]], i32 11
+; RV32-NEXT:    [[TMP76:%.*]] = insertelement <16 x i7> [[TMP75]], i7 [[TMP60]], i32 12
+; RV32-NEXT:    [[TMP77:%.*]] = insertelement <16 x i7> [[TMP76]], i7 [[TMP61]], i32 13
+; RV32-NEXT:    [[TMP78:%.*]] = insertelement <16 x i7> [[TMP77]], i7 [[TMP62]], i32 14
+; RV32-NEXT:    [[TMP79:%.*]] = insertelement <16 x i7> [[TMP78]], i7 [[TMP63]], i32 15
+; RV32-NEXT:    [[TMP80:%.*]] = add <16 x i7> [[TMP79]], splat (i7 1)
+; RV32-NEXT:    [[TMP81:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP16]]
+; RV32-NEXT:    [[TMP82:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP17]]
+; RV32-NEXT:    [[TMP83:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP18]]
+; RV32-NEXT:    [[TMP84:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP19]]
+; RV32-NEXT:    [[TMP85:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP20]]
+; RV32-NEXT:    [[TMP86:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP21]]
+; RV32-NEXT:    [[TMP87:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP22]]
+; RV32-NEXT:    [[TMP88:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP23]]
+; RV32-NEXT:    [[TMP89:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP24]]
+; RV32-NEXT:    [[TMP90:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP25]]
+; RV32-NEXT:    [[TMP91:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP26]]
+; RV32-NEXT:    [[TMP92:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP27]]
+; RV32-NEXT:    [[TMP93:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP28]]
+; RV32-NEXT:    [[TMP94:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP29]]
+; RV32-NEXT:    [[TMP95:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP30]]
+; RV32-NEXT:    [[TMP96:%.*]] = getelementptr inbounds i7, ptr [[A]], i64 [[TMP31]]
+; RV32-NEXT:    [[TMP97:%.*]] = extractelement <16 x i7> [[TMP80]], i32 0
+; RV32-NEXT:    store i7 [[TMP97]], ptr [[TMP81]], align 1
+; RV32-NEXT:    [[TMP98:%.*]] = extractelement <16 x i7> [[TMP80]], i32 1
+; RV32-NEXT:    store i7 [[TMP98]], ptr [[TMP82]], align 1
+; RV32-NEXT:    [[TMP99:%.*]] = extractelement <16 x i7> [[TMP80]], i32 2
+; RV32-NEXT:    store i7 [[TMP99]], ptr [[TMP83]], align 1
+; RV32-NEXT:    [[TMP100:%.*]] = extractelement <16 x i7> [[TMP80]], i32 3
+; RV32-NEXT:    store i7 [[TMP100]], ptr [[TMP84]], align 1
+; RV32-NEXT:    [[TMP101:%.*]] = extractelement <16 x i7> [[TMP80]], i32 4
+; RV32-NEXT:    store i7 [[TMP101]], ptr [[TMP85]], align 1
+; RV32-NEXT:    [[TMP102:%.*]] = extractelement <16 x i7> [[TMP80]], i32 5
+; RV32-NEXT:    store i7 [[TMP102]], ptr [[TMP86]], align 1
+; RV32-NEXT:    [[TMP103:%.*]] = extractelement <16 x i7> [[TMP80]], i32 6
+; RV32-NEXT:    store i7 [[TMP103]], ptr [[TMP87]], align 1
+; RV32-NEXT:    [[TMP104:%.*]] = extractelement <16 x i7> [[TMP80]], i32 7
+; RV32-NEXT:    store i7 [[TMP104]], ptr [[TMP88]], align 1
+; RV32-NEXT:    [[TMP105:%.*]] = extractelement <16 x i7> [[TMP80]], i32 8
+; RV32-NEXT:    store i7 [[TMP105]], ptr [[TMP89...
[truncated]

; RV64-NEXT: [[TMP77:%.*]] = insertelement <16 x i7> [[TMP76]], i7 [[TMP61]], i32 13
; RV64-NEXT: [[TMP78:%.*]] = insertelement <16 x i7> [[TMP77]], i7 [[TMP62]], i32 14
; RV64-NEXT: [[TMP79:%.*]] = insertelement <16 x i7> [[TMP78]], i7 [[TMP63]], i32 15
; RV64-NEXT: [[TMP80:%.*]] = add <16 x i7> [[TMP79]], splat (i7 1)
Contributor

Do you know why we have completely ignored the loop attribute that requests llvm.loop.vectorize.width=4? It's a shame the test is generating a VF of 16 because that leads to an excessive number of CHECK lines, whereas I think a lower VF would still defend the same code paths.

Contributor Author (Mel-Chen)

The reason is that scalarized loads/stores currently only support fixed VFs, but !llvm.loop !0 requests a user VF of vscale x 4.
The current strategy for handling such cases is to let the compiler freely choose the VF:

     // Only clamp if the UserVF is not scalable. If the UserVF is scalable, it
     // is better to ignore the hint and let the compiler choose a suitable VF.

As a result, the actual VF may differ from the specified width. This behavior doesn't quite match the intended semantics described in the comment for ScalableForceKind::SK_PreferScalable:

    /// Vectorize loops using scalable vectors or fixed-width vectors, but favor
    /// scalable vectors when the cost-model is inconclusive. This is the
    /// default when the scalable.enable hint is enabled through a pragma.
    SK_PreferScalable = 1

Should we consider updating the behavior so that, when a user-specified scalable VF can't be used, we fall back to a fixed VF instead? That would align better with the definition of SK_PreferScalable. cc @fhahn

But for now, since this is just an NFC test patch, I added a new hint with a fixed VF of 4: 8f6bd75
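
For illustration, a fixed-VF hint of that kind would look roughly like the following. The metadata IDs are hypothetical; the committed test may number and attach them differently.

    ; Hypothetical fixed-width hint: request VF 4 and disable scalable
    ; vectorization, so the scalarized reverse accesses keep the user VF.
    !3 = distinct !{!3, !4, !5}
    !4 = !{!"llvm.loop.vectorize.width", i32 4}
    !5 = !{!"llvm.loop.vectorize.scalable.enable", i1 false}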

@Mel-Chen Mel-Chen requested a review from david-arm April 11, 2025 07:27
@fhahn fhahn (Contributor) left a comment
Test case LGTM, thanks

@Mel-Chen Mel-Chen merged commit ffd5b14 into llvm:main Apr 14, 2025
11 checks passed
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…fc (llvm#135139)

Add a test with irregular type to ensure the vector load/store
instructions are not generated.