Skip to content

[MISched] Fix off-by-one error in debug output with -misched-cutoff=<n> flag #137988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 6, 2025

Conversation

c-rhodes
Copy link
Collaborator

@c-rhodes c-rhodes commented Apr 30, 2025

This flag instructs the scheduler to stop scheduling after N instructions, but
in the debug output it appears as if it's scheduling N+1 instructions, e.g.

$ llc -misched-cutoff=10 -debug-only=machine-scheduler
example.ll 2>&1 | grep "^Scheduling SU" | wc -l
11

as it calls pickNode before calling checkSchedLimit.

@c-rhodes c-rhodes requested a review from david-arm April 30, 2025 16:30
@mshockwave mshockwave self-requested a review April 30, 2025 18:27
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Scheduling SU" message you grep-ed was from pickNode. The reason you saw 11 messages when the cut-off was 10 is because checkSchedLimit is placed after pickNode:

SUnit *SU = SchedImpl->pickNode(IsTopNode);
if (!SU) break;
assert(!SU->isScheduled && "Node already scheduled");
if (!checkSchedLimit())
break;

So basically what happened was that although it showed 11 "Scheduling SU", the scheduler indeed only scheduled 10 instructions as it stopped right before it made any actual changes.

Therefore, following the same logic, I think your current patch will actually scheduled one less instruction then the cut-off value provided by the user, which is probably undesired.

I think a better solution might be putting checkSchedLimit before pickNode.

@c-rhodes
Copy link
Collaborator Author

c-rhodes commented May 1, 2025

The "Scheduling SU" message you grep-ed was from pickNode. The reason you saw 11 messages when the cut-off was 10 is because checkSchedLimit is placed after pickNode:

SUnit *SU = SchedImpl->pickNode(IsTopNode);
if (!SU) break;
assert(!SU->isScheduled && "Node already scheduled");
if (!checkSchedLimit())
break;

So basically what happened was that although it showed 11 "Scheduling SU", the scheduler indeed only scheduled 10 instructions as it stopped right before it made any actual changes.

Therefore, following the same logic, I think your current patch will actually scheduled one less instruction then the cut-off value provided by the user, which is probably undesired.

I think a better solution might be putting checkSchedLimit before pickNode.

ah, well spotted! Thanks that makes more sense, updated 👍

@c-rhodes c-rhodes changed the title [MISched] Fix off-by-one error with -misched-cutoff=<n> flag [MISched] Fix off-by-one error in debug output with -misched-cutoff=<n> flag May 1, 2025
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, but could you add a simple test?

c-rhodes added a commit to c-rhodes/llvm-project that referenced this pull request May 2, 2025
@c-rhodes c-rhodes force-pushed the misched-cutoff-fix-off-by-one-error branch from 0b1319c to dbfbfeb Compare May 2, 2025 09:18
@llvmbot
Copy link
Member

llvmbot commented May 2, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Cullen Rhodes (c-rhodes)

Changes

This flag instructs the scheduler to stop scheduling after N instructions, but
in the debug output it appears as if it's scheduling N+1 instructions, e.g.

$ llc -misched-cutoff=10 -debug-only=machine-scheduler
example.ll 2>&1 | grep "^Scheduling SU" | wc -l
11

as it calls pickNode before calling checkSchedLimit.


Full diff: https://github.com/llvm/llvm-project/pull/137988.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+6-4)
  • (added) llvm/test/CodeGen/AArch64/misched-cutoff.mir (+55)
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 97f27277aface..77cf46178aabd 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -1003,13 +1003,14 @@ void ScheduleDAGMI::schedule() {
 
   bool IsTopNode = false;
   while (true) {
+    if (!checkSchedLimit())
+      break;
+
     LLVM_DEBUG(dbgs() << "** ScheduleDAGMI::schedule picking next node\n");
     SUnit *SU = SchedImpl->pickNode(IsTopNode);
     if (!SU) break;
 
     assert(!SU->isScheduled && "Node already scheduled");
-    if (!checkSchedLimit())
-      break;
 
     MachineInstr *MI = SU->getInstr();
     if (IsTopNode) {
@@ -1637,13 +1638,14 @@ void ScheduleDAGMILive::schedule() {
 
   bool IsTopNode = false;
   while (true) {
+    if (!checkSchedLimit())
+      break;
+
     LLVM_DEBUG(dbgs() << "** ScheduleDAGMILive::schedule picking next node\n");
     SUnit *SU = SchedImpl->pickNode(IsTopNode);
     if (!SU) break;
 
     assert(!SU->isScheduled && "Node already scheduled");
-    if (!checkSchedLimit())
-      break;
 
     scheduleMI(SU, IsTopNode);
 
diff --git a/llvm/test/CodeGen/AArch64/misched-cutoff.mir b/llvm/test/CodeGen/AArch64/misched-cutoff.mir
new file mode 100644
index 0000000000000..a61eb64318a39
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/misched-cutoff.mir
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=aarch64 -passes=machine-scheduler -o - %s | FileCheck %s
+# RUN: llc -mtriple=aarch64 -passes=machine-scheduler -misched-cutoff=1 -debug-only=machine-scheduler -o - %s 2>&1 | FileCheck %s --check-prefix=CHECK-CUTOFF
+
+# REQUIRES: asserts
+
+# CHECK-CUTOFF-COUNT-1: Scheduling SU
+
+# NOTE: copied from machine-scheduler.mir
+
+--- |
+  define i64 @load_imp-def(ptr nocapture %P, i32 %v) {
+  entry:
+    %0 = bitcast ptr %P to ptr
+    %1 = load i32, ptr %0
+    %conv = zext i32 %1 to i64
+    %arrayidx19 = getelementptr inbounds i64, ptr %P, i64 1
+    %arrayidx1 = bitcast ptr %arrayidx19 to ptr
+    store i32 %v, ptr %arrayidx1
+    %2 = load i64, ptr %arrayidx19
+    %and = and i64 %2, 4294967295
+    %add = add nuw nsw i64 %and, %conv
+    ret i64 %add
+  }
+...
+---
+name: load_imp-def
+tracksRegLiveness: true
+body: |
+  bb.0.entry:
+    liveins: $w1, $x0
+    ; CHECK-LABEL: name: load_imp-def
+    ; CHECK: liveins: $w1, $x0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: $w8 = LDRWui $x0, 1, implicit-def $x8 :: (load (s32) from %ir.0)
+    ; CHECK-NEXT: $w9 = LDRWui $x0, 0, implicit-def $x9 :: (load (s32) from %ir.arrayidx19, align 8)
+    ; CHECK-NEXT: STRWui $w1, $x0, 2 :: (store (s32) into %ir.arrayidx1)
+    ; CHECK-NEXT: $x0 = ADDXrr killed $x9, killed $x8
+    ; CHECK-NEXT: RET_ReallyLR implicit $x0
+    ;
+    ; CHECK-CUTOFF-LABEL: name: load_imp-def
+    ; CHECK-CUTOFF: liveins: $w1, $x0
+    ; CHECK-CUTOFF-NEXT: {{  $}}
+    ; CHECK-CUTOFF-NEXT: $w8 = LDRWui $x0, 1, implicit-def $x8 :: (load (s32) from %ir.0)
+    ; CHECK-CUTOFF-NEXT: STRWui $w1, $x0, 2 :: (store (s32) into %ir.arrayidx1)
+    ; CHECK-CUTOFF-NEXT: $w9 = LDRWui $x0, 0, implicit-def $x9 :: (load (s32) from %ir.arrayidx19, align 8)
+    ; CHECK-CUTOFF-NEXT: $x0 = ADDXrr killed $x9, killed $x8
+    ; CHECK-CUTOFF-NEXT: RET_ReallyLR implicit $x0
+    $w8 = LDRWui $x0, 1, implicit-def $x8  :: (load (s32) from %ir.0)
+    STRWui killed $w1, $x0, 2 :: (store (s32) into %ir.arrayidx1)
+    $w9 = LDRWui killed $x0, 0, implicit-def $x9  :: (load (s32) from %ir.arrayidx19, align 8)
+    $x0 = ADDXrr killed $x9, killed $x8
+    RET_ReallyLR implicit $x0
+...
+

@c-rhodes
Copy link
Collaborator Author

c-rhodes commented May 2, 2025

generally looks good, but could you add a simple test?

put a test up for precommit here #138243

Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks

c-rhodes added a commit that referenced this pull request May 6, 2025
c-rhodes added 3 commits May 6, 2025 07:51
This flag instructs the scheduler to stop scheduling after N
instructions, but it currently schedules N+1 instructions, e.g.

  $ llc -misched-cutoff=10 -debug-only=machine-scheduler \
      example.ll 2>&1 | grep  "^Scheduling SU" | wc -l
  11
@c-rhodes c-rhodes force-pushed the misched-cutoff-fix-off-by-one-error branch from dbfbfeb to fa90da2 Compare May 6, 2025 08:06
@c-rhodes c-rhodes merged commit 8ea5eac into llvm:main May 6, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented May 6, 2025

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building llvm at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/12455

Here is the relevant piece of the build log for the reference
Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x5b9abce964d0 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x5b9abce964d0 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
…n> flag (llvm#137988)

This flag instructs the scheduler to stop scheduling after N
instructions, but
in the debug output it appears as if it's scheduling N+1 instructions,
e.g.

$ llc -misched-cutoff=10 -debug-only=machine-scheduler
example.ll 2>&1 | grep "^Scheduling SU" | wc -l
11

as it calls pickNode before calling checkSchedLimit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants