Skip to content

[BPF] Fix BitCast Assertion with NonZero AddrSpace #130722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

yonghong-song
Copy link
Contributor

Alexei reported a bpf selftest failure with recent llvm for bpf prog file progs/arena_spin_lock.c. The failure only happens when clang is built with cmake option LLVM_ENABLE_ASSERTIONS=ON.

The error message looks like:

 clang: /home/yhs/work/yhs/llvm-project/llvm/lib/IR/Instructions.cpp:3460:
   llvm::BitCastInst::BitCastInst(Value *, Type *, const Twine &, InsertPosition):
   Assertion `castIsValid(getOpcode(), S, Ty) && "Illegal BitCast"' failed.

Further investigation shows that the problem is triggered in
BPF/BPFAbstractMemberAccess.cpp
for code

  auto *BCInst =
      new BitCastInst(Base, PointerType::getUnqual(BB->getContext()));

For the above BitCastInst, Since 'Base' has non-zero AddrSapce, the compiler expects the type also has the same AddrSpace. But the above PointerType::getUnqual(...) does not have AddrSpace and hence causes the assertion failure.

Providing the proper AddrSpace for the BitCast type fixed the issue.

Alexei reported a bpf selftest failure with recent llvm for bpf prog
file progs/arena_spin_lock.c. The failure only happens when clang
is built with cmake option LLVM_ENABLE_ASSERTIONS=ON.

The error message looks like:
 clang: /home/yhs/work/yhs/llvm-project/llvm/lib/IR/Instructions.cpp:3460:
   llvm::BitCastInst::BitCastInst(Value *, Type *, const Twine &, InsertPosition):
   Assertion `castIsValid(getOpcode(), S, Ty) && "Illegal BitCast"' failed.

Further investigation shows that the problem is triggered in
  BPF/BPFAbstractMemberAccess.cpp
for code
  auto *BCInst =
      new BitCastInst(Base, PointerType::getUnqual(BB->getContext()));
For the above BitCastInst, Since 'Base' has non-zero AddrSapce,
the compiler expects the type also has the same AddrSpace. But the
above PointerType::getUnqual(...) does not have AddrSpace and hence
causes the assertion failure.

Providing the proper AddrSpace for the BitCast type fixed the issue.
Copy link
Member

@4ast 4ast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the quick fix.

@4ast
Copy link
Member

4ast commented Mar 11, 2025

tested the fix. Works.

Copy link
Contributor

@eddyz87 eddyz87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me. At cursory glance I don't see other points in this pass that might require similar change.

I have a nit about the test: for IR to IR passes it is a general practice to run opt -passes=<the-pass> and verify the output. But all our tests for CORE seem to follow a different pattern, there is even no way to request BPFAbstractMemberAccess from opt, a change in the BPFPassRegistry.def is needed for that. So this is a moot point.

@yonghong-song
Copy link
Contributor Author

I have a nit about the test: for IR to IR passes it is a general practice to run opt -passes=<the-pass> and verify the output. But all our tests for CORE seem to follow a different pattern, there is even no way to request BPFAbstractMemberAccess from opt, a change in the BPFPassRegistry.def is needed for that. So this is a moot point.

Yes, I was actually aware of this when I tried to write the test. As you mentioned, we do not add a function pass for that. The main reason is that this test is to test compilation failure, so I didn't add func test.

Yes, previously we mostly care the final result, at least for people who understands final asm code but they may not be really familiar with IR.

@yonghong-song yonghong-song merged commit 5686786 into llvm:main Mar 11, 2025
13 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 11, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/1090

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libarcher :: races/task-taskwait-nested.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c:52:11: error: CHECK: expected string not found in input
# | // CHECK: WARNING: ThreadSanitizer: data race
# |           ^
# | <stdin>:1:1: note: scanning from here
# | DONE
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: DONE 
# | check:52     X~~~~ error: no match found
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************


new BitCastInst(Base, PointerType::getUnqual(BB->getContext()));
auto *BCInst = new BitCastInst(
Base, PointerType::get(BB->getContext(),
Base->getType()->getPointerAddressSpace()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you drop this bitcast and the one below entirely? They are not needed anymore with opaque pointers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let me give a try.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this change make it into the backport: #130995

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since backport is only for bug fix, I think the suggestion to drop the above bitcast does not need to be backported since it does not fix any bug. I hope my understanding is correct. But for llvm21, I will make proper changes to remove these bitcasts.

yonghong-song pushed a commit to yonghong-song/llvm-project that referenced this pull request Mar 14, 2025
In [1], Nikita Popov spotted that two BitCast operations are not needed
with opaque pointers. So remove these two BitCast operations and adjust
corresponding comments as well.

  [1] llvm#130722
yonghong-song added a commit that referenced this pull request Mar 14, 2025
In [1], Nikita Popov spotted that two BitCast operations are not needed
with opaque pointers. So remove these two BitCast operations and adjust
corresponding comments as well.

  [1] #130722

Co-authored-by: Yonghong Song <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 14, 2025
In [1], Nikita Popov spotted that two BitCast operations are not needed
with opaque pointers. So remove these two BitCast operations and adjust
corresponding comments as well.

  [1] llvm/llvm-project#130722

Co-authored-by: Yonghong Song <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

6 participants