Skip to content

[Clang] Always verify LLVM IR inputs #134396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 7, 2025
Merged

[Clang] Always verify LLVM IR inputs #134396

merged 2 commits into from
Apr 7, 2025

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Apr 4, 2025

We get a lot of issues that basically boil down to "I passed malformed LLVM IR to clang and it crashed". Clang does not perform IR verification by default in (non-assertion-enabled) release builds, and that's sensible for IR that Clang itself produces, which is expected to always be valid. However, if people pass in their own handwritten IR, we should report if it is malformed, instead of crashing. We should also report it in a way that does not produce a crash trace and ask for a bug report, as currently happens in assertions-enabled builds. This aligns the behavior with how opt/llc work.

We get a lot of issues that basically boil down to "I passed
malformed LLVM IR to clang and it crashed". Clang does not perform
IR verification by default in (non-assertion-enabled) release
builds, and that's sensible for IR that Clang itself produces,
which is expected to always be valid. However, if people pass in
their own handwritten IR, we should report if it is malformed,
instead of crashing. We should also report it in a way that does
not produce a crash trace and ask for a bug report, as currently
happens in assertions-enabled builds.

I've only added the verification for textual IR inputs. I don't
want to force verification for bitcode inputs, as these would
affect typical LTO scenarios, and are usually coming from Clang
itself.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Apr 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 4, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

We get a lot of issues that basically boil down to "I passed malformed LLVM IR to clang and it crashed". Clang does not perform IR verification by default in (non-assertion-enabled) release builds, and that's sensible for IR that Clang itself produces, which is expected to always be valid. However, if people pass in their own handwritten IR, we should report if it is malformed, instead of crashing. We should also report it in a way that does not produce a crash trace and ask for a bug report, as currently happens in assertions-enabled builds.

I've only added the verification for textual IR inputs. I don't want to force verification for bitcode inputs, as these would affect typical LTO scenarios, and are usually coming from Clang itself.


Full diff: https://github.com/llvm/llvm-project/pull/134396.diff

3 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticFrontendKinds.td (+2)
  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+14-1)
  • (added) clang/test/CodeGen/invalid_llvm_ir.ll (+10)
diff --git a/clang/include/clang/Basic/DiagnosticFrontendKinds.td b/clang/include/clang/Basic/DiagnosticFrontendKinds.td
index 5f64b1cbfac87..6c72775197823 100644
--- a/clang/include/clang/Basic/DiagnosticFrontendKinds.td
+++ b/clang/include/clang/Basic/DiagnosticFrontendKinds.td
@@ -379,6 +379,8 @@ def err_ast_action_on_llvm_ir : Error<
   "cannot apply AST actions to LLVM IR file '%0'">,
   DefaultFatal;
 
+def err_invalid_llvm_ir : Error<"invalid LLVM IR input: %0">;
+
 def err_os_unsupport_riscv_fmv : Error<
   "function multiversioning is currently only supported on Linux">;
 
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index 4321efd49af36..09d76a161079a 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -39,6 +39,7 @@
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/LLVMRemarkStreamer.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/Verifier.h"
 #include "llvm/IRReader/IRReader.h"
 #include "llvm/LTO/LTOBackend.h"
 #include "llvm/Linker/Linker.h"
@@ -1048,8 +1049,20 @@ CodeGenAction::loadModule(MemoryBufferRef MBRef) {
 
   // Handle textual IR and bitcode file with one single module.
   llvm::SMDiagnostic Err;
-  if (std::unique_ptr<llvm::Module> M = parseIR(MBRef, Err, *VMContext))
+  if (std::unique_ptr<llvm::Module> M = parseIR(MBRef, Err, *VMContext)) {
+    // For textual LLVM IR files, always verify the input and report the error
+    // in a way that does not ask people to report an issue for it.
+    if (!llvm::isBitcode((const unsigned char *)MBRef.getBufferStart(),
+                         (const unsigned char *)MBRef.getBufferEnd())) {
+      std::string VerifierErr;
+      raw_string_ostream VerifierErrStream(VerifierErr);
+      if (llvm::verifyModule(*M, &VerifierErrStream)) {
+        CI.getDiagnostics().Report(diag::err_invalid_llvm_ir) << VerifierErr;
+        return {};
+      }
+    }
     return M;
+  }
 
   // If MBRef is a bitcode with multiple modules (e.g., -fsplit-lto-unit
   // output), place the extra modules (actually only one, a regular LTO module)
diff --git a/clang/test/CodeGen/invalid_llvm_ir.ll b/clang/test/CodeGen/invalid_llvm_ir.ll
new file mode 100644
index 0000000000000..573fe6e351f3e
--- /dev/null
+++ b/clang/test/CodeGen/invalid_llvm_ir.ll
@@ -0,0 +1,10 @@
+; RUN: not %clang %s 2>&1 | FileCheck %s
+
+; CHECK: error: invalid LLVM IR input: PHINode should have one entry for each predecessor of its parent basic block!
+; CHECK-NEXT: %phi = phi i32 [ 0, %entry ]
+
+define void @test() {
+entry:
+  %phi = phi i32 [ 0, %entry ]
+  ret void
+}

@efriedma-quic
Copy link
Collaborator

Might as well do it for bitcode too? That wouldn't impact LTO scenarios, I think: in LTO, the IR is directly loaded by lld/the LTO plugin.

@nikic
Copy link
Contributor Author

nikic commented Apr 4, 2025

Might as well do it for bitcode too? That wouldn't impact LTO scenarios, I think: in LTO, the IR is directly loaded by lld/the LTO plugin.

Yeah, good point. I've extended this to verify bitcode too, and checked that it doesn't affect LTO.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic merged commit 87a4215 into llvm:main Apr 7, 2025
11 checks passed
@nikic nikic deleted the clang-ir-verify branch April 7, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants