Description
The MergeFunctions LLVM pass can rewrite functions to generate calls that are not valid under the calling convention of the target. For example, on NVPTX an extern "ptx-kernel" function must not call another extern "ptx-kernel" function.
This is an LLVM bug, described here (thanks @nikic): https://bugs.llvm.org/show_bug.cgi?id=40232. A PR also adds a target option and a -Z flag to control MergeFunctions: #57268.
Example: in the following Rust source, the functions foo and bar get merged by MergeFunctions:
#![crate_type = "lib"]
#![feature(abi_ptx)]
#![feature(lang_items)]
#![feature(link_llvm_intrinsics)]
#![feature(naked_functions)]
#![feature(no_core)]
#![no_core]

#[lang = "sized"]
trait Sized {}

#[lang = "copy"]
trait Copy {}

#[allow(improper_ctypes)]
extern "C" {
    #[link_name = "llvm.nvvm.barrier0"]
    fn syncthreads() -> ();
}

#[inline]
pub unsafe fn _syncthreads() -> () {
    syncthreads()
}

#[no_mangle]
pub unsafe extern "ptx-kernel" fn foo() {
    _syncthreads();
    _syncthreads();
}

#[no_mangle]
pub unsafe extern "ptx-kernel" fn bar() {
    _syncthreads();
    _syncthreads();
}
This yields incorrect PTX assembly: the call.uni bar instruction is not valid, since a kernel is calling another kernel (note that this requires rustc -Z merge-functions=trampolines from the above PR):
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_35
.address_size 64
// .globl bar // -- Begin function bar
// @bar
.visible .entry bar()
{
// %bb.0: // %start
bar.sync 0;
bar.sync 0;
ret;
// -- End function
}
// .globl foo // -- Begin function foo
.visible .entry foo() // @foo
{
// %bb.0:
{ // callseq 0, 0
.reg .b32 temp_param_reg;
// XXX: `call.uni bar` is not a valid call!
call.uni
bar,
(
);
} // callseq 0
ret;
// -- End function
}
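To make the rewrite above concrete, here is a rough host-side sketch of what the trampolines mode effectively does to two functions with identical bodies: one keeps its body and the other becomes a thunk that forwards to it. This uses ordinary Rust functions, not the kernels from the report, and the names foo_before_merge, foo_after_merge, sync_stub and BARRIERS are hypothetical stand-ins introduced only for illustration. For ordinary functions the result is behaviorally equivalent; the problem in this issue is only that the same forwarding call is emitted between two "ptx-kernel" entry points.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the barrier intrinsic: counts how often it is hit.
static BARRIERS: AtomicUsize = AtomicUsize::new(0);

fn sync_stub() {
    BARRIERS.fetch_add(1, Ordering::SeqCst);
}

// `bar` keeps its original body.
pub fn bar() {
    sync_stub();
    sync_stub();
}

// Before merging: `foo` has a body identical to `bar`.
pub fn foo_before_merge() {
    sync_stub();
    sync_stub();
}

// After merging in trampoline style: `foo` only forwards to `bar`.
// Harmless for ordinary functions, but not valid when both are kernel entry points.
pub fn foo_after_merge() {
    bar()
}

fn main() {
    foo_before_merge();
    let before = BARRIERS.swap(0, Ordering::SeqCst);
    foo_after_merge();
    let after = BARRIERS.swap(0, Ordering::SeqCst);
    // Both versions hit the barrier stub the same number of times.
    assert_eq!(before, after);
    println!("barriers per call: {}", after);
}
```

The sketch only models the observable behavior on the host; it says nothing about how LLVM decides which of the two functions keeps its body.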
Disabling MergeFunctions (e.g. using rustc -Z merge-functions=disabled) yields correct PTX assembly:
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_35
.address_size 64
// .globl foo // -- Begin function foo
// @foo
.visible .entry foo()
{
// %bb.0: // %start
bar.sync 0;
bar.sync 0;
ret;
// -- End function
}
// .globl bar // -- Begin function bar
.visible .entry bar() // @bar
{
// %bb.0: // %start
bar.sync 0;
bar.sync 0;
ret;
// -- End function
}
P.S. Currently the default behavior of MergeFunctions is to emit function aliases, which are not supported by NVPTX, so controlling MergeFunctions via the merge-functions flag is necessary to generate any of the PTX assembly above.
Meta
I'm on a patched rustc so this may not be so helpful, but here it is anyway:
rustc --version --verbose:
rustc 1.33.0-nightly (fb86d604b 2018-12-27)
binary: rustc
commit-hash: fb86d604bf65c3becd16180b56267a329cf268d5
commit-date: 2018-12-27
host: x86_64-unknown-linux-gnu
release: 1.33.0-nightly
LLVM version: 8.0