[CUDA] __global__ kernels pre-declared in a header result in an nvlink error with separable compilation (-fgpu-rdc) #118212

Open
@geomois

Description

When compiling CUDA code that is split into header and implementation files with relocatable device code enabled (-fgpu-rdc), I ran into a linking error:

nvlink error: Function A() not declared __global__ in all source files 
nvlink fatal: merge_elf failed

The error can be reproduced with the following setup:

// bug.cuh
__global__ void foo();

// bug.cu
__global__ void foo() {}

// main.cu
#include "bug.cuh"

int main() { foo<<<1, 1, 1>>>(); }

>clang++ bug.cu main.cu --offload-arch=sm_89 -fgpu-rdc --offload-new-driver -lcudart

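@jhuber6's insight: the solution is to declare the external kernel with .extern .entry instead of .extern .func. Compiling main.cu for the device shows that clang currently emits .extern .func for the pre-declared kernel. nvlink therefore sees _Z3foov declared as a plain device function in main.cu but defined as a __global__ kernel (.entry) in bug.cu, which triggers the "not declared __global__ in all source files" error: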

>clang++ -x cuda main.cu --offload-arch=sm_89 --offload-device-only -o - -S -fgpu-rdc

.version 8.5
.target sm_89
.address_size 64

.extern .func _Z3foov
()
;
.global .align 8 .u64 __clang_gpu_used_external[1] = {_Z3foov};

Workaround: enable -foffload-lto.