[CUDA] __global__ kernels pre-declared in a header result in an nvlink error in separable compilation (-fgpu-rdc) #118212

When compiling CUDA code that is split into header and implementation files with relocatable device code enabled (`-fgpu-rdc`), I ran into a linking error:
```
nvlink error: Function A() not declared __global__ in all source files 
nvlink fatal: merge_elf failed
```   

The error can be reproduced with the following set-up:
```
// bug.cuh
__global__ void foo();

// bug.cu
__global__ void foo() {}

// main.cu
#include "bug.cuh"

int main() { foo<<<1, 1, 1>>>(); }

>clang++ bug.cu main.cu --offload-arch=sm_89 -fgpu-rdc --offload-new-driver -lcudart
```

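@jhuber6's insight:
The fix is for the compiler to emit the kernel declaration as `.extern .entry` instead of `.extern .func`. The device-side PTX currently generated for main.cu declares it as a plain function: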
  ```
  >clang++ -x cuda main.cu --offload-arch=sm_89 --offload-device-only -o - -S -fgpu-rdc
  
  .version 8.5
  .target sm_89
  .address_size 64
  
  .extern .func _Z3foov
  ()
  ;
  .global .align 8 .u64 __clang_gpu_used_external[1] = {_Z3foov};
  ```


Workaround: enable `-foffload-lto`.
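
A sketch of the workaround applied to the reproduction command above. It assumes the flag is simply appended to the same single compile-and-link invocation; other flag combinations or split compile/link steps have not been checked here.

```shell
# Same command as in the reproduction, with device-side LTO enabled.
# Assumption: -foffload-lto is the only change needed.
clang++ bug.cu main.cu --offload-arch=sm_89 -fgpu-rdc --offload-new-driver -foffload-lto -lcudart
```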
