[OMPT] Overlapping `device_num` when using multiple offloading architectures

## Description

This issue was initially found during discussion of #64487. However, this should be handled as a separate issue, especially since the other one is closed now. This issue doesn't affect me directly, but there are users which might run into this issue, especially in mixed GPU systems. 

When a user utilizes multiple architectures to offload to in his program (for example `-fopenmp-targets=x86_64,nvptx64`, the OMPT interface will receive callbacks for `ompt_callback_device_initialize` for each device used. While this works fine, we may encounter a case where the transferred device number for two different architectures overlap, causing issues for tools to differentiate the devices. 

Users can select their device by the device number, for example
```c
int main( void )
{
// x86_64 (four devices)
#pragma omp target device(0)
{...}
// nvptx64 (other devices)
#pragma omp target device(4)
{...}
}
```

This would translate to the following in the OMPT interface 
```console
Callback Init: device_num=0 type=generic-64bit
[...]
Callback Init: device_num=0 type=sm_80 
```

## Reproducer

This small code should show the issue. I have tested it with with `nvptx64 + x86_64` and `nvptx64 + amdgcn-amd-amdhsa`, but it should also affect other combinations

```c
#include <assert.h>
#include <stdio.h>
#include <omp-tools.h>


void callback_ompt_device_initialize(int device_num,
                                     const char *type,
                                     ompt_device_t *device,
                                     ompt_function_lookup_t lookup,
                                     const char *documentation)
{
    printf("[%s] device_num = %d | type = %s\n", __FUNCTION__, device_num, type);
}

static int
initialize_tool( ompt_function_lookup_t lookup,
                 int                    initialDeviceNum,
                 ompt_data_t*           toolData )
{
    ompt_set_callback_t set_callback =
        ( ompt_set_callback_t )lookup( "ompt_set_callback" );
    assert( set_callback != 0 );

    ompt_set_result_t registration_result = set_callback(ompt_callback_device_initialize, (ompt_callback_t) &callback_ompt_device_initialize);
    assert(registration_result == ompt_set_always);

    return 1;
}

static void
finalize_tool( ompt_data_t* toolData )
{
}
	
ompt_start_tool_result_t*
ompt_start_tool( unsigned int omp_version, /* == _OPENMP */
                 const char*  runtime_version )
{
    static ompt_start_tool_result_t tool = { &initialize_tool,
                                             &finalize_tool,
                                             ompt_data_none };
    return &tool;
}


int main(void) {
    #pragma omp target device(0)
    {}

    #pragma omp target device(4)
    {}
}
```

Running the tool on a system with a single NVIDIA MX550, we can see the following output:
```console
$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 4b383107fa7585bb5ecd7f03cab7800b33d1585a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/git/bin
$ clang -fopenmp -fopenmp-targets=x86_64,nvptx64 reproducer.c
$ ./a.out
[callback_ompt_device_initialize] device_num = 0 | type = generic-64bit
[callback_ompt_device_initialize] device_num = 0 | type = sm_75
```

## Previous discussion

Here are a few related comments by @mhalk and me regarding this issue:

https://github.com/llvm/llvm-project/issues/64487#issuecomment-1673631414
> Let's say we would go with "added support": I added prints for `omp_get_device_num` and `omp_get_num_devices`. Also forced the target regions to use device number 3 (and 11, for the two-target-version).
> 
> Output for `x86_64` only:
> 
> ```
> trunk/bin/clang -fopenmp -fopenmp-targets=x86_64 veccopy-ompt-target.c -o veccopy-ompt-target
> ./veccopy-ompt-target
> 
> omp_get_device_num=4
> omp_get_num_devices=4
> 
> Callback Init: device_num=3 type=unknown device=0x1485000 lookup=0x7f0bd39c8a90 doc=(nil)
> Callback Load: device_num:3 module_id:0 filename:(null) host_adddr:0x200368 device_addr:(nil) bytes:17112
> [...]
> Callback Fini: device_num=3
> ```
> 
> Output for `x86_64` and `amdgcn-amd-amdhsa` (8 gfx90a GPUs present):
> 
> ```
> trunk/bin/clang -fopenmp -fopenmp-targets=x86_64,amdgcn-amd-amdhsa veccopy-ompt-target.c -o veccopy-ompt-target
> ./veccopy-ompt-target
> 
> omp_get_device_num=12
> omp_get_num_devices=12
> 
> Callback Init: device_num=3 type=gfx90a device=0x12ae980 lookup=0x7f441a241a90 doc=(nil)
> Callback Load: device_num:3 module_id:0 filename:(null) host_adddr:0x200378 device_addr:(nil) bytes:19208
> [...]
> Callback Init: device_num=3 type=unknown device=0x12b52a0 lookup=0x7f441a241a90 doc=(nil)
> Callback Load: device_num:3 module_id:0 filename:(null) host_adddr:0x204f00 device_addr:(nil) bytes:17112
> [...]
> Callback Fini: device_num=3
> Callback Fini: device_num=3
> ```
> 
> So, in the sources, I intentionally used devices 3 and 11 for target regions. Which both correlate to device number 3 in their respective RTL. That is, to illustrate that there might be ambiguous output w.r.t. `Fini` callbacks.


https://github.com/llvm/llvm-project/issues/64487#issuecomment-1674537696
> > Judging by your output for both `x86_64` and a combination of `x86_64` + `amdgcn`, I would expect that `nvptx64` + `amdgcn` yields the same results shown in your second example. I _think_ that I can test this. For tool developers, the ambiguous output can be hard to work with, since we can only use the `device_num` to identify the executing device during target callbacks. The runtime should probably dispatch the same device number users can use to define the executing device (here `3` for the AMD GPU and `11` for the host).
> 
> I checked how the device numbers are delivered on a system with an AMD + NVIDIA GPU. We can see the same behavior:
> 
> ```
> $ clang-18 -fopenmp -fopenmp-targets=nvptx64,amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a veccopy.c
> $ ./a.out
> Num devices = 3
> Device 0
> Callback Init: device_num=0 type=sm_80 device=0x556d3f652a90 lookup=0x7efd6a8027b0 doc=(nil)
> Callback Load: device_num:0 filename:(null) host_adddr:0x556d3ed12778 device_addr:(nil) bytes:715888
> [...]
> Device 1
> Callback Init: device_num=0 type=gfx90a device=0x556d402203d0 lookup=0x7efd6a8027b0 doc=(nil)
> Callback Load: device_num:0 filename:(null) host_adddr:0x556d3edc1478 device_addr:(nil) bytes:25064
> [...]
> Device 2
> Callback Init: device_num=1 type=gfx90a device=0x556d40229210 lookup=0x7efd6a8027b0 doc=(nil)
> Callback Load: device_num:1 filename:(null) host_adddr:0x556d3edc1478 device_addr:(nil) bytes:25064
> [...]
> Success
> Callback Fini: device_num=0
> Callback Fini: device_num=0
> Callback Fini: device_num=1
> ```




https://github.com/llvm/llvm-project/issues/64487#issuecomment-1674568211
> > The runtime should probably dispatch the same device number users can use to define the executing device [...]
> 
> tl;dr: Agreed. Like that idea, albeit I'd have to see if this is possible. Maybe I'm not aware of some information but ATM I'm not very confident this can be (reasonably) solved on my end.
> 
> When the callbacks are dispatched, we only have the information from the corresponding RTL. To adhere to our example, that is e.g. `DeviceId=3` -- while in the OpenMP runtime this might in fact be device number 11. Now the callback is executed, that's it -- for the init callback we could (during execution of the corresponding callback!) find out the "actual" (whatever that means) device number since we have the `kind` (amdgcn, nvptx, ...) and might be able to deduce further info. But other callbacks do not have this information.
> 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[OMPT] Overlapping `device_num` when using multiple offloading architectures #65104

Description

Reproducer

Previous discussion

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[OMPT] Overlapping device_num when using multiple offloading architectures #65104

Description

Description

Reproducer

Previous discussion

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[OMPT] Overlapping `device_num` when using multiple offloading architectures #65104