Description
LLVM trunk has added initial support for OMPT callbacks for target directives. During testing, I noticed that callbacks are dispatched for offloading to the host as well. While that's fine, the callbacks ompt_callback_device_initialize, ompt_callback_device_load, ompt_callback_device_unload and ompt_callback_device_finalize are not dispatched in that case.
The callback ompt_callback_device_unload is not implemented as far as I know, but I would expect the others to show up.
Missing ompt_callback_device_initialize is against the OpenMP specification, which states [Link]:
The OpenMP implementation invokes this callback after OpenMP is initialized for the device but before execution of any OpenMP construct is started on the device.
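A tool could detect this ordering violation at runtime by recording which devices were announced through ompt_callback_device_initialize and flagging any target callback for a device it has not seen. The following is only a minimal sketch of that bookkeeping. The helper names note_device_initialized and target_seen_before_init and the MAX_DEVICES bound are hypothetical; they are not part of OMPT or of the smoke test used below.

```c
#include <stdbool.h>

/* Hypothetical upper bound on tracked device numbers (assumption). */
#define MAX_DEVICES 64

/* One flag per device number; zero-initialized, so nothing is announced yet. */
static bool device_initialized[MAX_DEVICES];

/* Intended to be called from the tool's device_initialize handler. */
void note_device_initialized(int device_num) {
    if (device_num >= 0 && device_num < MAX_DEVICES)
        device_initialized[device_num] = true;
}

/* Intended to be called from the tool's target callback handler.
 * Returns true if device_num was never announced, i.e. the target
 * callback arrived without a preceding device_initialize. */
bool target_seen_before_init(int device_num) {
    if (device_num < 0 || device_num >= MAX_DEVICES)
        return true; /* outside the tracked range: never announced */
    return !device_initialized[device_num];
}
```

With the x86_64 output shown in the reproducer, such a check would report device_num=0 as never announced, since no Init callback precedes the first Target EMI callback.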
NVHPC 23.7 also dispatches ompt_callback_target without ompt_callback_device_initialize. However, in their case one can identify host execution by checking device_num against the value returned by ompt_get_num_devices: for host execution, device_num is either negative or above that value. In the case of LLVM, offloading to host seems to initialize four devices, which are handled just like GPU devices. Therefore, we get normal device numbers. We can verify this by running llvm-omp-device-info:
$ llvm-omp-device-info
Device (0):
Device Type Generic-elf-64bit
Device (1):
Device Type Generic-elf-64bit
Device (2):
Device Type Generic-elf-64bit
Device (3):
Device Type Generic-elf-64bit
Device (4):
CUDA Driver Version 12020
CUDA OpenMP Device Number 0
Device Name NVIDIA GeForce MX550
Global Memory Size 94779004878848 bytes
Number of Multiprocessors 16
Concurrent Copy and Execution Yes
Total Constant Memory 65536 bytes
Max Shared Memory per Block 49152 bytes
Registers per Block 65536
Warp Size 32
Maximum Threads per Block 1024
Maximum Block Dimensions
x 1024
y 1024
z 64
Maximum Grid Dimensions
x 2147483647
y 65535
z 65535
Maximum Memory Pitch 2147483647 bytes
Texture Alignment 512 bytes
Clock Rate 1320000 kHz
Execution Timeout No
Integrated Device No
Can Map Host Memory Yes
Compute Mode Default
Concurrent Kernels Yes
ECC Enabled No
Memory Clock Rate 6001000 kHz
Memory Bus Width 64 bits
L2 Cache Size 524288 bytes
Max Threads Per SMP 1024
Async Engines 3
Unified Addressing Yes
Managed Memory Yes
Concurrent Managed Memory Yes
Preemption Supported Yes
Cooperative Launch Yes
Multi-Device Boars No
Compute Capabilities sm_75
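For reference, the host-detection check that works with NVHPC can be sketched as a small predicate. This is an illustration under stated assumptions, not code from any runtime: the function name looks_like_host is hypothetical, num_devices stands for the value a tool would obtain from ompt_get_num_devices, and the sketch assumes valid offload devices are numbered 0 to num_devices-1.

```c
#include <stdbool.h>

/* Returns true if device_num cannot refer to an offload device,
 * given the device count reported by the runtime.
 * Assumption: offload devices are numbered 0 .. num_devices-1, so a
 * negative number or one at or beyond the count is treated as host. */
bool looks_like_host(int device_num, int num_devices) {
    return device_num < 0 || device_num >= num_devices;
}
```

With LLVM and host offloading, the target callbacks report device_num=0 while four devices exist, so this predicate returns false and the host execution cannot be told apart from a GPU this way.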
Reproducer
I used one of the aomp smoke tests, veccopy-ompt-target-emi, to verify this issue.
When compiling and running the code with the offload target nvptx64, the following output can be seen:
$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 52ac71f92d38f75df5cb88e9c090ac5fd5a71548)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/software/software/LLVM/git/bin
$ clang -fopenmp -fopenmp-targets=nvptx64 test.c
$ ./a.out
Callback Init: device_num=0 type=sm_75 device=0x5593d65d44c0 lookup=0x7fef929e0480 doc=(nil)
Callback Load: device_num:0 filename:(null) host_adddr:0x5593d5e8e758 device_addr:(nil) bytes:758224
Callback Target EMI: kind=1 endpoint=1 device_num=0 task_data=0x5593d6591540 (0x0) target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) code=0x5593d5e8c9e2
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000002) src=0x7fff98ef29d0 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000002) src=0x7fff98ef29d0 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000003) src=0x7fff98ef29d0 src_device_num=1 dest=0x7fef60600000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000003) src=0x7fff98ef29d0 src_device_num=1 dest=0x7fef60600000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000004) src=0x7fff98ef1a30 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000004) src=0x7fff98ef1a30 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000005) src=0x7fff98ef1a30 src_device_num=1 dest=0x7fef60601000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000005) src=0x7fff98ef1a30 src_device_num=1 dest=0x7fef60601000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback Submit EMI: endpoint=1 req_num_teams=1 target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7a0 (0x0)
Callback Submit EMI: endpoint=2 req_num_teams=1 target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7a0 (0x0)
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000006) src=0x7fef60601000 src_device_num=0 dest=0x7fff98ef1a30 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000006) src=0x7fef60601000 src_device_num=0 dest=0x7fff98ef1a30 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000007) src=0x7fef60600000 src_device_num=0 dest=0x7fff98ef29d0 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000007) src=0x7fef60600000 src_device_num=0 dest=0x7fff98ef29d0 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000008) src=0x7fef60601000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000008) src=0x7fef60601000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000009) src=0x7fef60600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) host_op_id=0x7fef9282a7c0 (0x8000000000000009) src=0x7fef60600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback Target EMI: kind=1 endpoint=2 device_num=0 task_data=0x5593d6591540 (0x0) target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x8000000000000001) code=0x5593d5e8c9e2
Callback Target EMI: kind=1 endpoint=1 device_num=0 task_data=0x5593d6591540 (0x0) target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) code=0x5593d5e8cbfe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000b) src=0x7fff98ef29d0 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000b) src=0x7fff98ef29d0 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000c) src=0x7fff98ef29d0 src_device_num=1 dest=0x7fef60601000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000c) src=0x7fff98ef29d0 src_device_num=1 dest=0x7fef60601000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000d) src=0x7fff98ef1a30 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000d) src=0x7fff98ef1a30 src_device_num=1 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fef928eb383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000e) src=0x7fff98ef1a30 src_device_num=1 dest=0x7fef60600000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000e) src=0x7fff98ef1a30 src_device_num=1 dest=0x7fef60600000 dest_device_num=0 bytes=4000 code=0x7fef928eb2fe
Callback Submit EMI: endpoint=1 req_num_teams=0 target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7a0 (0x0)
Callback Submit EMI: endpoint=2 req_num_teams=0 target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7a0 (0x0)
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000f) src=0x7fef60600000 src_device_num=0 dest=0x7fff98ef1a30 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x800000000000000f) src=0x7fef60600000 src_device_num=0 dest=0x7fff98ef1a30 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000010) src=0x7fef60601000 src_device_num=0 dest=0x7fff98ef29d0 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000010) src=0x7fef60601000 src_device_num=0 dest=0x7fff98ef29d0 dest_device_num=1 bytes=4000 code=0x7fef928f457f
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000011) src=0x7fef60600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000011) src=0x7fef60600000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000012) src=0x7fef60601000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) host_op_id=0x7fef9282a7c0 (0x8000000000000012) src=0x7fef60601000 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fef928ec73a
Callback Target EMI: kind=1 endpoint=2 device_num=0 task_data=0x5593d6591540 (0x0) target_task_data=0x5593d65ba818 (0x0) target_data=0x7fef9282a7a8 (0x800000000000000a) code=0x5593d5e8cbfe
Success
Callback Fini: device_num=0
Replacing nvptx64 with x86_64, one can see the following:
$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 52ac71f92d38f75df5cb88e9c090ac5fd5a71548)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang -fopenmp -fopenmp-targets=x86_64 test.c
$ ./a.out
Callback Target EMI: kind=1 endpoint=1 device_num=0 task_data=0x55a83389d540 (0x0) target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) code=0x55a8336389e2
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000002) src=0x7fffc2fe7820 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000002) src=0x7fffc2fe7820 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000003) src=0x7fffc2fe7820 src_device_num=4 dest=0x55a8338f82d0 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000003) src=0x7fffc2fe7820 src_device_num=4 dest=0x55a8338f82d0 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000004) src=0x7fffc2fe6880 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000004) src=0x7fffc2fe6880 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000005) src=0x7fffc2fe6880 src_device_num=4 dest=0x55a833902680 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000005) src=0x7fffc2fe6880 src_device_num=4 dest=0x55a833902680 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback Submit EMI: endpoint=1 req_num_teams=1 target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287a0 (0x0)
Callback Submit EMI: endpoint=2 req_num_teams=1 target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287a0 (0x0)
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000006) src=0x55a833902680 src_device_num=0 dest=0x7fffc2fe6880 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000006) src=0x55a833902680 src_device_num=0 dest=0x7fffc2fe6880 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000007) src=0x55a8338f82d0 src_device_num=0 dest=0x7fffc2fe7820 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000007) src=0x55a8338f82d0 src_device_num=0 dest=0x7fffc2fe7820 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000008) src=0x55a833902680 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000008) src=0x55a833902680 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000009) src=0x55a8338f82d0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) host_op_id=0x7fa2632287c0 (0x8000000000000009) src=0x55a8338f82d0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback Target EMI: kind=1 endpoint=2 device_num=0 task_data=0x55a83389d540 (0x0) target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x8000000000000001) code=0x55a8336389e2
Callback Target EMI: kind=1 endpoint=1 device_num=0 task_data=0x55a83389d540 (0x0) target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) code=0x55a833638bfe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000b) src=0x7fffc2fe7820 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000b) src=0x7fffc2fe7820 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000c) src=0x7fffc2fe7820 src_device_num=4 dest=0x55a833902680 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000c) src=0x7fffc2fe7820 src_device_num=4 dest=0x55a833902680 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=1 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000d) src=0x7fffc2fe6880 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=2 optype=1 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000d) src=0x7fffc2fe6880 src_device_num=4 dest=(nil) dest_device_num=0 bytes=4000 code=0x7fa26336f383
Callback DataOp EMI: endpoint=1 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000e) src=0x7fffc2fe6880 src_device_num=4 dest=0x55a8338f82d0 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback DataOp EMI: endpoint=2 optype=2 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000e) src=0x7fffc2fe6880 src_device_num=4 dest=0x55a8338f82d0 dest_device_num=0 bytes=4000 code=0x7fa26336f2fe
Callback Submit EMI: endpoint=1 req_num_teams=0 target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287a0 (0x0)
Callback Submit EMI: endpoint=2 req_num_teams=0 target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287a0 (0x0)
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000f) src=0x55a8338f82d0 src_device_num=0 dest=0x7fffc2fe6880 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x800000000000000f) src=0x55a8338f82d0 src_device_num=0 dest=0x7fffc2fe6880 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=1 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000010) src=0x55a833902680 src_device_num=0 dest=0x7fffc2fe7820 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=2 optype=3 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000010) src=0x55a833902680 src_device_num=0 dest=0x7fffc2fe7820 dest_device_num=4 bytes=4000 code=0x7fa26337857f
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000011) src=0x55a8338f82d0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000011) src=0x55a8338f82d0 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=1 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000012) src=0x55a833902680 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback DataOp EMI: endpoint=2 optype=4 target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) host_op_id=0x7fa2632287c0 (0x8000000000000012) src=0x55a833902680 src_device_num=0 dest=(nil) dest_device_num=-1 bytes=0 code=0x7fa26337073a
Callback Target EMI: kind=1 endpoint=2 device_num=0 task_data=0x55a83389d540 (0x0) target_task_data=0x55a8338c6818 (0x0) target_data=0x7fa2632287a8 (0x800000000000000a) code=0x55a833638bfe
Success
Note that the Init, Load and Fini callbacks are missing from this output.
Side question:
I noticed that omp_get_num_devices() returns four when offloading to x86_64. What's the reasoning behind that? llvm-omp-device-info also shows four devices with the type Generic-elf-64bit on a system with Ubuntu 22.04 LTS and an Intel Core i7-1260P.