
[mlir][gpu] Add the OffloadEmbeddingAttr offloading translation attr #78117


Draft · wants to merge 5 commits into base: main
Conversation

fabianmcg (Contributor)

This patch adds the offloading translation attribute. This attribute uses the LLVM
offloading infrastructure to embed GPU binaries in the IR. At program start,
the LLVM offloading mechanism registers kernels and variables with the runtime
library: CUDA RT, HIP RT, or LibOMPTarget.

The offloading mechanism relies on the runtime library to dispatch the correct
kernel based on the registered symbols.
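
For the LibOMPTarget case, that start-up registration amounts to a global constructor handing the runtime a descriptor that points at the embedded device image and its offload entries. The C++ sketch below shows the shape of what the translation would emit as LLVM IR; the struct layouts and the `__tgt_register_lib` entry points follow LibOMPTarget's interface around the time of this patch, but treat the details as illustrative rather than the exact output of the attribute.

```cpp
#include <cstddef>
#include <cstdint>

struct __tgt_offload_entry; // Kernel/global records; see the OffloadHandler commit below.

// One embedded GPU binary plus the entries it defines.
struct __tgt_device_image {
  void *ImageStart;                   // Start of the embedded GPU binary.
  void *ImageEnd;                     // One past the end of the binary.
  __tgt_offload_entry *EntriesBegin;  // First offload entry.
  __tgt_offload_entry *EntriesEnd;    // One past the last offload entry.
};

// Descriptor handed to the runtime before main() runs.
struct __tgt_bin_desc {
  int32_t NumDeviceImages;
  __tgt_device_image *DeviceImages;
  __tgt_offload_entry *HostEntriesBegin;
  __tgt_offload_entry *HostEntriesEnd;
};

// Registration hooks provided by LibOMPTarget.
extern "C" void __tgt_register_lib(__tgt_bin_desc *Desc);
extern "C" void __tgt_unregister_lib(__tgt_bin_desc *Desc);

// In real output this descriptor is filled in with the embedded binaries and
// the offload entry array; it is left empty here to keep the sketch self-contained.
static __tgt_bin_desc Descriptor{};

// The translation emits the equivalent of this constructor/destructor pair so
// kernels and globals are visible to the runtime before any launch happens.
__attribute__((constructor)) static void registerOffloadLib() {
  __tgt_register_lib(&Descriptor);
}
__attribute__((destructor)) static void unregisterOffloadLib() {
  __tgt_unregister_lib(&Descriptor);
}
```

The same idea applies to CUDA RT and HIP RT, just with their respective registration entry points and descriptor formats.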

This patch is 3/4 on introducing the OffloadEmbeddingAttr GPU translation
attribute.

Note: Ignore the base commits; those are being reviewed in PRs #78057, #78098,
and #78073.

This patch adds the TargetInfo attribute interface to the set of DLTI
interfaces. Target information attributes provide essential information on the
compilation target. This information includes the target triple identifier, the
target chip identifier, and a string representation of the target features.

This patch also implements the new interface for the NVVM and ROCDL GPU target
attributes.
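
A rough C++ stand-in for the queries such an interface exposes is below. MLIR attribute interfaces are declared in TableGen rather than as plain virtual classes, and the method names here (getTargetTriple, getTargetChip, getTargetFeatures) are illustrative guesses based on the description above, not necessarily the names used in the patch.

```cpp
#include <string>

// Plain C++ stand-in for the TargetInfo attribute interface described above.
// Any attribute implementing it (e.g. the NVVM and ROCDL target attributes)
// would answer these three queries about the compilation target.
class TargetInfoAttrInterface {
public:
  virtual ~TargetInfoAttrInterface() = default;
  // Target triple identifier, e.g. "nvptx64-nvidia-cuda" or "amdgcn-amd-amdhsa".
  virtual std::string getTargetTriple() const = 0;
  // Target chip identifier, e.g. "sm_90" or "gfx90a".
  virtual std::string getTargetChip() const = 0;
  // String representation of the target features, e.g. "+ptx80".
  virtual std::string getTargetFeatures() const = 0;
};
```
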
This patch adds the `OffloadHandler` utility class for creating LLVM offload
entries.
LLVM offload entries hold information on offload symbols; for a GPU kernel, for
example, this includes the host address used to identify the kernel and the
kernel's identifier inside the binary. Arrays of offload entries can be used to
register functions with the CUDA/HIP runtime. LibOMPTarget also uses these
entries to register OMP target offload kernels and variables.
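
For concreteness, an offload entry is roughly the record below. The field layout follows what LLVM's offloading runtimes used around the time of this patch, and the kernel `foo`, its host stub, and the entry itself are hypothetical examples rather than code from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// One offload entry: enough information for the runtime to map a host-side
// handle to the corresponding symbol inside the device binary.
struct __tgt_offload_entry {
  void *addr;     // Host address used to identify the kernel or global.
  char *name;     // Name of the symbol inside the device binary.
  size_t size;    // Size in bytes for globals; 0 for kernels.
  int32_t flags;  // Entry kind and properties.
  int32_t reserved;
};

// Hypothetical example: the entry a GPU kernel `foo` could get. The address of
// a host-side stub stands in for the kernel on the host, and the name is used
// to look the kernel up inside the embedded binary.
static void fooStub() {}
static char fooName[] = "foo";
static __tgt_offload_entry fooEntry = {
    reinterpret_cast<void *>(&fooStub), fooName, /*size=*/0, /*flags=*/0, 0};
```

An array of such entries, together with begin/end pointers, is what gets handed to the CUDA/HIP runtime or LibOMPTarget at registration time.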

This patch is 1/4 on introducing the `OffloadEmbeddingAttr` GPU translation
attribute.
@@ -0,0 +1,61 @@
//===- Offload.h - LLVM Target Offload --------------------------*- C++ -*-===//
Contributor

It seems like this is doing the same kind of work that the OffloadInfoManager is doing. Would it be possible to refactor that to not have to add this class?

fabianmcg (Contributor Author) · Feb 16, 2024

Currently, the OffloadInfoManager creates the entries and adds them to the omp_offloading_entries section. However, the OffloadInfoManager performs no explicit construction of the entry array needed by the binary descriptor. It's the linker's job to implicitly create the array using all the entries in the section.

The problem with this approach is that LLJIT doesn't handle the implicit creation of the array very well. To overcome this limitation of LLJIT, the attribute constructs the entry array explicitly.

In summary, this class can be removed to an extent, but then JIT compilation becomes impossible and a real linker is needed to obtain the final executable.
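
To make the distinction concrete, here is a rough sketch of both schemes. The section and symbol names follow the OpenMP case mentioned above, the entry contents are placeholders, and the copies in the explicit array are only there to keep the sketch short.

```cpp
#include <cstddef>
#include <cstdint>

struct __tgt_offload_entry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

// Section-based scheme (roughly what the OffloadInfoManager emits): every
// entry is its own global dropped into a named section...
__attribute__((section("omp_offloading_entries")))
__tgt_offload_entry kernelAEntry{};
__attribute__((section("omp_offloading_entries")))
__tgt_offload_entry kernelBEntry{};

// ...and the begin/end pointers of the binary descriptor are the linker-defined
// bounds of that section, so the entry array only exists after a real link step.
extern __tgt_offload_entry __start_omp_offloading_entries[];
extern __tgt_offload_entry __stop_omp_offloading_entries[];

// Explicit scheme (roughly what the attribute does here): the entry array is
// materialized directly in the IR, so its bounds are ordinary globals and no
// linker support is required to form the array.
__tgt_offload_entry explicitEntries[] = {kernelAEntry, kernelBEntry};
__tgt_offload_entry *explicitBegin = explicitEntries;
__tgt_offload_entry *explicitEnd =
    explicitEntries + sizeof(explicitEntries) / sizeof(explicitEntries[0]);
```
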

Contributor


It sounds like Clang can't be used with LLJIT in that case, or, if it can, there is already a solution in Clang. I think making this work for both Clang and MLIR would be useful. If there is already a solution in Clang, it should be migrated to the OpenMPIRBuilder.

fabianmcg (Contributor Author)


The real problem is the lack of comprehensive support for linker sections in LLJIT, so I wouldn't say clang or the clang-linker-wrapper is at fault. The easiest solution I found was complying with LLJIT.
I think @jhuber6 was looking into changing the registration mechanism for LibOMPTarget binaries, so maybe we can find a solution that works for everyone.

Contributor


I don't have the full view of what LLJIT does here, but the use case in clang is that we need each TU to be able to emit values that need to be registered by the runtime. There are a few alternate solutions to this, but having the linker handle it is the best overall. The rework I was talking about was simply to change the offloading entry struct so it's more generic.

How does LLJIT work exactly? If you put globals into a section, they will generally appear in order, so if you had a pointer to the first and last globals in that section you could just traverse it once it has gone through the backend. This is somewhat similar to the COFF linker handling, which just gives an object at the beginning and end of the others in that section.
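
For reference, the COFF trick mentioned above looks roughly like the sketch below, which uses MSVC-style section pragmas. The section names, sentinel globals, and the `visitEntries` helper are all illustrative, and real implementations also skip any null padding the linker may insert between contributions.

```cpp
#include <cstddef>
#include <cstdint>

struct OffloadEntry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

// COFF sorts grouped sections by the suffix after '$', so "oentries$A" comes
// before "oentries$M", which comes before "oentries$Z".
#pragma section("oentries$A")
#pragma section("oentries$M")
#pragma section("oentries$Z")

__declspec(allocate("oentries$A")) OffloadEntry entriesBegin{};    // sentinel
__declspec(allocate("oentries$M")) OffloadEntry someKernelEntry{}; // a real entry
__declspec(allocate("oentries$Z")) OffloadEntry entriesEnd{};      // sentinel

// Everything between the two sentinels is an offload entry, so the runtime can
// traverse the section without anyone building an explicit array.
void visitEntries(void (*visit)(const OffloadEntry &)) {
  for (const OffloadEntry *it = &entriesBegin + 1; it != &entriesEnd; ++it)
    visit(*it);
}
```
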

fabianmcg (Contributor Author)


> There are a few alternate solutions to this, but having the linker handle it is the best overall.

I agree; I think the best solution would be to make LLJIT work.

> The rework I was talking about was to simply change the offloading entry struct so it's more generic.

I see.

> How does LLJIT work exactly?

Honestly, I'm not 100% sure; I only know that the same IR works when linked with a regular linker but fails with LLJIT.
I asked on the LLJIT Discord a couple of months ago why it was not picking up the symbols, and didn't get an answer.

I'll inquire further with them and come back with a more definitive answer.
