
[MLIR] Enabling Intel GPU Integration. #65539


Closed · wants to merge 17 commits

Conversation

silee2
Contributor

@silee2 silee2 commented Sep 6, 2023

MLIR currently doesn't have a GPU runtime for Intel GPUs.
This PR adds the components required to lower from the GPU dialect and execute on Intel GPUs, including a new SYCL runtime wrapper.

List of changes:

  • Add a new runtime wrapper library, mlir_sycl_runtime, used together with the gpu-to-llvm pass.
  • Update the gpu-to-llvm pass to support host-shared memory allocation and to address limitations of using a SPIR-V binary blob: the blob does not embed its size, so the size has to be passed separately, and the number of kernel parameters has to be passed as an additional argument when launching a kernel.
  • Add a new boolean pass option, "use-opencl", to the convert-gpu-to-spirv pass to target OpenCL.
  • Add a gpu-serialize-to-spirv pass that wraps the SPIR-V serializer as a pass.
  • Add CMake modules FindSyclRuntime.cmake and FindLevelZero.cmake that locate the SYCL runtime and the Level Zero runtime; they are required for building the SYCL runtime wrapper.
  • Add a unit test for the serializer pass.
  • Add a GPU/SYCL integration test.

This PR is monolithic for now and will be broken down into smaller pieces as needed.

Integration workflow:
The GPU dialect lowering and execution workflow for Intel GPUs added by this PR works as follows. It is similar to the CUDA and ROCm workflows but uses the spirv dialect for device-code IR instead of the llvm dialect.

Device code is lowered from gpu.func to the spirv dialect using the existing convert-gpu-to-spirv pass.
The gpu-serialize-to-spirv pass is then called to serialize the SPIR-V and attach the serialized binary blob as an attribute on the gpu.module.
Host code is lowered by the gpu-to-llvm pass, and gpu dialect ops are converted into llvm calls to the various entry points in the SYCL runtime wrapper.
The resulting host and device code is entirely in the LLVM dialect and is piped into mlir-cpu-runner for JIT execution, with the SYCL runtime wrapper loaded as a shared library.
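For reference, a hedged sketch of the core of such a serialize-and-attach step (this is not the PR's exact code; the helper function and the attribute name are illustrative assumptions):

#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/SPIRV/IR/SPIRVOps.h"
#include "mlir/Target/SPIRV/Serialization.h"

using namespace mlir;

// Serialize the spirv.module and attach the resulting blob to the gpu.module
// under an illustrative attribute name.
static LogicalResult serializeAndAttach(spirv::ModuleOp spvModule,
                                        gpu::GPUModuleOp gpuModule) {
  SmallVector<uint32_t, 0> binary;
  if (failed(spirv::serialize(spvModule, binary)))
    return failure();
  StringRef blob(reinterpret_cast<const char *>(binary.data()),
                 binary.size() * sizeof(uint32_t));
  gpuModule->setAttr("gpu.binary_blob",
                     StringAttr::get(gpuModule.getContext(), blob));
  return success();
}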

Limitations:
The SYCL runtime wrapper in this PR is an initial version and does not implement all the entry points supported by the CUDA and ROCm wrappers.
Intel GPU shared local memory is not supported.
Bare pointers are used to pass parameters to device kernels, so dynamically shaped memrefs are not supported.

@github-actions github-actions bot added mlir:core MLIR Core Infrastructure mlir:gpu mlir labels Sep 6, 2023
@Jianhui-Li

@rengolin @joker-eph @Hardcode84

FYI, this PR enables the current GPU dialect on Intel GPUs as is, without introducing a stream/queue concept into the current GPU dialect.

@Hardcode84
Contributor

I suggest extracting the mgpu interface changes and the SerializeToSPIRV pass into two separate PRs.

@silee2
Contributor Author

silee2 commented Sep 6, 2023

I suggest extracting the mgpu interface changes and the SerializeToSPIRV pass into two separate PRs.

Agreed. And the changes are originally from different authors, so they should be split.

Collaborator

@joker-eph joker-eph left a comment


LGTM overall; this should likely get reviewed by @antiagainst / @kuhar, and it'd be great if you can split the independent changes and send them ahead of the e2e integration.

return spvModName->consume_front("__spv__") && spvModName == name;
};
auto spvMods = mod.getOps<spirv::ModuleOp>();
auto it = llvm::find_if(spvMods, isSameMod);
Collaborator

This is really costly, can you build a symbol table once at the beginning of the function and use it for the queries instead?
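For illustration, a minimal sketch of that suggestion (the helper name is hypothetical, not part of the PR); it assumes the SPIR-V module is registered in the parent module's symbol table under its "__spv__"-prefixed name:

#include "mlir/Dialect/SPIRV/IR/SPIRVOps.h"
#include "mlir/IR/SymbolTable.h"

// Build one mlir::SymbolTable per module and reuse it for every query,
// instead of scanning all spirv::ModuleOp ops with llvm::find_if.
static mlir::spirv::ModuleOp lookupSpirvModule(mlir::SymbolTable &symbolTable,
                                               llvm::StringRef gpuModuleName) {
  return symbolTable.lookup<mlir::spirv::ModuleOp>(
      ("__spv__" + gpuModuleName).str());
}

The symbol table would be constructed once at the start of the conversion (e.g. mlir::SymbolTable symbolTable(moduleOp);) and passed to each lookup.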

spvMod->erase();
}
}
};
Collaborator

@fabianmcg : how would that fit in the new serialization flow?

Contributor

GpuSerializeToSPIRVPass can definitely be integrated as a SPIR-V target attribute. Basically, GpuSerializeToSPIRVPass::run would be the body of SPIRVTargetAttr::serializeToObject. However, we might need to modify the signature of serializeToObject to include a symbol table to make the SPIR-V serialization efficient; then serializeToObject could query the SPIR-V module name directly.

With that, gpu-module-to-binary would be able to take care of the serialization like with NVIDIA & AMD.

Then we also need to modify SelectObjectAttr. I think the only modification needed is adding the extra parameters in mgpuModuleLoad, etc. Another option is creating a dedicated SPIRVObjectAttr but so far I don't think it's needed.

Overall I think those would be the only 2 changes we would need to do to integrate this patch into the new workflow.

Collaborator

Awesome! @silee2 : can you adjust in this direction? I'm sure Fabian can support you if you have more questions!

Contributor

@silee2 here are the steps:

  1. Implement a target attribute, see for example: NVVMTargetAttr. The idea of this attribute is to hold properties intrinsic to the target, like triple, chip, flags, etc.
  2. Add a pass to attach the target to a module, see: GpuNVVMAttachTarget and Dialect/GPU/Transforms/NVVMAttachTarget.cpp. The idea of this pass is to attach the SPIRV target to GPU modules, so it must know how to create them.
  3. We're currently implementing TargetAttrs as external models to keep libraries separated (see NVVM/Target.cpp), so GpuSerializeToSPIRVPass::run would be there; a rough sketch follows below.
  4. Modify getModuleLoadFn & createKernelLaunch appropriately in SelectObjectAttr.cpp#L125-L15 instead of adding the changes in GPUToLLVMCommon.
  5. Then the compilation workflow should look something like this: GPU: Compilation Overview

I'll take care of adding a pointer to the top module symbol table so it can be used by the SPIRVTarget.

If you have any questions, just ping me on Discord or Discourse: @fabianmc.
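To make step 3 more concrete, here is a rough, hedged sketch of what such an external model could look like, loosely modeled on NVVM/Target.cpp; the class name, include path, and exact gpu::TargetAttrInterface signatures are assumptions based on this discussion rather than the final upstream API (registration and createObject are omitted):

#include "mlir/Dialect/GPU/IR/CompilationInterfaces.h"
#include "mlir/IR/Operation.h"
#include <optional>

using namespace mlir;

namespace {
// Hypothetical external model wrapping the SPIR-V serializer.
class SPIRVTargetAttrImpl
    : public gpu::TargetAttrInterface::FallbackModel<SPIRVTargetAttrImpl> {
public:
  std::optional<SmallVector<char, 0>>
  serializeToObject(Attribute attribute, Operation *module,
                    const gpu::TargetOptions &options) const {
    // The body would essentially be GpuSerializeToSPIRVPass::run: locate the
    // nested spirv.module (ideally via the symbol table mentioned above) and
    // run the SPIR-V serializer on it, returning the raw blob.
    return std::nullopt; // placeholder
  }
};
} // namespace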

@@ -71,7 +71,8 @@ void GPUToSPIRVPass::runOnOperation() {
std::unique_ptr<ConversionTarget> target =
spirv::getMemorySpaceToStorageClassTarget(*context);
spirv::MemorySpaceToStorageClassMap memorySpaceMap =
spirv::mapMemorySpaceToVulkanStorageClass;
this->useOpenCL ? spirv::mapMemorySpaceToOpenCLStorageClass :
spirv::mapMemorySpaceToVulkanStorageClass;
Collaborator

Can you point out which unit-test is covering this? Can you send this in a separate PR?

Contributor Author

Created #66445, which includes the code change and a unit test covering it.

@@ -1158,7 +1178,7 @@ LogicalResult ConvertLaunchFuncOpToGpuRuntimeCallPattern::matchAndRewrite(
{function.getResult(), adaptor.getGridSizeX(), adaptor.getGridSizeY(),
adaptor.getGridSizeZ(), adaptor.getBlockSizeX(), adaptor.getBlockSizeY(),
adaptor.getBlockSizeZ(), dynamicSharedMemorySize, stream, kernelParams,
/*extra=*/nullpointer});
/*extra=*/nullpointer, paramsCount});
Collaborator

I see a unit test for this, but can you send everything related to paramsCount in a separate PR?

Value allocatedPtr =
allocCallBuilder.create(loc, rewriter, {sizeBytes, stream}).getResult();
allocCallBuilder.create(loc, rewriter, {sizeBytes, stream, isHostShared})
.getResult();
Collaborator

I see a unit test for this, but can you send everything related to isHostShared in a separate PR?

Member

@grypp grypp Sep 7, 2023

Regarding host_shared, I noticed this code in the examples:

%memref, %asyncToken = gpu.alloc async [%0] host_shared (): memref<3x3xi64>

Can SYCL's runtime allocate host_shared data asynchronously? It might be a good idea to prevent the use of host_shared and async together. FWIW, CUDA and HIP cannot do that. As far as I can see from the PR, the async queue is not used in the SYCL runtime when allocating host_shared. I am guessing there is no async allocation here either.

Nonetheless, having async on gpu.alloc is perfectly acceptable. CUDA does support asynchronous device memory allocation.

Contributor

Technically, SYCL provides more abstract memory management with sycl::buffer and sycl::accessor, which define an implicit asynchronous task graph. The allocation details are left to the implementation; whether allocation is asynchronous or synchronous is up to the implementers.
Here the lower-level synchronous USM memory management API of SYCL is used instead, similar to CUDA/HIP memory management.
So, should the async allocation in the example be synchronous instead?
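For readers less familiar with USM, a minimal sketch of the synchronous shared-allocation path being discussed (illustrative only, not the wrapper's actual code; the function names are made up):

#include <sycl/sycl.hpp>

// Shared USM memory is host- and device-accessible, and the allocation
// completes before returning, so there is no asynchronous work for a
// gpu async token to track.
void *allocHostShared(sycl::queue &queue, size_t sizeBytes) {
  return sycl::malloc_shared(sizeBytes, queue.get_device(), queue.get_context());
}

void freeHostShared(sycl::queue &queue, void *ptr) {
  sycl::free(ptr, queue.get_context());
}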

Member

Technically, SYCL provides more abstract memory management with sycl::buffer and sycl::accessor, which define an implicit asynchronous task graph. The allocation details are left to the implementation; whether allocation is asynchronous or synchronous is up to the implementers.

I haven't touched SYCL much; thanks for the explanation. Creating a task graph implicitly sounds interesting. In this case, SYCL users are aware of asynchrony while writing their program. In CUDA (or HIP), users choose sync or async execution.

Here the lower-level synchronous USM memory management API of SYCL is used instead, similar to CUDA/HIP memory management.

Yes, that's correct. I don't think there is a USM API that can allocate asynchronously.

So, should the async allocation in the example be synchronous instead?

Yes, I think this is the correct behaviour. We can disallow host_shared and async on the op (see the sketch after the examples below).

Here are the possible IRs:

// Valid
%memref = gpu.alloc host_shared (): memref<3x3xi64>

// Valid
%memref = gpu.alloc (): memref<3x3xi64>

// Invalid, USM managers don't allocate async  
%memref, %asyncToken = gpu.alloc async [%0] host_shared (): memref<3x3xi64>

// Valid, only for CUDA. Afaik, SYCL or HIP cannot do that
%memref, %asyncToken = gpu.alloc async [%0] (): memref<3x3xi64>
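A sketch of how the invalid combination above could be rejected (this is not the upstream verifier; the accessor names are assumptions based on the gpu.alloc op definition):

#include "mlir/Dialect/GPU/IR/GPUDialect.h"

using namespace mlir;

// Hypothetical check: reject host_shared combined with an async token.
static LogicalResult checkAllocOp(gpu::AllocOp op) {
  if (op.getHostShared() && op.getAsyncToken())
    return op.emitOpError("cannot be both host_shared and async");
  return success();
}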

Contributor

@nbpatel nbpatel Sep 12, 2023

The upstream GPUToLLVMConversion lowering does not support lowering a gpu.alloc that is not async.
https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp#L797

Contributor

I guess that if the runtime actually uses a synchronous allocation behind the scenes and produces an always-ready async token, it works, even if it is not optimal.

Member

The upstream GPUToLLVMConversion lowering does not support lowering a gpu.alloc that is not async.

Would that work if we omit that check when host_shared is present?

@jdoerfert jdoerfert changed the title Enabling Intel GPU Integration. [MLIR] Enabling Intel GPU Integration. Sep 7, 2023
Contributor

@keryell keryell left a comment

Quite interesting!
At some point it would be nice to have a design document or documentation somewhere explaining how all these MLIR runners work, including this one.
Globally this PR adds a SYCL runner, but it is very specific to Intel Level Zero.
It would be nice to have some generalization in the future, like SYCL using the OpenCL interoperability interface to run the SPIR-V kernels, or even native kernels.

@@ -116,6 +116,7 @@ add_definitions(-DMLIR_ROCM_CONVERSIONS_ENABLED=${MLIR_ENABLE_ROCM_CONVERSIONS})

set(MLIR_ENABLE_CUDA_RUNNER 0 CACHE BOOL "Enable building the mlir CUDA runner")
set(MLIR_ENABLE_ROCM_RUNNER 0 CACHE BOOL "Enable building the mlir ROCm runner")
set(MLIR_ENABLE_SYCL_RUNNER 0 CACHE BOOL "Enable building the mlir Sycl runner")
Contributor

Please spell SYCL correctly.

Suggested change
set(MLIR_ENABLE_SYCL_RUNNER 0 CACHE BOOL "Enable building the mlir Sycl runner")
set(MLIR_ENABLE_SYCL_RUNNER 0 CACHE BOOL "Enable building the mlir SYCL runner")

One could argue that mlir should be spelled MLIR, but that train seems to have left a long time ago. :-)


// Create global device and context
sycl::device syclDevice = getDefaultDevice();
sycl::context syclContext = sycl::context(syclDevice);
Contributor

Suggested change
sycl::context syclContext = sycl::context(syclDevice);
sycl::context syclContext { syclDevice };


auto kernel = sycl::make_kernel<sycl::backend::ext_oneapi_level_zero>(
{kernelBundle, zeKernel}, syclContext);
syclKernel = new sycl::kernel(kernel);
Contributor

Why this dynamic memory allocation?
Most of the SYCL objects already have reference semantics: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reference-semantics
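A small illustration of the reference semantics being referred to (hypothetical function, not from the PR): sycl::kernel behaves like a shared handle, so copying it is cheap and heap-allocating it with new is usually unnecessary.

#include <sycl/sycl.hpp>

// Copying a sycl::kernel copies a handle to the same underlying object,
// so kernels can be stored or returned by value.
sycl::kernel duplicateKernelHandle(const sycl::kernel &kernel) {
  sycl::kernel copy = kernel; // shallow, reference-counted copy
  return copy;
}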

Comment on lines +156 to +157
sycl::nd_range<3> syclNdRange(
sycl::nd_range<3>(syclGlobalRange, syclLocalRange));
Contributor

Suggested change
sycl::nd_range<3> syclNdRange(
sycl::nd_range<3>(syclGlobalRange, syclLocalRange));
sycl::nd_range<3> syclNdRange(syclGlobalRange, syclLocalRange);

return syclKernel;
}

static void launchKernel(QUEUE *queue, sycl::kernel *kernel, size_t gridX,
Contributor

Suggested change
static void launchKernel(QUEUE *queue, sycl::kernel *kernel, size_t gridX,
static void launchKernel(QUEUE queue, sycl::kernel kernel, size_t gridX,

or even use & if you are afraid of relying on the reference semantics of SYCL behind the scenes.

size_t blockY, size_t blockZ, size_t sharedMemBytes,
void **params, size_t paramsCount) {
auto syclGlobalRange =
::sycl::range<3>(blockZ * gridZ, blockY * gridY, blockX * gridX);
Contributor

Why are you sometimes using ::sycl and sometimes sycl?


#pragma clang diagnostic pop

struct QUEUE {
Contributor

Why this spelling? Coding standard?
Why do you need this object? In the end, this looks like a std::optional<sycl::queue>.
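A hedged sketch of the alternative being hinted at (names are illustrative): since sycl::queue already has reference semantics, a lazily built std::optional<sycl::queue> could stand in for the custom QUEUE wrapper.

#include <optional>
#include <sycl/sycl.hpp>

// Lazily-created default queue; std::optional tracks whether it exists yet.
static std::optional<sycl::queue> defaultQueue;

static sycl::queue &getDefaultQueue() {
  if (!defaultQueue)
    defaultQueue.emplace(sycl::gpu_selector_v);
  return *defaultQueue;
}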

@@ -187,4 +187,8 @@ def GpuROCDLAttachTarget: Pass<"rocdl-attach-target", ""> {
];
}

def GpuSerializeToSPIRVPass : Pass<"gpu-serialize-to-spirv", "ModuleOp"> {
let summary = "Serialize spirv dialect to spirv binary";
Contributor

Suggested change
let summary = "Serialize spirv dialect to spirv binary";
let summary = "Serialize spirv dialect to SPIR-Vbinary";


Missing a space?

#
# Example usage:
#
# find_package(SyclRuntime)
Contributor

@keryell keryell Sep 7, 2023

Shouldn't it be

Suggested change
# find_package(SyclRuntime)
# find_package(SYCLRuntime)

everywhere?

auto spvMod = *it;

spvBinary.clear();
// serialize the spv module to spv binary
Contributor

Suggested change
// serialize the spv module to spv binary
// serialize the SPIR-V module to SPIR-V binary

everywhere.

@rengolin
Member

rengolin commented Sep 7, 2023

At some point it would be nice to have a design document or documentation somewhere explaining how all these MLIR runners work, including this one.

The idea is to eventually consolidate all runners into one. This PR is just another piece of the puzzle.

Once we're all happy with how the runners work, we should common them up using command line options to select the "type" and CMake options to enable particular runner types (depending on the runtimes and hardware available).

Globally this PR adds a SYCL runner, but it is very specific to Intel Level Zero. It would be nice to have some generalization in the future, like SYCL using the OpenCL interoperability interface to run the SPIR-V kernels, or even native kernels.

Agreed! The SYCL runtime here is just being used to abstract the LevelZero calls, but this work will be helpful when adding a full SYCL runner (actual language extensions and libraries) to other CPUs/GPUs later.

@rengolin
Member

rengolin commented Sep 7, 2023

CI failure looks like Buildkite issue?

$ /etc/buildkite-agent/hooks/pre-checkout
--
  | BUILDKITE_REPO: https://github.com/llvm/llvm-project.git
  | fatal: not a git repository (or any parent up to mount point /var/lib)
  | Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
  | 🚨 Error: The global pre-checkout hook exited with status 128

@Jianhui-Li

At some point it would be nice to have a design document or documentation somewhere explaining how all these MLIR runners work, including this one.

The idea is to eventually consolidate all runners into one. This PR is just another piece of the puzzle.

Once we're all happy with how the runners work, we should common them up using command line options to select the "type" and CMake options to enable particular runner types (depending on the runtimes and hardware available).

Globally this PR adds a SYCL runner, but it is very specific to Intel Level Zero. It would be nice to have some generalization in the future, like SYCL using the OpenCL interoperability interface to run the SPIR-V kernels, or even native kernels.

Agreed! The SYCL runtime here is just being used to abstract the LevelZero calls, but this work will be helpful when adding a full SYCL runner (actual language extensions and libraries) to other CPUs/GPUs later.

Agreed. The key point of this PR is not to create yet another runner, but to enable the existing mlir-cpu-runner to run LLVM host code plus device kernels on the SYCL platform. The current implementation uses Level Zero but could be extended.

I wouldn't call this adding a "SYCL runner"; it adds a SYCL backend to mlir-cpu-runner. The term "xxx runner" in MLIR is a bit overused. mlir-cpu-runner doesn't mean it runs only on the CPU; it runs both the host code and the device kernel. mlir-cpu-runner currently does that, but the name is a bit misleading. We really just need a "mlir-runner" with target-platform as command parameters.

}

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wglobal-constructors"
Collaborator

I don't think we have this anywhere in the codebase right now, can you remove it and use lazy initialization instead?
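A minimal sketch of the lazy initialization being asked for (assuming the wrapper only needs a single default device and context; the function names are illustrative): function-local statics are constructed on first use rather than at library load time, so the -Wglobal-constructors pragma becomes unnecessary.

#include <sycl/sycl.hpp>

static sycl::device &getDefaultSyclDevice() {
  // Constructed on the first call, not when the shared library is loaded.
  static sycl::device device{sycl::gpu_selector_v};
  return device;
}

static sycl::context &getDefaultSyclContext() {
  static sycl::context context{getDefaultSyclDevice()};
  return context;
}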

@joker-eph
Collaborator

I wouldn't call this adding a "SYCL runner"; it adds a SYCL backend to mlir-cpu-runner. The term "xxx runner" in MLIR is a bit overused. mlir-cpu-runner doesn't mean it runs only on the CPU; it runs both the host code and the device kernel. mlir-cpu-runner currently does that, but the name is a bit misleading.

This naming stems originally from the fact that there was an mlir-cuda-runner, which actually implemented some "complex logic" going beyond just translating to LLVM and setting up the environment: it took input in higher-level dialects (func+arith+gpu+…) and performed all the lowering.
We killed it and merged it into mlir-cpu-runner: D98396 [mlir] Remove mlir-cuda-runner
This means that all the "target specific" transformations are done with mlir-opt (or other tools), and mlir-cpu-runner is a "host" entry point without any complex logic: it won't do any other transformation on the IR; it just sets up the JIT, loads some runtime libraries, and invokes the code through the LLVM JIT infra.
(this PR follows this model, and it's great!)

Vulkan and Spirv still have dedicated runners on the model of the original cuda-runner, but I suspect this is just legacy?

We really just need a "mlir-runner" with target-platform as command parameters.

What kind of "target-platform" command parameters do you have in mind? (other than what we do now)

@Jianhui-Li

We really just need a "mlir-runner" with target-platform as command parameters.

What kind of "target-platform" command parameters do you have in mind? (other than what we do now)

The current way of mlir-cpu-runner using the shared library name to indicate the target platform looks good to me: CUDA, ROCm, and SYCL with this PR. Vulkan could be added the same way. mlir-cpu-spirv-runner could be refactored into mlir-opt passes that generate the SPIR-V binary and feed it to mlir-cpu-runner.

If we reach that state, the name "mlir-cpu-runner" could be promoted to "mlir-runner". That would clear up a lot of misunderstanding. The "mlir-runner" would really be running MLIR programs on a target platform, and would have little to do with enabling specific language or runtime features of a target platform. For example, if we say "SYCL runner", people think it is about running SYCL programs on top of MLIR.

fabianmcg added a commit that referenced this pull request Sep 9, 2023
…oduleToBinary (#65797)

This patch adds the option of building an optional symbol table for the
top operation in the `gpu-module-to-binary` pass. The table is not
created by default as most targets don't need it; instead, it is lazily
built. The table is passed through a callback in `TargetOptions`.

This patch is required to integrate #65539 .
joker-eph pushed a commit that referenced this pull request Sep 26, 2023
…#66154)

This PR is a breakdown of the big PR #65539 which enables intel gpu
integration. In this PR we pass count of parameters and size of gpu
binary to runtime wrappers since the SyclRuntimeWrappers (which will
come in subsequent PR) requires the spirv size for compilation and also
the number of parameters to iterate over the params.
joker-eph pushed a commit that referenced this pull request Sep 26, 2023
This PR is a breakdown of the big PR
#65539 which enables intel gpu
integration. In this PR we pass hostShared flag to runtime wrappers
(required by SyclRuntimeWrappers which will come in subsequent PR) to
indicate if the allocation is done on host shared gpu memory or device
only memory.
@antiagainst
Member

Thanks for breaking down this pull request into various smaller pieces to make it easier to review. I looked at various pieces; LGTM. Looking forward to seeing this supported! :)

from @joker-eph:

Vulkan and Spirv still have dedicated runners on the model of the original cuda-runner, but I suspect this is just legacy?

from @Jianhui-Li:

The current way of mlir-cpu-runner using the shared library name to indicate the target platform looks good to me: CUDA, ROCm, and SYCL with this PR. Vulkan could be added the same way. mlir-cpu-spirv-runner could be refactored into mlir-opt passes that generate the SPIR-V binary and feed it to mlir-cpu-runner.

Yup, it's legacy. +1 to the idea of unifying! I've created #73457 to track this issue and make it more visible. I might not have the bandwidth to work on this; if somebody else is interested, that'd be nice! So I marked it as "good first issue" (not sure whether I'm pushing the limit of "good first issue", but hoping to get traction there).

@silee2
Contributor Author

silee2 commented Dec 15, 2023

Closing, as all sub-components have been merged.
