[MLIR] Modify lowering of gpu.alloc op to llvm #69969
Conversation
@grypp tagging for review
@llvm/pr-subscribers-mlir-gpu @llvm/pr-subscribers-mlir

Author: Nishant Patel (nbpatel)

Changes: If gpu.alloc has no async dependency (this is true if gpu.alloc has a hostShared allocation), create a new stream. This PR is a follow-up to #66401.

Full diff: https://github.com/llvm/llvm-project/pull/69969.diff (1 file affected):
diff --git a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
index 097caf23edfa5dd..da1c468ed1dfd71 100644
--- a/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
+++ b/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
@@ -836,7 +836,11 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
// Allocate the underlying buffer and store a pointer to it in the MemRef
// descriptor.
Type elementPtrType = this->getElementPtrType(memRefType);
- auto stream = adaptor.getAsyncDependencies().front();
+
+ Value stream =
+ adaptor.getAsyncDependencies().empty()
+ ? streamCreateCallBuilder.create(loc, rewriter, {}).getResult()
+ : adaptor.getAsyncDependencies().front();
auto isHostShared = rewriter.create<mlir::LLVM::ConstantOp>(
loc, llvmInt8Type, rewriter.getI8IntegerAttr(isShared));
@@ -855,7 +859,17 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
auto memRefDescriptor = this->createMemRefDescriptor(
loc, memRefType, allocatedPtr, alignedPtr, shape, strides, rewriter);
- rewriter.replaceOp(allocOp, {memRefDescriptor, stream});
+ if (allocOp.getAsyncToken()) {
+ // Async alloc: make dependent ops use the same stream.
+ rewriter.replaceOp(allocOp, {memRefDescriptor, stream});
+ } else {
+ // Synchronize with host and destroy stream. This must be the stream created
+ // above (with no other uses) because we check that the synchronous version
+ // does not have any async dependencies.
+ streamSynchronizeCallBuilder.create(loc, rewriter, {stream});
+ streamDestroyCallBuilder.create(loc, rewriter, {stream});
+ rewriter.replaceOp(allocOp, {memRefDescriptor});
+ }
return success();
}
I'm concerned about this PR. It creates and syncs a stream, but it's never used in our runtimes, including the new sycl runtime. We should avoid creating and syncing a stream when it's not necessary, as it has overhead.
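To make the concern concrete, this is roughly the runtime-call sequence the lowering in the diff emits for a synchronous gpu.alloc. It is a hedged C++ sketch: the mgpu* wrapper names come from the call builders used in the diff, but the signatures here are simplified assumptions, not the exact declarations in the runtime wrappers.

#include <cstdint>

// Assumed, simplified wrapper signatures (illustrative only).
extern "C" void *mgpuStreamCreate();
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, void *stream, bool isHostShared);
extern "C" void mgpuStreamSynchronize(void *stream);
extern "C" void mgpuStreamDestroy(void *stream);

// Effective behavior of the lowered synchronous gpu.alloc in the diff above:
// a stream is created only to be synchronized and destroyed right after the
// allocation call.
void *syncAllocAsLowered(uint64_t sizeBytes, bool isHostShared) {
  void *stream = mgpuStreamCreate();
  void *ptr = mgpuMemAlloc(sizeBytes, stream, isHostShared);
  mgpuStreamSynchronize(stream);
  mgpuStreamDestroy(stream);
  return ptr;
}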
I'm a bit unclear about creating the stream. If the stream isn't available, my understanding is that we would prefer synchronous execution. In such a scenario, we wouldn't need the stream. At least in the CUDA model, you don't have to do that; the CUDA driver/runtime can use the existing default stream implicitly.
Can we pass nullptr for the stream when it's not available? Alternatively, we might create another API in the runtimes.
Afaik, creating and synchronizing a stream is not free. We had better avoid it if it is not necessary.
For example, the sycl runtime (below) doesn't utilize the stream (aka sycl::queue *queue):
static void *allocDeviceMemory(sycl::queue *queue, size_t size, bool isShared) {
  void *memPtr = nullptr;
  if (isShared) {
    memPtr = sycl::aligned_alloc_shared(64, size, getDefaultDevice(),
                                        getDefaultContext());
  } else {
    memPtr = sycl::aligned_alloc_device(64, size, getDefaultDevice(),
                                        getDefaultContext());
  }
  if (memPtr == nullptr) {
    throw std::runtime_error("mem allocation failed!");
  }
  return memPtr;
}

extern "C" void *mgpuMemAlloc(uint64_t size, sycl::queue *queue, bool isShared) {
  return catchAll([&]() {
    return allocDeviceMemory(queue, static_cast<size_t>(size), true);
  });
}
Ok, I agree. I made the change. I added the stream in the first place because all the runtime signatures for mgpuMemAlloc expect a stream (albeit it isn't used now), in case they do require the stream to be passed for allocation in the future. But I agree that for now we can pass nullptr.
> Ok, I agree. I made the change.

Thanks for doing that! It looks better now. We just need a test for this PR, as that is the regular LLVM code submission policy.

> I added the stream in the first place because all the runtime signatures for mgpuMemAlloc expect a stream (albeit it isn't used now), in case they do require the stream to be passed for allocation in the future. But I agree that for now we can pass nullptr.

Would you like to separate the runtime APIs into mgpuMemAlloc and mgpuMemAllocAsync? The first one wouldn't need a stream in its function declaration, while the second one would work asynchronously and require a stream. This is additional work, if you are interested.
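As a rough sketch of that suggestion (these declarations do not exist in the runtimes at this point; the names follow the comment above and the stream handle type is left as a generic pointer):

#include <cstdint>

// Hypothetical split: a synchronous allocation entry point with no stream
// parameter, and an async variant that requires the runtime's stream/queue.
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, bool isShared);
extern "C" void *mgpuMemAllocAsync(uint64_t sizeBytes, void *stream, bool isShared);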
Added the test. Yes, I am interested in doing it, but in a separate PR once all the PRs for #65539 are merged.
Thanks, looks good.
@grypp can you help me merge this?
If gpu.alloc has no async dependency (in case gpu.alloc has a hostShared allocation), create a new stream & synchronize. This PR is a follow-up to #66401.