[MLIR] Pass hostShared flag in gpu.alloc op to runtime wrappers #66401
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -167,7 +167,8 @@ class ConvertOpToGpuRuntimeCallPattern : public ConvertOpToLLVMPattern<OpTy> { | |
"mgpuMemAlloc", | ||
llvmPointerType /* void * */, | ||
{llvmIntPtrType /* intptr_t sizeBytes */, | ||
- llvmPointerType /* void *stream */}}; | ||
+ llvmPointerType /* void *stream */, | ||
+ llvmInt64Type /* bool isHostShared */}}; | ||
FunctionCallBuilder deallocCallBuilder = { | ||
"mgpuMemFree", | ||
llvmVoidType, | ||
|
@@ -786,19 +787,23 @@ LogicalResult ConvertHostUnregisterOpToGpuRuntimeCallPattern::matchAndRewrite( | |
LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite( | ||
gpu::AllocOp allocOp, OpAdaptor adaptor, | ||
ConversionPatternRewriter &rewriter) const { | ||
- if (adaptor.getHostShared()) | ||
- return rewriter.notifyMatchFailure( | ||
- allocOp, "host_shared allocation is not supported"); | ||
|
||
MemRefType memRefType = allocOp.getType(); | ||
|
||
if (failed(areAllLLVMTypes(allocOp, adaptor.getOperands(), rewriter)) || | ||
!isConvertibleAndHasIdentityMaps(memRefType) || | ||
- failed(isAsyncWithOneDependency(rewriter, allocOp))) | ||
+ !isConvertibleAndHasIdentityMaps(memRefType)) | ||
return failure(); | ||
|
||
auto loc = allocOp.getLoc(); | ||
|
||
+ bool isShared = allocOp.getHostShared(); | ||
|
||
+ if (isShared && allocOp.getAsyncToken()) | ||
+ return rewriter.notifyMatchFailure( | ||
+ allocOp, "Host Shared allocation cannot be done async"); | ||
+ else if (!isShared && failed(isAsyncWithOneDependency(rewriter, allocOp))) | ||
+ return failure(); | ||
|
||
// Get shape of the memref as values: static sizes are constant | ||
// values and dynamic sizes are passed to 'alloc' as operands. | ||
SmallVector<Value, 4> shape; | ||
|
@@ -811,8 +816,13 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite( | |
// descriptor. | ||
Type elementPtrType = this->getElementPtrType(memRefType); | ||
auto stream = adaptor.getAsyncDependencies().front(); | ||
|
||
+ auto isHostShared = rewriter.create<mlir::LLVM::ConstantOp>( | ||
+ loc, llvmInt64Type, rewriter.getI64IntegerAttr(isShared)); | ||
|
||
Value allocatedPtr = | ||
- allocCallBuilder.create(loc, rewriter, {sizeBytes, stream}).getResult(); | ||
+ allocCallBuilder.create(loc, rewriter, {sizeBytes, stream, isHostShared}) | ||
I thought we had a consensus about avoiding the use of
In that case we need to relax the checks in the GPUToLLVMConversion pass to allow lowering of non-async gpu.alloc. We might also need to change the gpu-async-region pass to handle this.
Sure, I think it's worth doing! What's your take on this? Personally, I think having a complete PR is the way to go. Otherwise, we will have an improperly implemented op.
I relaxed the checks for the GPUToLLVMConversion pass. Regarding touching other passes, can we do it in an iterative PR once all the PRs relating to #65539 are merged?
Awesome, thanks! Sure, that sounds good to me too.
Thanks. Any other feedback on the PR, @grypp? Otherwise, can we merge it if it looks good to you?
+ .getResult(); | ||
if (!getTypeConverter()->useOpaquePointers()) | ||
allocatedPtr = | ||
rewriter.create<LLVM::BitcastOp>(loc, elementPtrType, allocatedPtr); | ||
|
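The async legality rule introduced in the `matchAndRewrite` hunk above can be sketched as a standalone predicate. This is only an illustration, not the pass's code: `AllocInfo`, `AllocLowering`, and `classifyAlloc` are hypothetical names, and the sketch assumes that `isAsyncWithOneDependency` succeeds exactly when the op has an async token and one async dependency.

```cpp
#include <cstddef>

// Hypothetical outcome of the legality check; not an MLIR type.
enum class AllocLowering { Ok, RejectSharedAsync, RejectNotAsync };

// Hypothetical plain-data summary of a gpu.alloc op.
struct AllocInfo {
  bool hostShared;                  // op carries the host_shared flag
  bool hasAsyncToken;               // op produces an async token
  std::size_t numAsyncDependencies; // number of async dependencies
};

// Mirrors the branch structure of the diff: host-shared allocations must
// not be async, while all other allocations must be async with exactly
// one dependency (assumed meaning of isAsyncWithOneDependency).
AllocLowering classifyAlloc(const AllocInfo &info) {
  if (info.hostShared && info.hasAsyncToken)
    return AllocLowering::RejectSharedAsync;
  if (!info.hostShared &&
      !(info.hasAsyncToken && info.numAsyncDependencies == 1))
    return AllocLowering::RejectNotAsync;
  return AllocLowering::Ok;
}
```

Under these assumptions, a host-shared allocation without an async token is accepted, which is the case the relaxed checks are meant to allow.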
I think it should be i8 to match C++ bool, or, better, just use i32 (int) on both sides.
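To illustrate the "same width on both sides" suggestion, here is a minimal host-only sketch in which the flag is declared as a 32-bit integer, so a caller that passes an i32 constant and the callee agree on the argument type. The names `mockMemAlloc`, `mockMemFree`, and `sharedAllocCount` are hypothetical stand-ins, not the real `mgpuMemAlloc` wrapper, and `malloc` replaces any actual device or shared-memory allocation.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical counter so the effect of the flag is observable.
static int sharedAllocCount = 0;

// Hypothetical stand-in for a runtime wrapper. The flag is an int32_t
// rather than bool so its width is explicit and matches an i32 argument
// on the caller side.
extern "C" void *mockMemAlloc(uint64_t sizeBytes, void *stream,
                              int32_t isHostShared) {
  (void)stream; // unused in this host-only sketch
  if (isHostShared != 0)
    ++sharedAllocCount; // a real wrapper would pick a shared allocator here
  return std::malloc(sizeBytes);
}

// Hypothetical matching free function.
extern "C" void mockMemFree(void *ptr, void *stream) {
  (void)stream; // unused in this host-only sketch
  std::free(ptr);
}
```

The design choice is simply to avoid relying on how a given ABI passes `bool`: with a fixed-width integer, the type emitted by the lowering and the type in the wrapper's signature can be compared directly.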