-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[mlir] [tensor] Crash in getPackOpResultTypeShape for tensor.pack/unpack ops. #90641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-tensor Author: Sayan Saha (sahas3) ChangesWindows build of
is leading to a crash when calling canonicalization on
The crash is seemingly coming from invalid memory access during iterating over This crash is also causing the following tests to fail:
Full diff: https://github.com/llvm/llvm-project/pull/90641.diff 1 Files Affected:
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 5029ed4aa0387a..425ebbe7f56017 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -3880,7 +3880,7 @@ static SmallVector<int64_t> getPackOpResultTypeShape(
ArrayRef<int64_t> sourceShape, ArrayRef<int64_t> innerTileSizes,
ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm) {
SmallVector<int64_t> resultShape = llvm::to_vector(sourceShape);
- for (auto tiledDim : llvm::enumerate(innerDimsPos)) {
+ for (auto tiledDim : llvm::enumerate(llvm::to_vector(innerDimsPos))) {
if (ShapedType::isDynamic(resultShape[tiledDim.value()]))
continue;
if (ShapedType::isDynamic(innerTileSizes[tiledDim.index()])) {
@@ -3909,7 +3909,7 @@ SmallVector<OpFoldResult> PackOp::getResultShape(
AffineExpr s0, s1;
bindSymbols(builder.getContext(), s0, s1);
AffineExpr ceilDivExpr = s0.ceilDiv(s1);
- for (auto tiledDim : llvm::enumerate(innerDimsPos)) {
+ for (auto tiledDim : llvm::enumerate(llvm::to_vector(innerDimsPos))) {
resultDims[tiledDim.value()] = affine::makeComposedFoldedAffineApply(
builder, loc, ceilDivExpr,
{resultDims[tiledDim.value()], innerTileSizes[tiledDim.index()]});
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. The issue is at the call site though. The ArrayRef
indicates a "reference argument" so this seems to be an error in how the function is called. Do you have more information on that?
Thanks for reviewing. TBH, this "fix" is a puzzle for me too since The callgraph leading to the crash is: What I've tried unsuccessfully:
|
Thanks for the details.. I tracked the stack as you described, and I dont find any obvious footguns. I see in some places the conversion to Maybe provide a repro for this? cc @jpienaar or @joker-eph on this. Hopefully its not some corner case w.r.t properties. |
Thanks, the repro is in the first comment in this MR. Copying it here again for clarity:
Note the repo needs to be build on Windows with the specific settings as mentioned above to repro the crash. |
I am just waiting for either @joker-eph or @jpienaar to chime in. Overall I am OK with this. Maybe @ftynse has some insight here? |
@joker-eph / @jpienaar / @ftynse do you have any thoughts on this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a legit fix. I am fine by it, but it is also hiding a problem
Yeah, nothing that is immediately visible to me either. Two default hypotheses are (1) the referenced object got moved after the reference was taken, (2) something messes up the lifetime extension rules so the underlying storage is rewritten. I'd check if getting values directly from |
Another random thing could is to use |
Windows build of
mlir
with Visual Studio (19.36.32538 for x64) using with the following command:cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm
is leading to a crash when calling canonicalization on
tensor.pack
/tensor.unpack
opsmlir-opt --canonicalize input.mlir
where theinput.mlir
is as follows (this is taken from one of the filecheck tests fortensor.pack
):The crash is seemingly coming from invalid memory access during iterating over
innerDimsPos
withingetPackOpResultTypeShape
.This crash is also causing the following tests to fail: