Skip to content

[mlir] [tensor] Crash in getPackOpResultTypeShape for tensor.pack/unpack ops. #90641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2024

Conversation

sahas3
Copy link
Member

@sahas3 sahas3 commented Apr 30, 2024

Windows build of mlir with Visual Studio (19.36.32538 for x64) using with the following command:

cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm

is leading to a crash when calling canonicalization on tensor.pack/tensor.unpack ops mlir-opt --canonicalize input.mlir where the input.mlir is as follows (this is taken from one of the filecheck tests for tensor.pack):

func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
          %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
          %unpack_dest = tensor.empty() : tensor<128x256xf32>
          %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
          %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
          return %tup : tensor<128x256xf32>
        }

The crash is seemingly coming from invalid memory access during iterating over innerDimsPos within getPackOpResultTypeShape.

This crash is also causing the following tests to fail:

MLIR :: Dialect/Linalg/canonicalize.mlir                                                                                                                    
MLIR :: Dialect/Linalg/data-layout-propagation.mlir                                                                                                         
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir                                                                                                     
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir                                                                                                          
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir                                                                                                   
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir                                                                                                        
MLIR :: Dialect/Linalg/transform-lower-pack.mlir                                                                                                            
MLIR :: Dialect/Linalg/transform-op-fuse.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-op-pack.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir                                                                                                         
MLIR :: Dialect/Tensor/canonicalize.mlir                                                                                                                   
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir                                                                                                       
MLIR :: Dialect/Tensor/invalid.mlir                                                                                                                         
MLIR :: Dialect/Tensor/ops.mlir                                                                                                                             
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir                                                                                                           
MLIR :: Dialect/Tensor/tiling.mlir

@sahas3 sahas3 marked this pull request as ready for review April 30, 2024 18:57
@llvmbot
Copy link
Member

llvmbot commented Apr 30, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-tensor

Author: Sayan Saha (sahas3)

Changes

Windows build of mlir with Visual Studio (19.36.32538 for x64) using with the following command:

cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host llvm

is leading to a crash when calling canonicalization on tensor.pack/tensor.unpack ops mlir-opt --canonicalize input.mlir where the input.mlir is as follows (this is taken from one of the filecheck tests for tensor.pack):

func.func @<!-- -->pack_unpack(%arg0: tensor&lt;128x256xf32&gt;) -&gt; tensor&lt;128x256xf32&gt; {
          %pack_dest = tensor.empty() : tensor&lt;8x16x8x32xf32&gt;
          %unpack_dest = tensor.empty() : tensor&lt;128x256xf32&gt;
          %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor&lt;128x256xf32&gt; -&gt; tensor&lt;8x16x8x32xf32&gt;
          %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor&lt;8x16x8x32xf32&gt; -&gt; tensor&lt;128x256xf32&gt;
          return %tup : tensor&lt;128x256xf32&gt;
        }

The crash is seemingly coming from invalid memory access during iterating over innerDimsPos within getPackOpResultTypeShape.

This crash is also causing the following tests to fail:

MLIR :: Dialect/Linalg/canonicalize.mlir                                                                                                                    
MLIR :: Dialect/Linalg/data-layout-propagation.mlir                                                                                                         
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir                                                                                                     
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir                                                                                                          
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir                                                                                                   
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir                                                                                                        
MLIR :: Dialect/Linalg/transform-lower-pack.mlir                                                                                                            
MLIR :: Dialect/Linalg/transform-op-fuse.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-op-pack.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir                                                                                                         
MLIR :: Dialect/Tensor/canonicalize.mlir                                                                                                                   
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir                                                                                                       
MLIR :: Dialect/Tensor/invalid.mlir                                                                                                                         
MLIR :: Dialect/Tensor/ops.mlir                                                                                                                             
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir                                                                                                           
MLIR :: Dialect/Tensor/tiling.mlir

Full diff: https://github.com/llvm/llvm-project/pull/90641.diff

1 Files Affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+2-2)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 5029ed4aa0387a..425ebbe7f56017 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -3880,7 +3880,7 @@ static SmallVector<int64_t> getPackOpResultTypeShape(
     ArrayRef<int64_t> sourceShape, ArrayRef<int64_t> innerTileSizes,
     ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm) {
   SmallVector<int64_t> resultShape = llvm::to_vector(sourceShape);
-  for (auto tiledDim : llvm::enumerate(innerDimsPos)) {
+  for (auto tiledDim : llvm::enumerate(llvm::to_vector(innerDimsPos))) {
     if (ShapedType::isDynamic(resultShape[tiledDim.value()]))
       continue;
     if (ShapedType::isDynamic(innerTileSizes[tiledDim.index()])) {
@@ -3909,7 +3909,7 @@ SmallVector<OpFoldResult> PackOp::getResultShape(
   AffineExpr s0, s1;
   bindSymbols(builder.getContext(), s0, s1);
   AffineExpr ceilDivExpr = s0.ceilDiv(s1);
-  for (auto tiledDim : llvm::enumerate(innerDimsPos)) {
+  for (auto tiledDim : llvm::enumerate(llvm::to_vector(innerDimsPos))) {
     resultDims[tiledDim.value()] = affine::makeComposedFoldedAffineApply(
         builder, loc, ceilDivExpr,
         {resultDims[tiledDim.value()], innerTileSizes[tiledDim.index()]});

@sahas3 sahas3 changed the title [BugFix] : Crash in getPackOpResultTypeShape for tensor.pack/unpack ops. [mlir] [tensor] Crash in getPackOpResultTypeShape for tensor.pack/unpack ops. Apr 30, 2024
Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. The issue is at the call site though. The ArrayRef indicates a "reference argument" so this seems to be an error in how the function is called. Do you have more information on that?

@sahas3
Copy link
Member Author

sahas3 commented May 2, 2024

Thanks for the fix. The issue is at the call site though. The ArrayRef indicates a "reference argument" so this seems to be an error in how the function is called. Do you have more information on that?

Thanks for reviewing.

TBH, this "fix" is a puzzle for me too since ArrayRef doesn't own the data. Debugging doesn't provide any information since the issue is only reproducible in Release build. RelWithDebInfo doesn't produce the crash either.

The callgraph leading to the crash is: PackOp::verify -> commonVerifierPackAndUnPackOp -> inferPackedType -> getPackOpResultTypeShape. All the call-sites starting from inferPackedType pass innerDimsPos as ArrayRef as the getter itself returns it as ArrayRef. This seems to be because innerDimsPos is a DenseI64ArrayAttr which gets cast as ArrayRef when returned in the getters.

What I've tried unsuccessfully:

  1. Convert the returned value from getInnerDimsPos to a SmallVector in commonVerifierPackAndUnPackOp and use that for all the subsequent calls.
  2. Call inferPackedType as the first function from commonVerifierPackAndUnPackOp (just to ensure that the memory for innerDimsPos doesn't get affected by any other call -- though nothing in that code seemed to be doing that)

@MaheshRavishankar
Copy link
Contributor

Thanks for the fix. The issue is at the call site though. The ArrayRef indicates a "reference argument" so this seems to be an error in how the function is called. Do you have more information on that?

Thanks for reviewing.

TBH, this "fix" is a puzzle for me too since ArrayRef doesn't own the data. Debugging doesn't provide any information since the issue is only reproducible in Release build. RelWithDebInfo doesn't produce the crash either.

The callgraph leading to the crash is: PackOp::verify -> commonVerifierPackAndUnPackOp -> inferPackedType -> getPackOpResultTypeShape. All the call-sites starting from inferPackedType pass innerDimsPos as ArrayRef as the getter itself returns it as ArrayRef. This seems to be because innerDimsPos is a DenseI64ArrayAttr which gets cast as ArrayRef when returned in the getters.

What I've tried unsuccessfully:

  1. Convert the returned value from getInnerDimsPos to a SmallVector in commonVerifierPackAndUnPackOp and use that for all the subsequent calls.
  2. Call inferPackedType as the first function from commonVerifierPackAndUnPackOp (just to ensure that the memory for innerDimsPos doesn't get affected by any other call -- though nothing in that code seemed to be doing that)

Thanks for the details.. I tracked the stack as you described, and I dont find any obvious footguns. I see in some places the conversion to SmallVector of the result of getInnerDimsPos (https://github.com/shark-infra/llvm-project/blob/b3291793f11924a3b62601aabebebdcfbb12a9a1/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L3734) . My other suspicion is that depending on how you are using it, the pack operations itself might be deleted, and these methods are being called on a deleted op.

Maybe provide a repro for this? cc @jpienaar or @joker-eph on this. Hopefully its not some corner case w.r.t properties.

@sahas3
Copy link
Member Author

sahas3 commented May 3, 2024

Thanks, the repro is in the first comment in this MR. Copying it here again for clarity:

Windows build of mlir with Visual Studio (19.36.32538 for x64) using with the following command:

cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1 -DLLVM_TARGETS_TO_BUILD=host ../llvm

is leading to a crash when calling canonicalization on tensor.pack/tensor.unpack ops mlir-opt --canonicalize input.mlir where the input.mlir is as follows (this is taken from one of the filecheck tests for tensor.pack):

func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
          %pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
          %unpack_dest = tensor.empty() : tensor<128x256xf32>
          %tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
          %tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
          return %tup : tensor<128x256xf32>
        }

The crash is seemingly coming from invalid memory access during iterating over innerDimsPos within getPackOpResultTypeShape.

This crash is also causing the following tests to fail:

MLIR :: Dialect/Linalg/canonicalize.mlir                                                                                                                    
MLIR :: Dialect/Linalg/data-layout-propagation.mlir                                                                                                         
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir                                                                                                     
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir                                                                                                          
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir                                                                                                   
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir                                                                                                        
MLIR :: Dialect/Linalg/transform-lower-pack.mlir                                                                                                            
MLIR :: Dialect/Linalg/transform-op-fuse.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-op-pack.mlir                                                                                                               
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir                                                                                                         
MLIR :: Dialect/Tensor/canonicalize.mlir                                                                                                                   
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir                                                                                                       
MLIR :: Dialect/Tensor/invalid.mlir                                                                                                                         
MLIR :: Dialect/Tensor/ops.mlir                                                                                                                             
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir                                                                                                           
MLIR :: Dialect/Tensor/tiling.mlir

Note the repo needs to be build on Windows with the specific settings as mentioned above to repro the crash.

@MaheshRavishankar
Copy link
Contributor

I am just waiting for either @joker-eph or @jpienaar to chime in. Overall I am OK with this. Maybe @ftynse has some insight here?

@sahas3
Copy link
Member Author

sahas3 commented May 9, 2024

I am just waiting for either @joker-eph or @jpienaar to chime in. Overall I am OK with this. Maybe @ftynse has some insight here?

@joker-eph / @jpienaar / @ftynse do you have any thoughts on this PR? Thanks!

Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a legit fix. I am fine by it, but it is also hiding a problem

@sabauma sabauma merged commit c5e67b8 into llvm:main May 13, 2024
9 checks passed
@ftynse
Copy link
Member

ftynse commented May 17, 2024

Yeah, nothing that is immediately visible to me either. Two default hypotheses are (1) the referenced object got moved after the reference was taken, (2) something messes up the lifetime extension rules so the underlying storage is rewritten. I'd check if getting values directly from packOrUnpack instead of using cached arrayrefs here RankedTensorType expectedPackedType = PackOp::inferPackedType( unpackedType, packOrUnPack.getStaticTiles(), innerDimsPos, outerDimPerm); helps.

@ftynse
Copy link
Member

ftynse commented May 17, 2024

Another random thing could is to use auto &&[index, value] : llvm::enumerate() to preserve the info about references.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants