
[mlir][Tensor] Retain discardable attrs in pack(cast) folder #115772


Merged
qedawkins merged 1 commit into llvm:main from keep_pack_discardable_attrs on Nov 12, 2024

Conversation

@qedawkins (Contributor)

No description provided.

@llvmbot (Member) commented Nov 11, 2024

@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Quinn Dawkins (qedawkins)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/115772.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+1)
  • (modified) mlir/test/Dialect/Tensor/canonicalize.mlir (+8-2)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 8e0d0104397468..147120e0e34203 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
     PackOp newOp = rewriter.create<PackOp>(
         op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
         newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
+    newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
 
     // Replace op.
     Value oldResult = op.getResult();
diff --git a/mlir/test/Dialect/Tensor/canonicalize.mlir b/mlir/test/Dialect/Tensor/canonicalize.mlir
index 2c826d7ae008d5..0b54c207dea84e 100644
--- a/mlir/test/Dialect/Tensor/canonicalize.mlir
+++ b/mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -2750,7 +2750,10 @@ func.func @fold_cast_multiple_results(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2x
 // CHECK-SAME:      %[[DEST:.*]]: tensor<1x1x8x1xi32>,
 // CHECK-SAME:      %[[SRC:.*]]: tensor<7x?xi32>,
 // CHECK-SAME:      %[[PAD:.*]]: i32) -> tensor<1x1x8x1xi32> {
-// CHECK:           %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32) inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]] : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
+// CHECK:           %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32)
+// CHECK-SAME:        inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]]
+// CHECK-SAME:        some_attr
+// CHECK-SAME:        : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
 // CHECK:           return %[[PACK]] : tensor<1x1x8x1xi32>
 func.func @fold_cast_pack_dynamic_tile_size(
   %dest: tensor<1x1x8x1xi32>,
@@ -2759,7 +2762,10 @@ func.func @fold_cast_pack_dynamic_tile_size(
 
     %cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
     %c8 = arith.constant 8 : index
-    %pack = tensor.pack %src padding_value(%pad : i32) inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
+    %pack = tensor.pack %src padding_value(%pad : i32)
+      inner_dims_pos = [0, 1]
+      inner_tiles = [%c8, 1]
+      into %cast {some_attr} : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
     %res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
     return %res : tensor<1x1x8x1xi32>
 }

@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
@joker-eph (Collaborator) commented Nov 11, 2024

Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

We don't have a principled approach to this in general, unfortunately. Here the pattern replaces the op, so the issue is visible; other folders operate in-place and propagate these attributes "by accident".

By the way: can't this pattern just operate in-place here instead?

@qedawkins (Contributor, Author)

I'm not sure how easy it would be to make it operate in place, because it is potentially updating a list of mixed operands (newMixedTileSizes), which means the operand segment lengths need to be updated as well, and I've had bad experiences trying to do that in the past.

Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

There is no guarantee this is sound, in the same way that any pattern operating on generic tensor encodings isn't sound. As I see it, there are hardly any "globally sound" decisions we can make about arbitrary unstructured attributes, so instead it seems as though people have been going for a practical middle ground. This is also consistent with every other cast folder I've seen (not saying there aren't counterexamples).
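Regarding the in-place question above, here is a rough sketch of what that route might look like (this is not what the PR does, and accessor names such as `setStaticInnerTiles` plus the local variables are illustrative assumptions); the awkward part is keeping the rebuilt operand list and the operand-segment bookkeeping consistent by hand:

```cpp
// Hypothetical in-place variant. `newOperands` and `newStaticTiles` stand in
// for the recomputed operand list and static tile sizes.
rewriter.modifyOpInPlace(op, [&] {
  // Changing the variadic inner_tiles group means resetting the full operand
  // list (source, dest, optional padding value, dynamic tile sizes)...
  op->setOperands(newOperands);
  // ...updating the static tile sizes to match...
  op.setStaticInnerTiles(newStaticTiles);
  // ...and keeping the operand-segment sizes in sync by hand, since
  // setOperands does not recompute them for AttrSizedOperandSegments ops.
});
```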

@joker-eph (Collaborator)

The reason these are called "discardable" is because we are expected to discard them when we don't understand them / can't update them in transformations!

So the principled approach would be to be conservative: either don't perform the transformation if you don't understand the attributes, or drop them.
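For concreteness, a minimal sketch (not code from this PR) of what the conservative guard would look like at the top of the pattern, assuming the usual `OpRewritePattern` shape used in TensorOps.cpp:

```cpp
// Hypothetical conservative variant: refuse to fold when the op carries
// discardable attributes this pattern does not understand.
struct ConservativeFoldTensorCastPackOp : public OpRewritePattern<PackOp> {
  using OpRewritePattern<PackOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(PackOp op,
                                PatternRewriter &rewriter) const override {
    if (!op->getDiscardableAttrDictionary().empty())
      return rewriter.notifyMatchFailure(
          op, "op carries discardable attributes; conservatively not folding");
    // ... the existing cast-folding logic would go here ...
    return failure();
  }
};
```

The other option mentioned above, folding but not copying the attributes over, is simply the pattern as it was before this change.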

Contributor

I don't know if it can be done in place; the result type is changing, I think.

I think the working assumption is that preserving these attributes is, at best, best effort. You can't always transfer attributes over, so it is safe to discard them. But in cases like this, when the op is being replaced by an op of the same type, it is also safe to carry them over. If someone relies on this attribute, that's "at your own risk", but we make a best effort to preserve it. It's not the most principled thing, but rather a pragmatic one.

Contributor

There are many things that are, let's be positive and call them "quirky", about this dialect. If this were its own project, the approach to attributes for things at this layer would be at the top of the list of things to document. The very next thing I would do is likely remove most of the canonicalizers and sort them into things that could be documented and used in a targeted way. It isn't even clear to me that the classic definitions apply to things at this layer. Lots of work, though, and in the absence of it, consistency and pragmatism are the higher-valued commodity.

Contributor

I don't believe there are satisfying answers in the current state, and we're not going to keep discussing misalignments of the core principles with the realities of these dialects.

We value consistency with the rest of the dialect over a solution that no one seems able to articulate.

Thank you for your feedback.

@joker-eph (Collaborator)

Sorry, but you can't just discard my comments like this. Please consider these as blocking discussions right now.

Contributor

I'm sorry, but I don't see how you have the standing on this project to say that. Your feedback and insights are of course welcome, but the people with the miles in on these dialects need to arbitrate the result.

@joker-eph (Collaborator)

I don't know what qualifies you to make this call right now; I don't recognize that you have this authority, and nothing in our practices or documented policies on code review supports your behavior here.

The consistency of MLIR is important, and as long as this code is under mlir/..., I consider myself qualified to make these comments and fully expect to be taken seriously. I also consider myself fully qualified to revert any patch that lands under these conditions.

@stellaraccident (Contributor) commented Nov 12, 2024

Your comments are of course welcome, but you don't have any more standing for MLIR as a whole than I do. In situations like this, when there are clearly active contributors to an area processing feedback, it is their prerogative to make decisions. In this case I happen to know that @rengolin has a plan for these ops that will be a wholly better situation, and @MaheshRavishankar and @qedawkins making pragmatic choices that smooth paths while things unfold is what we expect of maintainers. Which they are, by the same conventions that always arbitrate in the absence of established governance: the people who have put the miles in for an area need to be able to move things forward and make decisions.

@qedawkins qedawkins changed the title [mlir][Tensor] Retain discardable attrs back in cast(pack) folder [mlir][Tensor] Retain discardable attrs in pack(cast) folder Nov 11, 2024
@qedawkins qedawkins merged commit 24c2c74 into llvm:main Nov 12, 2024
11 checks passed
@qedawkins qedawkins deleted the keep_pack_discardable_attrs branch November 12, 2024 13:19
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
Contributor

I don't see other folding patterns in this dialect propagating external attributes. A change like this should have a comment/rationale at the least to start with.

Contributor

The rationale claimed in the comments was that, in the prior state, the folders were operating in place, which implicitly preserves these attributes. Regardless of whether this is the best kind of assumption to have, it is an inconsistent and unintended state for this to be different just because this pattern does not do an in-place modification.

Agreed, that should have made it into the commit message.

@bondhugula (Contributor) commented Nov 13, 2024

But there are numerous patterns in this file (folding or otherwise) that aren't doing in-place modifications (including tensor.unpack) -- they aren't propagating external attributes either.

$ grep replaceOpWithNewOp lib/Dialect/Tensor/IR/TensorOps.cpp | wc -l
24
# (not even counting create<...> + replaceOp)
    Value newOp = rewriter.create<tensor::UnPackOp>(
        loc, source, dest, unPackOp.getInnerDimsPos(), unPackOp.getMixedTiles(),
        unPackOp.getOuterDimsPerm());
    rewriter.replaceOpWithNewOp<tensor::CastOp>(
        unPackOp, unPackOp.getResult().getType(), newOp);
    return success();

Contributor

I am confused as to what the point of external or discardable attributes is, then.
I agree this isn't consistent. Like I said, it's best effort.

I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.

Member

Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they don't leak across pass boundaries.
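As an illustration of that clean-up step, a minimal sketch of dropping such a marker before the pass finishes; the pass and attribute names are illustrative, not the actual Linalg code:

```cpp
// Hypothetical pass epilogue: make sure an intra-pass marker attribute does
// not leak across the pass boundary.
void MyAnnotationDrivenPass::runOnOperation() {
  // ... pattern application that reads/writes the "my.filter" marker ...

  getOperation()->walk([](Operation *op) {
    op->removeAttr("my.filter"); // no-op if the attribute is absent
  });
}
```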

@bondhugula (Contributor) left a review comment

I don't think commit summaries on such changes can be left blank. The title itself isn't sufficient.

@banach-space (Contributor)

I see this as a follow-up to the recent change I made in #114559.

From what I can tell, before my change, the behavior introduced explicitly here was already present implicitly via the clone method, which preserved these attributes (*). So, is this PR effectively restoring the previous behaviour?

To me, this feels like a temporary solution rather than a scalable one. I agree with others that preserving discardable attributes seems somewhat ad hoc. My recent PR may have uncovered this issue, and I understand that this approach is a quick patch. Sometimes, this kind of fix is indeed what’s needed to unblock people while a sounder approach is discussed.

I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.

Sure, the Op transformation seems fine. But what about the attribute itself? Preserving it happens to be OK for the particular attribute in your use case (I don't know what that is), but may not make sense for other attributes that somebody else may want to attach.

(*) My understanding is that it clones "inherent" rather than "discardable" attributes, but the underlying storage seems to be shared between the two?

@MaheshRavishankar (Contributor)

Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they don't leak across pass boundaries.

Pass boundaries are completely arbitrary. I can do whatever I want in a pass (including launching new pipelines), so defining these attributes as something that does not leak across passes is completely arbitrary. Even with that definition, if I had two separate pattern-application steps and set an attribute before the first step, it would be dropped by the next step.

I think we are missing that it is really useful for a compiler stack to have some way of annotating operations. This is very closely related to LLVM metadata, which, from my working experience, has a lot of similar issues but is a tool that gets used. It is not necessarily the most principled thing, but it is a pragmatic way to annotate the IR with some information that can be used by subsequent optimizations. This is literally the use case here. In my book, the real issue here is that this transformation could have been done in-place, in which case there would not be an issue.

We don't need to dictate how the whole world builds compilers. A lot of the time you need to make pragmatic choices to connect things, and as long as it isn't disruptive to downstream users, this should be acceptable. Literally no one is affected by this change.

@joker-eph (Collaborator) commented Nov 13, 2024

This is literally the use case here. In my book, the real issue here is that this transformation could have been done in-place, in which case there would not be an issue.

I don't believe so: a correct and sound transformation in-place would also explicitly drop these attributes.

We don't need to dictate how the whole world builds compilers.

Indeed: you do whatever you want downstream, but we have a consistent project upstream where we need to adhere to some principled and consistent designs.

Literally no one is affected by this change.

This is not true: anyone who was annotating such an op would be affected, because we would incorrectly preserve an attribute, potentially leading to a crash or, worse, a silent miscompile.

@joker-eph (Collaborator) commented Nov 13, 2024

Note that the project's stance on discardable attributes has been debated quite a few times; one of the pointers: https://discourse.llvm.org/t/rfc-implicit-propagation-of-dialect-attributes-best-effort/2657/

They were not initially called this, by the way; it was after one of these discussions that we felt the need to be explicit about the intent and picked the term "discardable": https://reviews.llvm.org/D96093

@MaheshRavishankar (Contributor)

This is not true: anyone who was annotating such an op would be affected, because we would incorrectly preserve an attribute, potentially leading to a crash or, worse, a silent miscompile.

Here is the problem. We have two cases:

  1. An attribute is added and relied upon by a subsequent transformation/pass, but gets dropped along the way; once dropped, it is hard to recover.
  2. An attribute is added that is supposed to be dropped, and a subsequent transformation/pass relies on it being dropped; I would think it is easier to drop such an attribute prescriptively than to expect it to happen automatically.

We can't serve both cases, for sure. This is really a missing feature in MLIR: the ability for an optimization to leave information for subsequent optimizations to pick up. It is fragile, but it is required in many cases. This is the reason we use it (some information is easier to "compute" earlier in the pipeline than at the point of transformation), and it is the same reason I have seen metadata being used in LLVM. There isn't support for this feature; if MLIR had it, it would make everyone's life much easier.

On the same thread, see https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/5?u=maheshravishankar. Like I said, this ties back to metadata. There is no good answer here; the best way, IMO, is to take it on a case-by-case basis and proceed based on the use case. I don't totally disagree with the comments about the pitfalls here... but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong. This goes back to there being a missing feature here, in the absence of which people reach for the only path that allows them to do this.

@joker-eph (Collaborator)

The ability for an optimization to leave information for subsequent optimizations to pick up.

To be clear: an optimization can do that. You can define some discardable attributes that the optimization knows about and knows how to preserve through transformations; that's perfectly fine. But this isn't the same as "blind" propagation through transformations.

but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong

I don't think so: one is a miscompile / correctness problem, the other is a conservative / safe option. These are called discardable attributes for a reason, and that reason is precisely so that the two options are not "equally wrong".

@MaheshRavishankar (Contributor)

but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong

I don't think so: one is a miscompile / correctness problem, the other is a conservative / safe option. These are called discardable attributes for a reason, and that reason is precisely so that the two options are not "equally wrong".

I am not sure I follow that... but maybe that's a tangent. On the same thread, Chris says (https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/10?u=maheshravishankar), specifically:

We can define a schema for attributes, so the attributes themselves carry information about “I am safe to implicitly propagate”. We would nail down what that exactly means, but if we had that, you could put this bit on the attributes in question and we could make the infra look for it and propagate “just these” attributes.

We seem to have stopped at discardable attributes and have not added "preserved attributes". The transformation does not need to worry about the semantics of these preserved attributes. In cases where it is effectively cloning the operation, you can clone the preserved attributes. If these attributes are such that the meaning changes, well that is a burden on downstream users to adapt accordingly.
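MLIR has no such mechanism today, but as a purely hypothetical sketch of the idea, a helper could copy over only attributes carrying an agreed-upon marker (a made-up "preserve." name prefix here), leaving everything else to be discarded:

```cpp
// Hypothetical helper, not an existing MLIR API: when replacing `oldOp` with a
// `newOp` of the same kind, propagate only attributes that opt in via a prefix.
static void copyPreservedAttrs(Operation *oldOp, Operation *newOp) {
  constexpr StringLiteral kPreservePrefix = "preserve.";
  for (NamedAttribute attr : oldOp->getDiscardableAttrs())
    if (attr.getName().strref().starts_with(kPreservePrefix))
      newOp->setAttr(attr.getName(), attr.getValue());
}
```

Anything without the marker would still be dropped by default, keeping the current "discardable" behavior as the baseline.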

@joker-eph (Collaborator)

We seem to have stopped at discardable attributes and have not added "preserved attributes". The transformation does not need to worry about the semantics of these preserved attributes.

Did you look at this link I posted above? https://discourse.llvm.org/t/rfc-implicit-propagation-of-dialect-attributes-best-effort/2657/
We never found a path forward for "preserved attributes", but we tried for some time.

If these attributes are such that the meaning changes, well that is a burden on downstream users to adapt accordingly.

This is not how the system is designed: they should be dropped by default; it's not a downstream problem. What you describe is something that could have been done; it is just not what we have with discardable attributes.

Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
banach-space added a commit to banach-space/llvm-project that referenced this pull request Dec 31, 2024
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This change
mirrors a similar update made for `tensor::PackOp` in llvm#114559. Below is
the updated rationale for `tensor::UnPackOp`.

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:

```mlir
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```

This leads to an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch resolves the
issue.

Additional Changes:
* invalid.mlir: Fixes a typo.
* TensorOps.cpp: Removes unnecessary `(void)tileSize` and adds extra
  comments following this discussion:
  llvm#115772.
banach-space added a commit that referenced this pull request Jan 3, 2025
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.

ISSUE DESCRIPTION

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:

```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.

ADDITIONAL CHANGES

* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
  * Removed unnecessary `(void)tileSize`.
  * Added comments following the discussion in PR #115772.
  * Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
    introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
  replaced mixed use of `test_attr` and `some_attr`).