[mlir][Tensor] Retain discardable attrs in pack(cast) folder #115772
Conversation
@llvm/pr-subscribers-mlir-tensor @llvm/pr-subscribers-mlir

Author: Quinn Dawkins (qedawkins)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/115772.diff

2 Files Affected:
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 8e0d0104397468..147120e0e34203 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
+ newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
// Replace op.
Value oldResult = op.getResult();
diff --git a/mlir/test/Dialect/Tensor/canonicalize.mlir b/mlir/test/Dialect/Tensor/canonicalize.mlir
index 2c826d7ae008d5..0b54c207dea84e 100644
--- a/mlir/test/Dialect/Tensor/canonicalize.mlir
+++ b/mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -2750,7 +2750,10 @@ func.func @fold_cast_multiple_results(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2x
// CHECK-SAME: %[[DEST:.*]]: tensor<1x1x8x1xi32>,
// CHECK-SAME: %[[SRC:.*]]: tensor<7x?xi32>,
// CHECK-SAME: %[[PAD:.*]]: i32) -> tensor<1x1x8x1xi32> {
-// CHECK: %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32) inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]] : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
+// CHECK: %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32)
+// CHECK-SAME: inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]]
+// CHECK-SAME: some_attr
+// CHECK-SAME: : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
// CHECK: return %[[PACK]] : tensor<1x1x8x1xi32>
func.func @fold_cast_pack_dynamic_tile_size(
%dest: tensor<1x1x8x1xi32>,
@@ -2759,7 +2762,10 @@ func.func @fold_cast_pack_dynamic_tile_size(
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
%c8 = arith.constant 8 : index
- %pack = tensor.pack %src padding_value(%pad : i32) inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
+ %pack = tensor.pack %src padding_value(%pad : i32)
+ inner_dims_pos = [0, 1]
+ inner_tiles = [%c8, 1]
+ into %cast {some_attr} : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
%res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
return %res : tensor<1x1x8x1xi32>
}
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
+ newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?
Unfortunately we don't have a principled approach to this in general. Here the pattern replaces the op, so the propagation was visible; other folders operate in-place and are propagating "by accident".
By the way: can't this pattern just operate in-place here instead?
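To make the distinction concrete, here is a minimal sketch (not taken from the patch; the in-place variant and the `*Mutable()` accessors it uses are assumptions for illustration) of how the two rewrite styles differ with respect to discardable attributes:

```cpp
// Sketch only. `op`, `newOperands`, and `newMixedTileSizes` stand for the
// values already available inside the pattern's matchAndRewrite.

// Variant A: rebuild + replace. A freshly created op starts with an empty
// discardable-attribute dictionary, so the original op's attributes are lost
// unless they are copied over explicitly (which is what this patch adds).
PackOp newOp = rewriter.create<PackOp>(
    op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
    newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());

// Variant B: in-place update. The same Operation object is reused, so its
// discardable attributes survive without any explicit copying -- the
// "by accident" propagation that in-place folders get for free.
rewriter.modifyOpInPlace(op, [&] {
  op.getSourceMutable().assign(newOperands[0]);
  op.getDestMutable().assign(newOperands[1]);
});
```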
I'm not sure how easy it would be to make it operate in place, because it is potentially updating a list of mixed operands (`newMixedTileSizes`), meaning operand segment lengths need to be updated as well, and I've had a bad experience trying to do that in the past.
Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?
There is no guarantee this is sound, in the same way that any pattern operating on generic tensor encodings isn't guaranteed to be sound. As I see it, there are hardly any "globally sound" decisions we can make about arbitrary unstructured attributes, so instead it seems as though people have been going for a practical middle ground. This is also consistent with every other cast folder I've seen (not saying there aren't counterexamples).
The reason these are called "discardable" is because we are expected to discard them when we don't understand them / can't update them in transformations!
So the principled approach would be to be conservative: either don't perform the transformation if you don't understand the attributes, or drop them.
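For reference, a hedged sketch of the conservative option being described: copy over only attributes the pattern explicitly understands and drop the rest. The allow-listed attribute name below is a made-up placeholder, not an attribute used by this PR.

```cpp
// Conservative propagation sketch: keep only attributes this pattern knows
// how to preserve; everything else is discarded, per the "discardable"
// contract. "my_dialect.known_hint" is a hypothetical name.
SmallVector<NamedAttribute> preserved;
for (NamedAttribute attr : op->getDiscardableAttrs())
  if (attr.getName().getValue() == "my_dialect.known_hint")
    preserved.push_back(attr);
newOp->setDiscardableAttrs(DictionaryAttr::get(op->getContext(), preserved));
```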
I don't know if it can be in-place. The result type is changing, I think.
I think the working assumption is that preserving these attributes is, at best, best effort. You can't always transfer attributes over, so it's safe to discard them. But in cases like this, when the op is being replaced by an op of the same type, it's also safe to carry them over. If someone relies on this attribute, that's "at your own risk", but we make a best effort to preserve the attribute. It's not the most principled thing, but more a pragmatic thing.
There are many things that are, let's be positive and call them "quirky", about this dialect. If this were its own project, the approach to attributes for things at this layer would be top of the list of things to document. The very next thing I would do is likely remove most of the canonicalizers and sort them into things that could be documented and used in a targeted way. It isn't clear to me that the classic definitions even apply to things at this layer. Lots of work, though, and in the absence of it, consistency and pragmatism are a higher-valued commodity.
I don't believe there are satisfying answers in the current state, and we're not going to keep discussing misalignments of the core principles with realities of these dialects.
We value consistency with the rest of the dialect over a solution that no one seems able to articulate.
Thank you for your feedback.
Sorry, but you can't just discard my comments like this. Please consider these as blocking discussions right now.
I'm sorry, but I don't see how you have the standing on this project to say that. Your feedback and insights are of course welcome, but the people with the miles in on these dialects need to arbitrate the result.
I don't know what qualifies you to make this call right now, I don't recognize your authority on this, and nothing in our practices or documented policies on code reviews supports your behavior here.
The consistency of MLIR is important, and as long as this code is under mlir/..., I consider myself qualified to make these comments and fully expect to be taken seriously. I also consider myself fully qualified to revert any patch which lands under these conditions.
Your comments are of course welcome, but you don't have any more standing for MLIR as a whole than I do. In those situations, when there are clearly active contributors to an area processing feedback, it is their prerogative to make decisions. In this case I happen to know that @rengolin has a plan for these ops that will be a wholly better situation, and @MaheshRavishankar and @qedawkins making pragmatic choices that smooth paths while things unfold is what we expect of maintainers. Which they are, by the same conventions that always arbitrate in the absence of established governance: the people who have the miles in for an area need to be able to move things forward and make decisions.
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
+ newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
I don't see other folding patterns in this dialect propagating external attributes. A change like this should have a comment/rationale at the least to start with.
The rationale claimed in the comments was that in the prior state, the folders were operating in place, which implicitly preserves these attributes. Regardless of whether this is the best kind of assumption to have, it is an inconsistent and unintended state for this to differ just because this pattern does not do an in-place modification.
Agreed, that should have made it to the commit message.
But there are numerous patterns in this file (folding or otherwise) that aren't doing in-place modifications (including tensor.unpack) -- they aren't propagating external attributes either.
$ grep replaceOpWithNewOp lib/Dialect/Tensor/IR/TensorOps.cpp | wc -l
24
# (not even counting create<...> + replaceOp)
4682 Value newOp = rewriter.create<tensor::UnPackOp>(
4683 loc, source, dest, unPackOp.getInnerDimsPos(), unPackOp.getMixedTiles(),
4684 unPackOp.getOuterDimsPerm());
4685 rewriter.replaceOpWithNewOp<tensor::CastOp>(
4686 unPackOp, unPackOp.getResult().getType(), newOp);
4687 return success();
I am confused as to what the point of external or discardable attributes is, then.
I agree this isn't consistent. Like I said, it's best effort.
I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.
Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they don't leak across pass boundaries.
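As a sketch of that convention (the pass and attribute names here are hypothetical, not the actual Linalg code), a pass strips its own marker before finishing so nothing leaks across the pass boundary:

```cpp
// Hypothetical sketch: a pass tags ops with a discardable marker attribute,
// consumes it during its own pattern application, and removes any leftover
// markers before returning so the attribute never crosses the pass boundary.
void MyAnnotatingPass::runOnOperation() {
  Operation *root = getOperation();

  // ... run transformations that read/write "my_dialect.filter" ...

  root->walk([](Operation *op) {
    op->removeDiscardableAttr("my_dialect.filter");
  });
}
```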
I don't think commit summaries on such changes can be left blank. The title itself isn't sufficient.
I see this as a follow-up to the recent change I made in #114559. From what I can tell, before my change, the behavior introduced explicitly here was already present implicitly via the clone method, which preserved these attributes (*). So, is this PR effectively restoring the previous behaviour? To me, this feels like a temporary solution rather than a scalable one. I agree with others that preserving discardable attributes seems somewhat ad hoc. My recent PR may have uncovered this issue, and I understand that this approach is a quick patch. Sometimes, this kind of fix is indeed what’s needed to unblock people while a sounder approach is discussed.
Sure, the Op transformation seems fine. But what about the attribute itself? Preserving it happens to be OK for the particular attribute in your use case (I don't know what that is), but may not make sense for other attributes that somebody else may want to attach.

(*) My understanding is that it clones "inherent" rather than "discardable" attributes, but the underlying storage seems to be shared between the two?
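Regarding the footnote about clone, a small sketch of the behavior being described (my reading, not verified against the patch): for ops that keep inherent and discardable attributes in the same dictionary, `Operation::clone` copies the whole dictionary, so discardable attributes ride along implicitly.

```cpp
// Sketch of the observation above: clone() copies the op's attribute
// dictionary, so the discardable attributes are carried over to the copy
// without any explicit handling by the caller.
Operation *copy = op->clone();
assert(copy->getDiscardableAttrDictionary() ==
       op->getDiscardableAttrDictionary());
```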
Pass boundaries are completely arbitrary. I can do whatever in a pass (including launching new pipelines), so defining the attributes as something that does not leak across passes is completely arbitrary. Even with that definition, if I had two separate pattern application steps and I set an attribute before the first step, it would be dropped by the next step.

I think we are missing that it is really useful for a compiler stack to have some way of annotating operations. This is very closely related to LLVM metadata, which from my working experience has a lot of similar issues, but is a tool that is used. It is not necessarily the most principled thing, but it is a pragmatic way to annotate the IR with some information that can be used by subsequent optimizations. This is literally the use case here.

In my book, the real issue here is that this transformation could have been done in-place, in which case there would not be an issue. We don't need to dictate how the whole world builds compilers. A lot of the time you need to make pragmatic choices to connect things, and as long as it isn't disruptive to all downstream users, this should be acceptable. Literally no one is affected by this change.
I don't believe so: a correct and sound transformation in-place would also explicitly drop these attributes.
Indeed: you do whatever you want downstream, but we have a consistent project upstream where we need to adhere to some principled and consistent designs.
This is not true: anyone who was annotating such an op would be affected, because we would incorrectly preserve an attribute, potentially leading to a crash or worse: a silent miscompile.
Note that the project stance on discardable attributes was debated quite a few times, some of the pointers:
They were not initially called this, by the way; it's after one of these discussions that we felt the need to be explicit about the intent and picked the term "discardable": https://reviews.llvm.org/D96093
Here is the problem. We have two cases
We can't serve both cases, for sure. This is really a missing feature in MLIR: an ability for an optimization to leave information for subsequent optimizations to pick up. It is fragile, but this is required in many cases. This is the reason why we use it (some information is easier to "compute" earlier in the pipeline than at the point of transformation). This is also the same reason I have seen metadata being used in LLVM. There isn't support for this feature. If MLIR had this feature, it would make everyone's life much easier.
On the same thread: https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/5?u=maheshravishankar . Like I said, this ties back to metadata. There is no good answer here. The best way, IMO, is to take it on a case-by-case basis and proceed based on the use case. I don't totally disagree with the comments about the pitfalls here... but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong. This goes back to there being a missing feature here, in the absence of which people reach for the only path that allows them to do this.
To be clear: an optimization can do that. You can define some discardable attributes that the optimizations knows about and knows how to preserve through transformations, that's perfectly fine. But this isn't the same as "blind" propagation through transformations.
I don't think so: one is a miscompile / correctness problem, the other is a conservative / safe option. These are called discardable attributes for a reason, and the reason is that the options are absolutely not "equally wrong".
I am not sure I follow that... but maybe that's a tangent. On the same thread, Chris says https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/10?u=maheshravishankar . Specifically:
We seem to have stopped at discardable attributes and have not added "preserved attributes". The transformation does not need to worry about the semantics of these preserved attributes. In cases where it is effectively cloning the operation, you can clone the preserved attributes. If these attributes are such that the meaning changes, well, that is a burden on downstream users to adapt accordingly.
Did you look at this link I posted above? https://discourse.llvm.org/t/rfc-implicit-propagation-of-dialect-attributes-best-effort/2657/
This is not how the system is designed; they should be dropped by default, and it's not a downstream problem. What you describe is something that could have been done; it is just not what is there with discardable attributes.
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This change mirrors a similar update made for `tensor::PackOp` in llvm#114559. Below is the updated rationale for `tensor::UnPackOp`.

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
%unpack = tensor.unpack %cast inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```

as:

```mlir
%unpack = tensor.unpack %cast inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```

This leads to an Op verification failure because the folder does not update the inner tile sizes in the unpack Op. This patch resolves the issue.

Additional Changes:
* invalid.mlir: Fixes a typo.
* TensorOps.cpp: Removes unnecessary `(void)tileSize` and adds extra comments following this discussion: llvm#115772.
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a similar update made for `tensor::PackOp` in #114559. Below is the updated rationale tailored to `tensor::UnPackOp`.

ISSUE DESCRIPTION

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>

// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```

as:

```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```

This triggers an Op verification failure because the folder does not update the inner tile sizes in the unpack Op. This patch addresses the issue by ensuring proper handling of inner tile sizes.

ADDITIONAL CHANGES
* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
  * Removed unnecessary `(void)tileSize`.
  * Added comments following the discussion in PR #115772.
  * Made minor updates to `FoldTensorCastPackOp` for consistency with the newly introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g., replaced mixed use of `test_attr` and `some_attr`).
No description provided.