
[mlir][Tensor] Retain discardable attrs in pack(cast) folder #115772


Merged
qedawkins merged 1 commit into llvm:main from keep_pack_discardable_attrs on Nov 12, 2024

Conversation

@qedawkins (Contributor)

No description provided.

@llvmbot (Member) commented Nov 11, 2024

@llvm/pr-subscribers-mlir-tensor

@llvm/pr-subscribers-mlir

Author: Quinn Dawkins (qedawkins)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/115772.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+1)
  • (modified) mlir/test/Dialect/Tensor/canonicalize.mlir (+8-2)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 8e0d0104397468..147120e0e34203 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
     PackOp newOp = rewriter.create<PackOp>(
         op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
         newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
+    newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
 
     // Replace op.
     Value oldResult = op.getResult();
diff --git a/mlir/test/Dialect/Tensor/canonicalize.mlir b/mlir/test/Dialect/Tensor/canonicalize.mlir
index 2c826d7ae008d5..0b54c207dea84e 100644
--- a/mlir/test/Dialect/Tensor/canonicalize.mlir
+++ b/mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -2750,7 +2750,10 @@ func.func @fold_cast_multiple_results(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2x
 // CHECK-SAME:      %[[DEST:.*]]: tensor<1x1x8x1xi32>,
 // CHECK-SAME:      %[[SRC:.*]]: tensor<7x?xi32>,
 // CHECK-SAME:      %[[PAD:.*]]: i32) -> tensor<1x1x8x1xi32> {
-// CHECK:           %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32) inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]] : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
+// CHECK:           %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32)
+// CHECK-SAME:        inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]]
+// CHECK-SAME:        some_attr
+// CHECK-SAME:        : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
 // CHECK:           return %[[PACK]] : tensor<1x1x8x1xi32>
 func.func @fold_cast_pack_dynamic_tile_size(
   %dest: tensor<1x1x8x1xi32>,
@@ -2759,7 +2762,10 @@ func.func @fold_cast_pack_dynamic_tile_size(
 
     %cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
     %c8 = arith.constant 8 : index
-    %pack = tensor.pack %src padding_value(%pad : i32) inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
+    %pack = tensor.pack %src padding_value(%pad : i32)
+      inner_dims_pos = [0, 1]
+      inner_tiles = [%c8, 1]
+      into %cast {some_attr} : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
     %res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
     return %res : tensor<1x1x8x1xi32>
 }

@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
@joker-eph (Collaborator) commented Nov 11, 2024

Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

We don't have a principled approach to this in general, unfortunately. Here the pattern replaces the op, so the issue is visible; other folders operate in-place and propagate these attributes "by accident".

By the way: can't this pattern just operate in-place here instead?

@qedawkins (Contributor, Author)

I'm not sure how easy it would be to make it operate in place, because it is potentially updating a list of mixed operands (newMixedTileSizes), which means the operand segment lengths need to be updated as well, and I've had bad experiences trying to do that in the past.

Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

There is no guarantee this is sound, in the same way that any pattern operating on generic tensor encodings isn't sound. As I see it, there are hardly any "globally sound" decisions we can make about arbitrary unstructured attributes, so instead it seems as though people have been going for a practical middle ground. This is also consistent with every other cast folder I've seen (not saying there aren't counterexamples).
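Regarding the in-place question above, here is a rough sketch of what that route might look like (this is not what the PR does, and accessor names such as `setStaticInnerTiles` plus the local variables are illustrative assumptions); the awkward part is keeping the rebuilt operand list and the operand-segment bookkeeping consistent by hand:

```cpp
// Hypothetical in-place variant. `newOperands` and `newStaticTiles` stand in
// for the recomputed operand list and static tile sizes.
rewriter.modifyOpInPlace(op, [&] {
  // Changing the variadic inner_tiles group means resetting the full operand
  // list (source, dest, optional padding value, dynamic tile sizes)...
  op->setOperands(newOperands);
  // ...updating the static tile sizes to match...
  op.setStaticInnerTiles(newStaticTiles);
  // ...and keeping the operand-segment sizes in sync by hand, since
  // setOperands does not recompute them for AttrSizedOperandSegments ops.
});
```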

@joker-eph (Collaborator)

The reason these are called "discardable" is because we are expected to discard them when we don't understand them / can't update them in transformations!

So the principled approach would be to be conservative: either don't perform the transformation if you don't understand the attributes, or drop them.
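For concreteness, a minimal sketch (not code from this PR) of what the conservative guard would look like at the top of the pattern, assuming the usual `OpRewritePattern` shape used in TensorOps.cpp:

```cpp
// Hypothetical conservative variant: refuse to fold when the op carries
// discardable attributes this pattern does not understand.
struct ConservativeFoldTensorCastPackOp : public OpRewritePattern<PackOp> {
  using OpRewritePattern<PackOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(PackOp op,
                                PatternRewriter &rewriter) const override {
    if (!op->getDiscardableAttrDictionary().empty())
      return rewriter.notifyMatchFailure(
          op, "op carries discardable attributes; conservatively not folding");
    // ... the existing cast-folding logic would go here ...
    return failure();
  }
};
```

The other option mentioned above, folding but not copying the attributes over, is simply the pattern as it was before this change.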

Contributor

I don't know if it can be done in place; the result type is changing, I think.

I think the working assumption is that preserving these attributes is, at best, best effort. You can't always transfer attributes over, so it is safe to discard them. But in cases like this, when the op is being replaced by an op of the same type, it is also safe to carry them over. If someone relies on this attribute, that's "at your own risk", but we make a best effort to preserve it. It's not the most principled thing, but rather a pragmatic one.

Contributor

There are many things that are, let's be positive and call them "quirky", about this dialect. If this were its own project, the approach to attributes for things at this layer would be at the top of the list of things to document. The very next thing I would do is likely remove most of the canonicalizers and sort them into things that could be documented and used in a targeted way. It isn't even clear to me that the classic definitions apply to things at this layer. Lots of work, though, and in the absence of it, consistency and pragmatism are the higher-valued commodity.

Contributor

I don't believe there are satisfying answers in the current state, and we're not going to keep discussing misalignments of the core principles with the realities of these dialects.

We value consistency with the rest of the dialect over a solution that no one seems able to articulate.

Thank you for your feedback.

@joker-eph (Collaborator)

Sorry, but you can't just discard my comments like this. Please consider these as blocking discussions right now.

Contributor

I'm sorry, but I don't see how you have the standing on this project to say that. Your feedback and insights are of course welcome, but the people with the miles in on these dialects need to arbitrate the result.

@joker-eph (Collaborator)

I don't know what qualifies you to make this call right now; I don't recognize that you have this authority, and nothing in our practices or documented policies on code review supports your behavior here.

The consistency of MLIR is important, and as long as this code is under mlir/..., I consider myself qualified to make these comments and fully expect to be taken seriously. I also consider myself fully qualified to revert any patch that lands under these conditions.

@stellaraccident (Contributor) commented Nov 12, 2024

Your comments are of course welcome, but you don't have any more standing for MLIR as a whole than I do. In situations like this, when there are clearly active contributors to an area processing feedback, it is their prerogative to make decisions. In this case I happen to know that @rengolin has a plan for these ops that will be a wholly better situation, and @MaheshRavishankar and @qedawkins making pragmatic choices that smooth paths while things unfold is what we expect of maintainers. Which they are, by the same conventions that always arbitrate in the absence of established governance: the people who have put the miles in for an area need to be able to move things forward and make decisions.

@qedawkins qedawkins changed the title [mlir][Tensor] Retain discardable attrs back in cast(pack) folder [mlir][Tensor] Retain discardable attrs in pack(cast) folder Nov 11, 2024
@qedawkins qedawkins merged commit 24c2c74 into llvm:main Nov 12, 2024
11 checks passed
@qedawkins qedawkins deleted the keep_pack_discardable_attrs branch November 12, 2024 13:19
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
Contributor

I don't see other folding patterns in this dialect propagating external attributes. A change like this should have a comment/rationale at the least to start with.

Contributor

The rationale claimed in the comments was that, in the prior state, the folders were operating in place, which implicitly preserves these attributes. Regardless of whether this is the best kind of assumption to have, it is an inconsistent and unintended state for this to be different just because this pattern does not do an in-place modification.

Agreed, that should have made it into the commit message.

@bondhugula (Contributor) commented Nov 13, 2024

But there are numerous patterns in this file (folding or otherwise) that aren't doing in-place modifications (including tensor.unpack) -- they aren't propagating external attributes either.

$ grep replaceOpWithNewOp lib/Dialect/Tensor/IR/TensorOps.cpp | wc -l
24
# (not even counting create<...> + replaceOp)
    Value newOp = rewriter.create<tensor::UnPackOp>(
        loc, source, dest, unPackOp.getInnerDimsPos(), unPackOp.getMixedTiles(),
        unPackOp.getOuterDimsPerm());
    rewriter.replaceOpWithNewOp<tensor::CastOp>(
        unPackOp, unPackOp.getResult().getType(), newOp);
    return success();

Contributor

I am confused as to what the point of external or discardable attributes is, then.
I agree this isn't consistent. Like I said, it's best effort.

I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.

Member

Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they don't leak across pass boundaries.
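As an illustration of that clean-up step, a minimal sketch of dropping such a marker before the pass finishes; the pass and attribute names are illustrative, not the actual Linalg code:

```cpp
// Hypothetical pass epilogue: make sure an intra-pass marker attribute does
// not leak across the pass boundary.
void MyAnnotationDrivenPass::runOnOperation() {
  // ... pattern application that reads/writes the "my.filter" marker ...

  getOperation()->walk([](Operation *op) {
    op->removeAttr("my.filter"); // no-op if the attribute is absent
  });
}
```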

@bondhugula (Contributor) left a review comment

I don't think commit summaries on such changes can be left blank. The title itself isn't sufficient.

@banach-space (Contributor)

I see this as a follow-up to the recent change I made in #114559.

From what I can tell, before my change, the behavior introduced explicitly here was already present implicitly via the clone method, which preserved these attributes (*). So, is this PR effectively restoring the previous behaviour?

To me, this feels like a temporary solution rather than a scalable one. I agree with others that preserving discardable attributes seems somewhat ad hoc. My recent PR may have uncovered this issue, and I understand that this approach is a quick patch. Sometimes, this kind of fix is indeed what’s needed to unblock people while a sounder approach is discussed.

I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.

Sure, the Op transformation seems fine. But what about the attribute itself? Preserving it happens to be OK for the particular attribute in your use case (I don't know what that is), but may not make sense for other attributes that somebody else may want to attach.

(*) My understanding is that it clones "inherent" rather than "discardable" attributes, but the underlying storage seems to be shared between the two?

@MaheshRavishankar (Contributor)

Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they don't leak across pass boundaries.

Pass boundaries are completely arbitrary. I can do whatever I want in a pass (including launching new pipelines), so defining these attributes as something that does not leak across passes is completely arbitrary. Even with that definition, if I had two separate pattern-application steps and set an attribute before the first step, it would be dropped by the next step.

I think we are missing that it is really useful for a compiler stack to have some way of annotating operations. This is very closely related to LLVM metadata, which, from my working experience, has a lot of similar issues but is a tool that gets used. It is not necessarily the most principled thing, but it is a pragmatic way to annotate the IR with some information that can be used by subsequent optimizations. This is literally the use case here. In my book, the real issue here is that this transformation could have been done in-place, in which case there would not be an issue.

We don't need to dictate how the whole world builds compilers. A lot of the time you need to make pragmatic choices to connect things, and as long as it isn't disruptive to downstream users, this should be acceptable. Literally no one is affected by this change.

@joker-eph (Collaborator) commented Nov 13, 2024

This is literally the use case here. In my book, the real issue here is that this transformation could have been done in-place, in which case there would not be an issue.

I don't believe so: a correct and sound transformation in-place would also explicitly drop these attributes.

We don't need to dictate how the whole world builds compilers.

Indeed: you do whatever you want downstream, but we have a consistent project upstream where we need to adhere to some principled and consistent designs.

Literally no one is affected by this change.

This is not true: anyone who was annotating such an op would be affected, because we would incorrectly preserve an attribute, potentially leading to a crash or, worse, a silent miscompile.

@joker-eph (Collaborator) commented Nov 13, 2024

Note that the project's stance on discardable attributes has been debated quite a few times; one of the pointers: https://discourse.llvm.org/t/rfc-implicit-propagation-of-dialect-attributes-best-effort/2657/

They were not initially called this, by the way; it was after one of these discussions that we felt the need to be explicit about the intent and picked the term "discardable": https://reviews.llvm.org/D96093

@MaheshRavishankar (Contributor)

This is not true: anyone who was annotating such an op would be affected, because we would incorrectly preserve an attribute, potentially leading to a crash or, worse, a silent miscompile.

Here is the problem. We have two cases:

  1. An attribute is added and relied upon by a subsequent transformation/pass, but gets dropped along the way; once dropped, it is hard to recover.
  2. An attribute is added that is supposed to be dropped, and a subsequent transformation/pass relies on it being dropped; I would think it is easier to drop such an attribute prescriptively than to expect it to happen automatically.

We can't serve both cases, for sure. This is really a missing feature in MLIR: the ability for an optimization to leave information for subsequent optimizations to pick up. It is fragile, but it is required in many cases. This is the reason we use it (some information is easier to "compute" earlier in the pipeline than at the point of transformation), and it is the same reason I have seen metadata being used in LLVM. There isn't support for this feature; if MLIR had it, it would make everyone's life much easier.

On the same thread, see https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/5?u=maheshravishankar. Like I said, this ties back to metadata. There is no good answer here; the best way, IMO, is to take it on a case-by-case basis and proceed based on the use case. I don't totally disagree with the comments about the pitfalls here... but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong. This goes back to there being a missing feature here, in the absence of which people reach for the only path that allows them to do this.

@joker-eph (Collaborator)

The ability for an optimization to leave information for subsequent optimizations to pick up.

To be clear: an optimization can do that. You can define some discardable attributes that the optimization knows about and knows how to preserve through transformations; that's perfectly fine. But this isn't the same as "blind" propagation through transformations.

but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong

I don't think so: one is a miscompile / correctness problem, the other is a conservative / safe option. These are called discardable attributes for a reason, and that reason is precisely so that the two options are not "equally wrong".

@MaheshRavishankar (Contributor)

but I don't see why just dropping the attribute is any better than just keeping the attribute. Both are equally wrong

I don't think so: one is a miscompile / correctness problem, the other is a conservative / safe option. These are called discardable attributes for a reason, and that reason is precisely so that the two options are not "equally wrong".

I am not sure I follow that... but maybe that's a tangent. On the same thread, Chris says (https://discourse.llvm.org/t/on-querying-an-operations-intrinsic-core-vs-external-user-defined-attributes/4498/10?u=maheshravishankar), specifically:

We can define a schema for attributes, so the attributes themselves carry information about “I am safe to implicitly propagate”. We would nail down what that exactly means, but if we had that, you could put this bit on the attributes in question and we could make the infra look for it and propagate “just these” attributes.

We seem to have stopped at discardable attributes and have not added "preserved attributes". The transformation does not need to worry about the semantics of these preserved attributes. In cases where it is effectively cloning the operation, you can clone the preserved attributes. If these attributes are such that the meaning changes, well that is a burden on downstream users to adapt accordingly.
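MLIR has no such mechanism today, but as a purely hypothetical sketch of the idea, a helper could copy over only attributes carrying an agreed-upon marker (a made-up "preserve." name prefix here), leaving everything else to be discarded:

```cpp
// Hypothetical helper, not an existing MLIR API: when replacing `oldOp` with a
// `newOp` of the same kind, propagate only attributes that opt in via a prefix.
static void copyPreservedAttrs(Operation *oldOp, Operation *newOp) {
  constexpr StringLiteral kPreservePrefix = "preserve.";
  for (NamedAttribute attr : oldOp->getDiscardableAttrs())
    if (attr.getName().strref().starts_with(kPreservePrefix))
      newOp->setAttr(attr.getName(), attr.getValue());
}
```

Anything without the marker would still be dropped by default, keeping the current "discardable" behavior as the baseline.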

@joker-eph (Collaborator)

We seem to have stopped at discardable attributes and have not added "preserved attributes". The transformation does not need to worry about the semantics of these preserved attributes.

Did you look at this link I posted above? https://discourse.llvm.org/t/rfc-implicit-propagation-of-dialect-attributes-best-effort/2657/
We never found a path forward for "preserved attributes", but we tried for some time.

If these attributes are such that the meaning changes, well that is a burden on downstream users to adapt accordingly.

This is not how the system is designed: they should be dropped by default; it's not a downstream problem. What you describe is something that could have been done; it is just not what we have with discardable attributes.

Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
banach-space added a commit to banach-space/llvm-project that referenced this pull request Dec 31, 2024
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This change
mirrors a similar update made for `tensor::PackOp` in llvm#114559. Below is
the updated rationale for `tensor::UnPackOp`.

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:

```mlir
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```

This leads to an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch resolves the
issue.

Additional Changes:
* invalid.mlir: Fixes a typo.
* TensorOps.cpp: Removes unnecessary `(void)tileSize` and adds extra
  comments following this discussion:
  llvm#115772.
banach-space added a commit that referenced this pull request Jan 3, 2025
This patch specializes `FoldTensorCastProducerOp` for `tensor::UnPackOp` by
introducing a dedicated pattern: `FoldTensorCastUnPackOp`. This mirrors a
similar update made for `tensor::PackOp` in #114559. Below is the updated
rationale tailored to `tensor::UnPackOp`.

ISSUE DESCRIPTION

Currently, `FoldTensorCastProducerOp` incorrectly folds the following:

```mlir
%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
// Note: `%c8` and `?`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x?x1xi32> -> tensor<7x?xi32>
```
as:

```mlir
// Note: `%c8` and `8`.
%unpack = tensor.unpack %cast
  inner_dims_pos = [0, 1]
  inner_tiles = [%c8, 1]
  into %res : tensor<1x1x8x1xi32> -> tensor<7x?xi32>
```
This triggers an Op verification failure because the folder does not
update the inner tile sizes in the unpack Op. This patch addresses the
issue by ensuring proper handling of inner tile sizes.

ADDITIONAL CHANGES

* invalid.mlir: Fixed a typo.
* TensorOps.cpp:
  * Removed unnecessary `(void)tileSize`.
  * Added comments following the discussion in PR #115772.
  * Made minor updates to `FoldTensorCastPackOp` for consistency with the newly
    introduced `FoldTensorCastUnPackOp`.
* Tensor/canonicalize.mlir: Ensured consistent usage of `test_attr` (e.g.,
  replaced mixed use of `test_attr` and `some_attr`).