[mlir][Tensor] Retain discardable attrs in pack(cast) folder #115772

Merged (1 commit) on Nov 12, 2024
1 change: 1 addition & 0 deletions mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -4798,6 +4798,7 @@ struct FoldTensorCastPackOp : public OpRewritePattern<PackOp> {
PackOp newOp = rewriter.create<PackOp>(
op.getLoc(), newOperands[0], newOperands[1], op.getInnerDimsPos(),
newMixedTileSizes, op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
@joker-eph (Collaborator) commented on Nov 11, 2024:
Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

We don't have a principled approach to this in general, unfortunately. Here the pattern replaces the op, so the issue was visible; other folders operate in-place and are propagating these attributes "by accident".

By the way: can't this pattern just operate in-place here instead?
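
For readers less familiar with the rewrite infrastructure, here is a minimal sketch (not code from this PR; `newDest` is an illustrative value) of the two styles being contrasted: an in-place update keeps the existing Operation and therefore its discardable attributes, while a replacement creates a fresh op that starts with none unless they are copied over explicitly.

```cpp
// Sketch only: assume this sits inside matchAndRewrite(tensor::PackOp op,
// PatternRewriter &rewriter) and that `newDest` was computed earlier.

// Style A: in-place update. The op object survives, so any discardable
// attributes attached to it are preserved "by accident".
rewriter.modifyOpInPlace(op, [&] {
  // ... mutate operands/attributes of `op` directly ...
});

// Style B: replacement. The new op starts with an empty discardable
// attribute dictionary; the line added by this PR copies the old one over.
tensor::PackOp newOp = rewriter.create<tensor::PackOp>(
    op.getLoc(), op.getSource(), newDest, op.getInnerDimsPos(),
    op.getMixedTiles(), op.getPaddingValue(), op.getOuterDimsPerm());
newOp->setDiscardableAttrs(op->getDiscardableAttrDictionary());
rewriter.replaceOp(op, newOp.getResult());
```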

Contributor (PR author) commented:

I'm not sure how easy it would be to make it operate in place, because it is potentially updating a list of mixed operands (newMixedTileSizes), meaning operand segment lengths need to be updated as well, and I've had bad experiences trying to do that in the past.

> Why is this sound? How do you guarantee that the pattern can't invalidate anything encoded in there?

There is no guarantee this is sound, in the same way that any pattern operating on generic tensor encodings isn't sound. As I see it, there are hardly any "globally sound" decisions we can make about arbitrary unstructured attributes, so instead it seems as though people have been going for a practical middle ground. This is also consistent with every other cast folder I've seen (not saying there aren't counterexamples).

Collaborator:

The reason these are called "discardable" is that we are expected to discard them when we don't understand them or can't update them in transformations!

So the principled approach would be to be conservative: either don't perform the transformation if you don't understand the attributes, or drop them.
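
A minimal sketch of that conservative option, assuming it were added at the top of the existing FoldTensorCastPackOp::matchAndRewrite (this is not what the PR does):

```cpp
// Sketch only: refuse to fold when the op carries discardable attributes the
// pattern cannot promise to keep valid, instead of dropping or copying them.
if (!op->getDiscardableAttrDictionary().empty())
  return rewriter.notifyMatchFailure(
      op, "op has discardable attributes this pattern does not understand");
```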

Contributor:

I don't know if it can be done in place. The result type is changing, I think.

I think the working assumption is that preserving these attributes is, at best, best effort. You can't always transfer attributes over, so it's safe to discard them. But in cases like this, when the op is being replaced by an op of the same type, it's also safe to carry them over. If someone relies on this attribute, that's "at your own risk", but we make a best effort to preserve the attribute. It's not the most principled thing, but rather a pragmatic one.

Contributor:

There are many things that are, let's be positive and call them "quirky", about this dialect. If this were its own project, the approach to attributes for things at this layer would be top of the list of things to document. The very next thing I would do is likely remove most of the canonicalizers and sort them into things that could be documented and used in a targeted way. It isn't even clear to me that the classic definitions apply to things at this layer. Lots of work, though, and in the absence of it, consistency and pragmatism are a higher-valued commodity.

Contributor:

I don't believe there are satisfying answers in the current state, and we're not going to keep discussing misalignments of the core principles with the realities of these dialects.

We value consistency with the rest of the dialect over a solution that no one seems able to articulate.

Thank you for your feedback.

Collaborator:

Sorry, but you can't just discard my comments like this. Please consider these as blocking discussions right now.

Contributor:

I'm sorry, but I don't see how you have the standing on this project to say that. Your feedback and insights are of course welcome, but the people with the miles in on these dialects need to arbitrate the result.

Collaborator:

I don't know what qualifies you to make this call right now; I don't recognize that you have this authority, and nothing in our practices or documented policies on code reviews supports your behavior here.

The consistency of MLIR is important, and as long as this code is under mlir/...., I consider myself qualified to make these comments and totally expect them to be taken seriously. I also consider myself fully qualified to revert any patch which lands under these conditions.

@stellaraccident (Contributor) commented on Nov 12, 2024:

Your comments are of course welcome, but you don't have any more standing for MLIR as a whole than I do. In those situations, when there are clearly active contributors to an area processing feedback, it is their prerogative to make decisions. In this case I happen to know that @rengolin has a plan for these ops that will be a wholly better situation, and @MaheshRavishankar and @qedawkins making pragmatic choices that smooth paths while things unfold is what we expect of maintainers. Which they are, by the same convention that always arbitrates in the absence of established governance: the people who have put the miles in for an area need to be able to move things forward and make decisions.

Contributor:

I don't see other folding patterns in this dialect propagating external attributes. A change like this should, at the very least, have a comment/rationale to start with.

Contributor:

The rationale claimed in the comments was that, in the prior state, the folders were operating in place, which implicitly preserves these attributes. Regardless of whether this is the best kind of assumption to have, it is an inconsistent and unintended state to have this be different just because this pattern does not do an in-place modification.

Agreed, that should have made it to the commit message.

@bondhugula (Contributor) commented on Nov 13, 2024:

But there are numerous patterns in this file (folding or otherwise) that aren't doing in-place modifications (including tensor.unpack) -- they aren't propagating external attributes either.

$ grep replaceOpWithNewOp lib/Dialect/Tensor/IR/TensorOps.cpp | wc -l
24
# (not even counting create<...> + replaceOp)
  4682     Value newOp = rewriter.create<tensor::UnPackOp>(
  4683         loc, source, dest, unPackOp.getInnerDimsPos(), unPackOp.getMixedTiles(),
  4684         unPackOp.getOuterDimsPerm());
  4685     rewriter.replaceOpWithNewOp<tensor::CastOp>(
  4686         unPackOp, unPackOp.getResult().getType(), newOp);
  4687     return success();

Contributor:

I am confused as to what, then, is the point of external or discardable attributes.
I agree this isn't consistent. Like I said, it's best effort.

I think the more important thing here is that this doesn't change the semantics of the transformation one bit from upstream's perspective.

Member:

Discardable attributes are safe to use within a pass, but not across pass boundaries. E.g., I remember we had "filter" attributes in Linalg at some point. (Maybe we still have them.) But we had to make sure to drop them again at the end of the pass (if they were still there), so they wouldn't leak across pass boundaries.
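
A minimal sketch of that idiom, with a made-up pass and attribute name ("my_pass.filter" is purely illustrative): the pass strips its private marker attribute before finishing, so it never crosses the pass boundary.

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

using namespace mlir;

namespace {
// Illustrative pass that tags ops with a private marker attribute while it
// runs and makes sure the marker never leaks past the pass boundary.
struct MyMarkerBasedPass
    : public PassWrapper<MyMarkerBasedPass, OperationPass<ModuleOp>> {
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(MyMarkerBasedPass)

  void runOnOperation() override {
    // ... transformations that set and consume "my_pass.filter" markers ...

    // Strip any leftover markers before the pass finishes.
    getOperation()->walk(
        [](Operation *op) { op->removeAttr("my_pass.filter"); });
  }
};
} // namespace
```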


// Replace op.
Value oldResult = op.getResult();
10 changes: 8 additions & 2 deletions mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -2750,7 +2750,10 @@ func.func @fold_cast_multiple_results(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2x
// CHECK-SAME: %[[DEST:.*]]: tensor<1x1x8x1xi32>,
// CHECK-SAME: %[[SRC:.*]]: tensor<7x?xi32>,
// CHECK-SAME: %[[PAD:.*]]: i32) -> tensor<1x1x8x1xi32> {
-// CHECK: %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32) inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]] : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
+// CHECK: %[[PACK:.*]] = tensor.pack %[[SRC]] padding_value(%[[PAD]] : i32)
+// CHECK-SAME: inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %[[DEST]]
+// CHECK-SAME: some_attr
+// CHECK-SAME: : tensor<7x?xi32> -> tensor<1x1x8x1xi32>
// CHECK: return %[[PACK]] : tensor<1x1x8x1xi32>
func.func @fold_cast_pack_dynamic_tile_size(
%dest: tensor<1x1x8x1xi32>,
@@ -2759,7 +2762,10 @@ func.func @fold_cast_pack_dynamic_tile_size(

%cast = tensor.cast %dest : tensor<1x1x8x1xi32> to tensor<1x1x?x1xi32>
%c8 = arith.constant 8 : index
-  %pack = tensor.pack %src padding_value(%pad : i32) inner_dims_pos = [0, 1] inner_tiles = [%c8, 1] into %cast : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
+  %pack = tensor.pack %src padding_value(%pad : i32)
+      inner_dims_pos = [0, 1]
+      inner_tiles = [%c8, 1]
+      into %cast {some_attr} : tensor<7x?xi32> -> tensor<1x1x?x1xi32>
%res = tensor.cast %pack : tensor<1x1x?x1xi32> to tensor<1x1x8x1xi32>
return %res : tensor<1x1x8x1xi32>
}