Qualcomm AI Engine Direct - alias_copy op #10319
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10319.
Note: links to docs will display an error until the doc builds have completed. ❌ 1 New Failure as of commit 24f8528 with merge base 17cbef5.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Works for me.
```python
def test_qnn_backend_alias(self):
    module = Alias()  # noqa: F405
    sample_input = (torch.randn(1, 10),)
    self.lower_module_and_test_output(module, sample_input)
```
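For context, a minimal `Alias` model like the one referenced above might look as follows. This is a hypothetical sketch; the real definition lives in the Qualcomm backend's test models, so the layer sizes and the explicit `torch.ops.aten.alias` call are assumptions.

```python
import torch

class Alias(torch.nn.Module):
    """Hypothetical sketch of a model whose exported graph contains an alias op."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 10)

    def forward(self, x):
        # aten.alias is a no-op view; after export to the edge dialect it
        # surfaces as aten.alias_copy.default, which RemoveRedundancy drops.
        return self.fc(torch.ops.aten.alias(x))

module = Alias()
sample_input = (torch.randn(1, 10),)
output = module(*sample_input)
```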
Are we testing that the alias op should still work, meaning it's not removed?
Hi @cccclai,
Thanks for reviewing the PR.
The alias op will be removed. This test is just to show that alias can be properly removed and that the model still works correctly. I have added some comments under the model.
Hmm, I assume this test will pass with or without the alias op, correct? I was thinking of exporting the graph, running to_backend, and then checking that there is no alias op left.
> hmm, I assume this test will work regardless with or without alias_op, correct? I was thinking export the graph and run to_backend, and check there is no alias op after that.
This test is to reproduce @billmguo's error. Without adding `exir_ops.edge.aten.alias_copy.default` to the `RemoveRedundancy` pass, this test fails during `qnn_partitioner`, which does not have an op_builder for `aten.alias_copy.default`.
Thanks for the suggestion. That would probably be more straightforward, since the unit test does not actually run the alias op (it is dropped during `RemoveRedundancy`). Maybe we can add a new class of UTs targeting whether the passes work as expected.
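As a rough illustration of what such a pass-level UT could assert, here is a sketch using plain FX. The real `RemoveRedundancy` pass operates on the edge dialect (`exir_ops.edge.aten.alias_copy.default`); the string-based matching and the `remove_no_op` helper below are assumptions for illustration only, not the backend's actual implementation.

```python
import torch
import torch.fx

class WithAlias(torch.nn.Module):
    def forward(self, x):
        return torch.ops.aten.alias(x) + 1

gm = torch.fx.symbolic_trace(WithAlias())

def remove_no_op(gm: torch.fx.GraphModule, op_name: str) -> torch.fx.GraphModule:
    # Replace every use of the no-op node with its input, then erase it,
    # mimicking how a redundancy-removal pass drops alias-style ops.
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and op_name in str(node.target):
            node.replace_all_uses_with(node.args[0])
            gm.graph.erase_node(node)
    gm.recompile()
    return gm

remove_no_op(gm, "alias")

# After the pass: no alias node remains, and behavior is unchanged.
leftover = [n for n in gm.graph.nodes if "alias" in str(n.target)]
```

A UT along these lines would fail if the pass ever stops removing the op, independent of whether the lowered model happens to run.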
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
It seems like the test is failing.
Thanks for sharing the results. I think this is failing by chance; I tested locally a couple of times, and it all passed. I have switched to a simpler op instead, since the purpose here is to verify that alias_copy is handled properly.
Actually, the failure might be due to this change #10362. Can you help check whether the test still passes with pytorch/pytorch#151436?
I have been looking at the failure and still haven't made much progress. It looks like the Qualcomm passes do something special to exported-artifact state dicts down the line. Is that accurate? Can you point me to the places where you deal with module state dicts?
Hi @cccclai, @tugsbayasgalan,
Yes, I am able to reproduce the issue after bumping the torch nightly version to "dev20250422". I believe the reason is that we are trying to convert the quantized weights back to floating point during one of our passes.
However, after bumping the torch version, it seems the FP weights are not properly saved. We are using the following functions to get and set parameters.
It would be appreciated if you could share best practices for handling parameter modification.
I am still confused about how this logic could break your flow, though. Before returning the state dict, I just shallow copy it in export, meaning they should be the same state dict. In what part of set_parameter does it fail?
I think I might know why it is not updating, but I need some time to verify whether it is actually the root cause, as I am currently blocked by something else.
During the lowering process, when decomposition is called, we get a new exported_program with a shallow-copied new_state_dict: https://github.com/pytorch/pytorch/blob/ad81eeb7c7c906e0cdd04a5cc8fdb9592281c317/torch/export/exported_program.py#L880. Then, when we run the passes, we update our values in the old state_dict of the old exported_program. This is why we keep getting incorrect params in the QNN op builder. Please let me know if anything is still unclear.
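The stale-state-dict behavior described above can be reproduced with plain dicts (no torch needed; the key name below is made up for illustration):

```python
import copy

# run_decompositions hands back a new program whose state dict is a
# shallow copy of the old one: a new dict object holding the same values.
old_state_dict = {"linear.weight": "quantized_weight"}
new_state_dict = copy.copy(old_state_dict)

# A pass that rebinds entries in the OLD dict...
old_state_dict["linear.weight"] = "dequantized_weight"

# ...leaves the NEW dict, which downstream op builders read, unchanged.
stale = new_state_dict["linear.weight"]  # still the old quantized value
```

Rebinding a key in one dict never affects the other; only in-place mutation of a shared value would be visible through both.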
Ahh, this makes sense. You should always work with the state_dict of the exported program obtained after running decompositions, because the state dict can change during decomposition.
@winskuo-quic Just checking in: will you be working on making the Qualcomm code compatible with the new export behaviour?
Hi @tugsbayasgalan, |
Summary
Remove alias_copy op.
Test plan
Add UTs to ensure alias_copy is removed.