-
Notifications
You must be signed in to change notification settings - Fork 551
Serialize NamedDataStoreOutput into PTD. #9125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9125
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 8fcbe70 with merge base 976fe48 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/) ghstack-source-id: 270931489 Pull Request resolved: #9125
This pull request was exported from Phabricator. Differential Revision: D70939807 |
@@ -74,39 +75,54 @@ def serialize_for_executorch( | |||
tensor.extra_tensor_info.fully_qualified_name | |||
] = TensorLayout(tensor.scalar_type, tensor.sizes, tensor.dim_order) | |||
|
|||
if len(fqn_to_tensor_layout) == 0 and ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible for only one of these to be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
fqn_to_tensor_layout is for tensors being saved to a separate file,
named_data.external_data is for named blobs being saved to a separate file
For training, named_data.external_data is empty.
For delegate weight sharing, fqn_to_tensor_layout is empty.
return pte, ptd_files | ||
|
||
all_external_files: Set[str] = set() | ||
if named_data is not None and len(named_data.external_data) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar question here, can named_data be not none but the length of the data is > 0 or vice versa?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
named_data stores blobs that are either saved in the PTE file or saved externally.
If all named data is marked to be saved in the PTE file, named_data.pte_data is non-empty, and named_data.external_data is empty
|
||
def __init__(self, data: Cord, alignment: Optional[int] = None) -> None: | ||
self.data = data | ||
self.alignment = alignment or 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL you can do this
offset=aligned_size(prev_end, self.config.segment_alignment), | ||
size=len(segment), | ||
offset=aligned_size(prev_end, alignment), | ||
size=len(segment.data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what else is in the segment now besides data? Alignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same as the PTE file; segment only contains offset+size. Size does not include alignment. We just have to make sure the data is aligned to the LCM of segment_alignment and any delegate-requested alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i meant you have segment.data. What else is in that object besides the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's any additional alignment from the delegate. It's passed from
NamedDataStore (delegates add blobs+optional alignment to the store) -->
DataEntry (buffer index + alignment) -->
here
Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/) [ghstack-poisoned]
Pull Request resolved: #9125 Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). ghstack-source-id: 273803285 @exported-using-ghexport Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/)
This pull request was exported from Phabricator. Differential Revision: D70939807 |
Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/) [ghstack-poisoned]
Pull Request resolved: #9125 Update PTD serialization to account for blobs from the NamedDataStoreOutput. Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore). ghstack-source-id: 274509070 @exported-using-ghexport Differential Revision: [D70939807](https://our.internmc.facebook.com/intern/diff/D70939807/)
This pull request was exported from Phabricator. Differential Revision: D70939807 |
ed8f793
into
gh/lucylq/59/base
Stack from ghstack (oldest at bottom):
Update PTD serialization to account for blobs from the NamedDataStoreOutput.
Something we can do in the future is to consolidate tensors (that go through the emitter) and blobs (that come from the NamedDataStore).
Differential Revision: D70939807