doc: update pytorch-on-xla-devices and troubleshoot doc for tensor synchronization issue #9258
base: master
Changes from 5 commits
3358752
24cc94e
f6b1dce
a5a375b
4989631
cda7c36
e2b0d86
cca09ca
@@ -137,19 +137,25 @@ Execution Analysis: ------------------------------------------------------------
 Execution Analysis: ================================================================================
 ```
 
-Some common causes of Compilation/Executation are 1. User manually call
-`torch_xla.sync()`. 2. [Parallel
+Some common causes of compilation/executation are
+1. User manually calls
+`torch_xla.sync()`.
+2. [Parallel
 loader](https://github.com/pytorch/xla/blob/fe4af0080af07f78ca2b614dd91b71885a3bbbb8/torch_xla/distributed/parallel_loader.py#L49-L51)
-call `torch_xla.sync()` for every x (configurable) batch. 3. Exiting a
+call `torch_xla.sync()` for every x (configurable) batch.
+3. Exiting a
 [profiler StepTrace
 region](https://github.com/pytorch/xla/blob/fe4af0080af07f78ca2b614dd91b71885a3bbbb8/torch_xla/debug/profiler.py#L165-L171).
-4. Dynamo decide to compile/execute the graph. 5. User trying to
+4. Dynamo decides to compile/execute the graph.
+5. User tries to
 access(often due to logging) the value of a tensor before the
Review comment: Space needed after "access"
Reply: updated. Thanks.
 `torch_xla.sync()`.
+6. User tries to a tensor value before calling `mark_step`. See [PyTorch on XLA Devices](https://github.com/pytorch/xla/blob/master/docs/source/learn/pytorch-on-xla-devices.md) for more details.
Review comment: User tries to access a tensor value ...?
Reply: updated. Thanks.
+
+The op executions caused by items 1-4 are expected, and we want to avoid item 5 by
+either reducing the frequency of accessing tensor values or manually adding a call to
+`torch_xla.sync()` before accessing them.
 
-The execution caused by 1-4 are expected, and we want to avoid 5 by
-either reduce the frequency of accessing tensor values or manually add a
-`torch_xla.sync()` before accessing.
 
 Users should expect to see this `Compilation Cause` +
 `Executation Cause` pairs for first couple steps. After the model
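
Note on the added paragraph: the advice to read tensor values less often, and to sync explicitly before reading them, can be sketched as below. This is a minimal illustration and not code from this PR. `run_steps`, `step_fn`, `sync_fn`, `read_fn` and `log_every` are hypothetical names, and the stand-in callables are there only so the sketch runs without `torch_xla` installed. In a real training loop `sync_fn` would presumably be `torch_xla.sync` and `read_fn` something like `lambda t: t.item()`; neither is exercised here.

```python
from typing import Callable, List, Tuple


def run_steps(
    step_fn: Callable[[int], object],
    num_steps: int,
    log_every: int,
    sync_fn: Callable[[], None],
    read_fn: Callable[[object], float],
) -> List[Tuple[int, float]]:
    """Run `num_steps` steps, reading the loss only every `log_every` steps.

    `sync_fn` is called right before each read, so the read itself is not
    what forces the pending graph to execute.
    """
    if log_every <= 0:
        raise ValueError("log_every must be positive")
    logged = []
    for step in range(num_steps):
        # On an XLA device this call would only record ops lazily.
        loss = step_fn(step)
        if (step + 1) % log_every == 0:
            sync_fn()  # explicit sync before touching the value
            logged.append((step, read_fn(loss)))
    return logged


# Stand-ins (assumptions) so the sketch is runnable anywhere.
sync_calls = []
history = run_steps(
    step_fn=lambda step: 1.0 / (step + 1),
    num_steps=10,
    log_every=5,
    sync_fn=lambda: sync_calls.append("sync"),
    read_fn=float,
)
print(history)
print(len(sync_calls))
```

With `log_every=5`, the value is read on 2 of the 10 steps instead of all 10, and each read is preceded by one explicit sync.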