-
Notifications
You must be signed in to change notification settings - Fork 10.5k
LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. #27849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. #27849
Conversation
@swift-ci benchmark |
@swift-ci test |
Build failed |
Build failed |
Because the set includes all side-effect instructions, also may-reads. NFC
9ff63b0
to
10d96a8
Compare
@swift-ci test |
1 similar comment
@swift-ci test |
@swift-ci benchmark |
1 similar comment
@swift-ci benchmark |
Performance: -O
Code size: -OPerformance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibsHow to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Build failed |
Build failed |
…he same memory location out of a loop. This is a combination of load hoisting and store sinking, e.g. preheader: br header_block header_block: %x = load %not_aliased_addr // use %x and define %y store %y to %not_aliased_addr ... exit_block: is transformed to: preheader: %x = load %not_aliased_addr br header_block header_block: // use %x and define %y ... exit_block: store %y to %not_aliased_addr
10d96a8
to
584581e
Compare
@swift-ci test |
1 similar comment
@swift-ci test |
Build failed |
Build failed |
@swift-ci test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good. I have a few comments below though.
And I just realized this PR isn't the code that I reviewed, so I'll add another comment here
#27990
llvm_unreachable("unknown projection"); | ||
} | ||
|
||
/// Returns true if all stores to \p addr commonly dominate the loop exitst of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo exitst
} | ||
} | ||
|
||
// In case the value is only stored but never loaded in the loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment: Or it might have been reloaded only on paths that reached stores first..
|
||
// This is not a requirement for functional correctness, but we don't want to | ||
// _speculatively_ load and store the value (outside of the loop). | ||
if (!storesCommonlyDominateLoopExits(addr, loop, exitingBlocks)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're worried about emitting an extra store, then just suppress it if the store is the loaded value. I wouldn't be worried about the "extra" load because either the load will be unnecessary and removed, or it is really needed on some paths through the loop, in which case it's totally reasonble to speculatively hoist it. It should be a good performance tradeoff most of the time.
// Check if the store-is-not-alive flag reaches any of the exits. | ||
for (SILBasicBlock *eb : exitingBlocks) { | ||
// Ignore loop exits to blocks which end in an unreachable. | ||
if (!std::any_of(eb->succ_begin(), eb->succ_end(), isUnreachableBlock) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand what unreachable blocks have to do with this at first. I guess it's a performance heuristic because you don't care about emitting an extra load on a slow path (an extra store should just go away anyway)? If that's the case, it needs to be explained, otherwise it's misleading because it leads the reader to believe that the unreachable path can really be ignored and that the store isn't needed there (which is very wrong).
When unreachable blocks actually matter, the DeadEndBlocks analysis should be used because it's more robust. Or when cold blocks matter there's a ColdBlocks analysis.
(I don't actually think you need to do any of this dominance checking though)
// Remove all stores and replace the loads with the current value. | ||
SILBasicBlock *currentBlock = nullptr; | ||
SILValue currentVal; | ||
for (SILInstruction *I : LoadsAndStores) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has an assumption that LoadsAndStores are in program order but there's no contract. It isn't specified at the declaration or when the LoadsAndStores list is created.
This is a combination of load hoisting and store sinking, e.g.
is transformed to:
This optimization is important to optimize inout arguments, especially with COW support in SIL: it relies on values being in SSA values rather than in memory.
This PR is for test and review for now. If everything goes well I'll merge it next week.