LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. #27849

Merged: 2 commits, Oct 30, 2019

@eeckstein eeckstein commented Oct 23, 2019

This is a combination of load hoisting and store sinking, e.g.

  preheader:
    br header_block
  header_block:
    %x = load %not_aliased_addr
    // use %x and define %y
    store %y to %not_aliased_addr
    ...
  exit_block:

is transformed to:

  preheader:
    %x = load %not_aliased_addr
    br header_block
  header_block:
    // use %x and define %y
    ...
  exit_block:
    store %y to %not_aliased_addr

This optimization is important for optimizing inout arguments, especially with COW support in SIL, which relies on values being held in SSA values rather than in memory.
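As a source-level analogy only (this is plain C++, not SIL, and the function names are made up for illustration), the transformation corresponds roughly to the following rewrite, assuming `acc` is not aliased by anything else touched in the loop:

```cpp
#include <cassert>
#include <vector>

// Before: every iteration loads from and stores to the same location.
void accumulateInMemory(int *acc, const std::vector<int> &xs) {
  for (int x : xs) {
    int loaded = *acc;         // %x = load %not_aliased_addr
    int updated = loaded + x;  // use %x and define %y
    *acc = updated;            // store %y to %not_aliased_addr
  }
}

// After: one load before the loop, a local value carried through the loop,
// and one store after the loop.
void accumulateHoisted(int *acc, const std::vector<int> &xs) {
  int current = *acc;  // load moved to the preheader
  for (int x : xs)
    current += x;      // the loop only touches the local value
  *acc = current;      // store moved to the exit block
}
```

Both functions leave the same value in `*acc`. When the loop body never runs, the second version still performs one load and one store, which is the kind of speculation discussed in the review below.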

This PR is for test and review for now. If everything goes well I'll merge it next week.

@eeckstein eeckstein requested a review from atrick October 23, 2019 15:27
@eeckstein
Contributor Author

@swift-ci benchmark

@eeckstein
Contributor Author

@swift-ci test

@eeckstein eeckstein changed the title LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. [DNM] LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. Oct 23, 2019
@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 9ff63b01eb90c6e92b34d4fe15bc755ef9afee5b

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 9ff63b01eb90c6e92b34d4fe15bc755ef9afee5b

Because the set includes all side-effect instructions, also may-reads.
NFC
@eeckstein eeckstein force-pushed the licm-load-store-hoisting branch from 9ff63b0 to 10d96a8 Compare October 29, 2019 09:21
@eeckstein
Contributor Author

@swift-ci test

@eeckstein
Contributor Author

@swift-ci benchmark

@swift-ci
Contributor

Performance: -O

Regression                    OLD   NEW   DELTA    RATIO
PrefixAnyCollection            58    76   +31.0%   0.76x
DropFirstAnyCollection         59    76   +28.8%   0.78x
SuffixAnyCollection            22    28   +27.3%   0.79x
DropLastAnyCollection          22    28   +27.3%   0.79x
DropWhileAnyCollection         76    94   +23.7%   0.81x
PrefixWhileAnyCollection      111   129   +16.2%   0.86x

Improvement                   OLD   NEW   DELTA    RATIO
ExclusivityGlobal               8     5   -37.5%   1.60x
NSStringConversion.LongUTF8   589   542    -8.0%   1.09x (?)
NSStringConversion.UTF8       835   777    -6.9%   1.07x (?)

Code size: -O

Performance: -Osize

Improvement                   OLD    NEW    DELTA    RATIO
ExclusivityGlobal               8      5    -37.5%   1.60x
ArrayAppendOptionals         2020   1300    -35.6%   1.55x (?)

Code size: -Osize

Regression                    OLD    NEW    DELTA    RATIO
RandomShuffle.o              3533   3581    +1.4%    0.99x

Performance: -Onone

Regression                    OLD    NEW    DELTA    RATIO
ClassArrayGetter2            3940   4260    +8.1%    0.92x

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 10d96a86768c8cc4f1610c42ea91c7d2e2c5ece3

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 10d96a86768c8cc4f1610c42ea91c7d2e2c5ece3

@eeckstein eeckstein force-pushed the licm-load-store-hoisting branch from 10d96a8 to 584581e Compare October 29, 2019 15:50
@eeckstein
Contributor Author

@swift-ci test

@swift-ci
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 584581e

@swift-ci
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 584581e

@eeckstein eeckstein changed the title [DNM] LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. LICM: add an optimization to move multiple loads and stores from/to the same memory location out of a loop. Oct 30, 2019
@eeckstein
Contributor Author

@swift-ci test

@atrick atrick left a comment

This is really good. I have a few comments below though.

And I just realized this PR isn't the code that I reviewed, so I'll add another comment here
#27990

llvm_unreachable("unknown projection");
}

/// Returns true if all stores to \p addr commonly dominate the loop exitst of

typo exitst

}
}

// In case the value is only stored but never loaded in the loop.

Comment: Or it might have been reloaded only on paths that reached stores first.


// This is not a requirement for functional correctness, but we don't want to
// _speculatively_ load and store the value (outside of the loop).
if (!storesCommonlyDominateLoopExits(addr, loop, exitingBlocks))

If you're worried about emitting an extra store, then just suppress it if the store is the loaded value. I wouldn't be worried about the "extra" load because either the load will be unnecessary and removed, or it is really needed on some paths through the loop, in which case it's totally reasonable to speculatively hoist it. It should be a good performance tradeoff most of the time.
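A minimal sketch of the suppression suggested above, using toy stand-ins rather than the SIL API (`ValueId`, `AccessPlan`, and `storeNeededAtExit` are hypothetical names): if the value that is current at a loop exit is the very value the hoisted load produced, no store needs to be emitted at that exit.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in: an SSA value is identified by an integer id.
using ValueId = int;

struct AccessPlan {
  ValueId hoistedLoad;  // value produced by the load placed in the preheader
  ValueId valueAtExit;  // value that is current for the address at a loop exit
};

// Returns the value to store at the exit, or nothing when the store would
// only write back what the hoisted load read.
std::optional<ValueId> storeNeededAtExit(const AccessPlan &plan) {
  if (plan.valueAtExit == plan.hoistedLoad)
    return std::nullopt;
  return plan.valueAtExit;
}
```

Comparing SSA identities is a static check, so it costs nothing at run time; it only catches the case where no store reached the exit on that path.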

// Check if the store-is-not-alive flag reaches any of the exits.
for (SILBasicBlock *eb : exitingBlocks) {
  // Ignore loop exits to blocks which end in an unreachable.
  if (!std::any_of(eb->succ_begin(), eb->succ_end(), isUnreachableBlock) &&

I didn't understand what unreachable blocks have to do with this at first. I guess it's a performance heuristic because you don't care about emitting an extra load on a slow path (an extra store should just go away anyway)? If that's the case, it needs to be explained, otherwise it's misleading because it leads the reader to believe that the unreachable path can really be ignored and that the store isn't needed there (which is very wrong).

When unreachable blocks actually matter, the DeadEndBlocks analysis should be used because it's more robust. Or when cold blocks matter there's a ColdBlocks analysis.

(I don't actually think you need to do any of this dominance checking though)
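For readers unfamiliar with the check under discussion, here is a rough sketch of a "store-is-not-alive" flag propagated from the loop header. It is a toy CFG model, not the code in this PR: `Block`, its fields, and `storesDominateExits` are assumptions made for illustration, and the `endsInUnreachable` skip is modeled purely as a cost heuristic (the store is still needed on that path if the transformation is applied).

```cpp
#include <cassert>
#include <vector>

// Toy CFG node; indices refer to positions in the CFG vector.
struct Block {
  std::vector<int> successors;
  bool inLoop = false;
  bool hasStore = false;           // block stores to the address of interest
  bool endsInUnreachable = false;  // exit target treated as a cold path
};

// True if every path from the header to a considered loop exit passes
// through a block that stores to the address.
bool storesDominateExits(const std::vector<Block> &cfg, int header) {
  std::vector<bool> reached(cfg.size(), false);  // reached with no store yet
  std::vector<int> worklist{header};
  reached[header] = true;
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    if (cfg[b].hasStore)
      continue;  // a store in this block clears the flag
    for (int s : cfg[b].successors) {
      if (!cfg[s].inLoop) {
        if (!cfg[s].endsInUnreachable)
          return false;  // the flag reaches a regular exit
        continue;        // heuristic: do not let a cold exit block the transform
      }
      if (!reached[s]) {
        reached[s] = true;
        worklist.push_back(s);
      }
    }
  }
  return true;
}
```

Dropping the check entirely, as the parenthetical above proposes, would mean always hoisting the load and relying on store suppression at exits where nothing was stored.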

// Remove all stores and replace the loads with the current value.
SILBasicBlock *currentBlock = nullptr;
SILValue currentVal;
for (SILInstruction *I : LoadsAndStores) {

This assumes that LoadsAndStores is in program order, but there's no contract that guarantees it. The ordering isn't specified at the declaration or where the LoadsAndStores list is created.
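One way to make such a contract explicit is sketched below with a single-block toy model rather than the pass's real data structures (`Access`, `position`, and `forwardStoredValues` are hypothetical). The walk is only correct when the list is in program order, so the sketch asserts that up front instead of assuming it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a load or store instruction within one block.
struct Access {
  int position;  // program-order index
  bool isStore;
  int value;     // stored value id for stores; filled in for loads by the walk
};

// Replaces each load with the most recently stored value, starting from the
// hoisted initial value, and returns the value that is current at the end.
int forwardStoredValues(std::vector<Access> &accesses, int initialValue) {
  assert(std::is_sorted(accesses.begin(), accesses.end(),
                        [](const Access &a, const Access &b) {
                          return a.position < b.position;
                        }) &&
         "accesses must be in program order");
  int currentVal = initialValue;
  for (Access &a : accesses) {
    if (a.isStore)
      currentVal = a.value;  // the store defines the new current value
    else
      a.value = currentVal;  // the load is replaced by the current value
  }
  return currentVal;
}
```

An alternative is to document the ordering at the declaration and establish it where the list is built; the assertion only catches violations in builds with assertions enabled.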
