Aggressive inlining might produce huge functions with >10K basic
blocks. Since BFI treats _all_ blocks and jumps as "hot", giving each a
non-zero (but perhaps small) weight, the current implementation can
be slow, taking minutes to produce a layout. This change introduces a
few modifications that significantly (up to 50x on some instances)
speed up the computation. Some notable changes:
- reduced the maximum chain size to 512 (from the prior 4096);
- introduced MaxMergeDensityRatio param to avoid merging chains with
very different densities;
- dropped a couple of params that seem unnecessary.
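
To illustrate the MaxMergeDensityRatio idea, here is a minimal sketch of a density check between two chains. It is not the code from this change: the `Chain` struct, the `densitiesAllowMerge` helper, and the definition of density as execution count divided by size are assumptions made for illustration, and the threshold is passed in rather than taken from the real option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a chain of basic blocks.
struct Chain {
  double ExecutionCount; // total execution count of the blocks in the chain
  uint64_t Size;         // total size of the blocks, in bytes

  // Assumed definition of density: execution count per byte.
  double density() const {
    return ExecutionCount / static_cast<double>(Size);
  }
};

// Returns true if the two chains are close enough in density to be
// considered for merging. MaxRatio plays the role of MaxMergeDensityRatio.
bool densitiesAllowMerge(const Chain &Pred, const Chain &Succ,
                         double MaxRatio) {
  const double PredDensity = Pred.density();
  const double SuccDensity = Succ.density();
  // Every block is assumed to have a non-zero weight, so both densities
  // should be strictly positive.
  assert(PredDensity > 0.0 && SuccDensity > 0.0 &&
         "expected positive chain densities");
  const double MinDensity = std::min(PredDensity, SuccDensity);
  const double MaxDensity = std::max(PredDensity, SuccDensity);
  return MaxDensity / MinDensity <= MaxRatio;
}
```

With a check like this, a candidate pair can be rejected before any merge gain is computed, which is where the time saving would come from. A smaller ratio means fewer merges and hence more (shorter) chains.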
Looking at some "offline" metrics (e.g., the number of created
fall-throughs), there shouldn't be problems; in fact, I do see some
metrics go up. But it might be hard/impossible to measure a perf
difference for such small changes. I did test the performance of a
clang-14 binary and did not record any perf or i-cache-related differences.
Below are my benchmarks, with ext-tsp runtime (the lower the better) and
"tsp-score" (the higher the better).
**Before**:
- benchmark 1:
  num functions: 13,047
  reordering running time is 2.4 seconds
  score: 125503458 (128.3102%)
- benchmark 2:
  num functions: 16,438
  reordering running time is 3.4 seconds
  score: 12613997277 (129.7495%)
- benchmark 3:
  num functions: 12,359
  reordering running time is 1.9 seconds
  score: 1315881613 (105.8991%)
- benchmark 4:
  num functions: 96,588
  reordering running time is 7.3 seconds
  score: 89513906284 (100.3413%)
- benchmark 5:
  num functions: 1
  reordering running time is 372 seconds
  score: 21292505965077 (99.9979%)
- benchmark 6:
  num functions: 71,155
  reordering running time is 314 seconds
  score: 29795381626270671437824 (102.7519%)

**After**:
- benchmark 1:
  reordering running time is 2.2 seconds
  score: 125510418 (128.3130%)
- benchmark 2:
  reordering running time is 2.6 seconds
  score: 12614502162 (129.7525%)
- benchmark 3:
  reordering running time is 1.6 seconds
  score: 1315938168 (105.9024%)
- benchmark 4:
  reordering running time is 4.9 seconds
  score: 89518095837 (100.3454%)
- benchmark 5:
  reordering running time is 4.8 seconds
  score: 21292295939119 (99.9971%)
- benchmark 6:
  reordering running time is 104 seconds
  score: 29796710925310302879744 (102.7565%)