You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[AArch64][BOLT] Ensure tentative code layout for cold BBs runs. (#96609)
When split functions is used, BOLT may skip tentative code layout
estimation in some cases, like:
- when there is no profile data for some blocks (ie cold blocks)
- when there are cold functions in lite mode
- when skip functions is used
However, when rewriting the binary we still need to compute PC-relative
distances between hot and cold basic blocks. Without cold layout
estimation, BOLT uses '0x0' as the address of the first cold block,
leading to incorrect estimations of any PC-relative addresses.
This affects large binaries as the relaxStub method expands more
branches than necessary using the short-jump sequence, at it wrongly
believes it has exceeded the branch distance boundary.
This increases code size with both a larger and slower sequence;
however,
performance regression is expected to be minimal since this only affects
any called cold code.
Example of such an unnecessary relaxation:
from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
0 commit comments