Skip to content

Make the arena fast path smaller #56404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented Dec 1, 2018

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 1, 2018
@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 1, 2018

@bors try

@bors
Copy link
Collaborator

bors commented Dec 1, 2018

⌛ Trying commit 45da76e30802c4e272c6e82293d41142a2a8047f with merge b40fd9ad1c114049c0ef02dd1a36e93788ec0482...

@rust-highfive

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Dec 1, 2018

☀️ Test successful - status-travis
State: approved= try=True

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 1, 2018

@rust-timer build b40fd9ad1c114049c0ef02dd1a36e93788ec0482

@rust-timer
Copy link
Collaborator

Success: Queued b40fd9ad1c114049c0ef02dd1a36e93788ec0482 with parent d3ed348, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit b40fd9ad1c114049c0ef02dd1a36e93788ec0482

@rust-highfive

This comment has been minimized.

@bors

This comment has been minimized.

@michaelwoerister
Copy link
Member

The perf-results mostly look like noise (except for ctfe-stress). I'm not sure this is worth the more complicated implementation.

cc @rust-lang/wg-compiler-performance

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 6, 2018

The results doesn't look like noise though. The whole point of the arena is performance. We should just use malloc otherwise =P

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 6, 2018

This is also a nice size optimization. It brings the fast path for allocating an usize from:

.text:0000000180001235                 mov     rsi, rcx
.text:0000000180001238                 mov     rax, [rcx]
.text:000000018000123B                 mov     rcx, [rcx+8]
.text:000000018000123F                 add     rax, 7
.text:0000000180001243                 and     rax, 0FFFFFFFFFFFFFFF8h
.text:0000000180001247                 mov     [rsi], rax
.text:000000018000124A                 cmp     rcx, rax
.text:000000018000124D                 jb      short loc_180001281
.text:000000018000124F                 mov     rdx, rax
.text:0000000180001252                 add     rdx, 8
.text:0000000180001256                 cmp     rdx, rcx
.text:0000000180001259                 jnb     short loc_18000126B
.text:000000018000125B
.text:000000018000125B loc_18000125B:                          ; CODE XREF: alloctest+4F↓j
.text:000000018000125B                 mov     [rsi], rdx
...
.text:000000018000126B ; ---------------------------------------------------------------------------
.text:000000018000126B
.text:000000018000126B loc_18000126B:                          ; CODE XREF: alloctest+29↑j
.text:000000018000126B                 mov     edx, 8
.text:0000000180001270                 mov     rcx, rsi
.text:0000000180001273                 call    _ZN5arena13DroplessArena4grow17h2cc6dfe0f3889a5eE ; arena::DroplessArena::grow::h2cc6dfe0f3889a5e
.text:0000000180001278                 mov     rax, [rsi]
.text:000000018000127B                 lea     rdx, [rax+8]
.text:000000018000127F                 jmp     short loc_18000125B
.text:0000000180001281 ; ---------------------------------------------------------------------------
.text:0000000180001281
.text:0000000180001281 loc_180001281:                          ; CODE XREF: alloctest+1D↑j
.text:0000000180001281                 lea     rcx, aAssertionFaile ; "assertion failed: self.ptr <= self.end"
.text:0000000180001288                 lea     r8, off_180003230
.text:000000018000128F                 mov     edx, 26h
.text:0000000180001294                 call    _ZN3std9panicking11begin_panic17hca0ac606c2739ec6E ; std::panicking::begin_panic::hca0ac606c2739ec6
.text:0000000180001299 ; ---------------------------------------------------------------------------
.text:0000000180001299                 ud2
.text:0000000180001299 alloctest       endp

to:

.text:00000001800012C4                 mov     rax, [rcx]
.text:00000001800012C7                 lea     rdx, [rax+8]
.text:00000001800012CB                 cmp     rdx, [rcx+8]
.text:00000001800012CF                 jnb     short loc_1800012E0
.text:00000001800012D1                 mov     [rcx], rdx
.text:00000001800012D4
.text:00000001800012D4 loc_1800012D4:                          ; CODE XREF: alloctest+2A↓j
...
.text:00000001800012E0 ; ---------------------------------------------------------------------------
.text:00000001800012E0
.text:00000001800012E0 loc_1800012E0:                          ; CODE XREF: alloctest+F↑j
.text:00000001800012E0                 mov     edx, 8
.text:00000001800012E5                 call    _ZN5arena13DroplessArena18grow_and_alloc_raw17ha43c166c6d43e92fE ; arena::DroplessArena::grow_and_alloc_raw::ha43c166c6d43e92f
.text:00000001800012EA                 jmp     short loc_1800012D4

@michaelwoerister
Copy link
Member

I can see the first change making sense but why change from u8 to usize for the drop-less arena?

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 7, 2018

@michaelwoerister It avoids the instructions realigning the bump pointer by keeping it always usize aligned.

@Zoxc
Copy link
Contributor Author

Zoxc commented Dec 7, 2018

Might make the DroplessArena generic over the backing type, so we could use u8 in the string interner. I think all other allocations are usize aligned.

@michaelwoerister
Copy link
Member

I'm still not convinced that there are performance gains here. Compare this to a perf.rlo run from a few days ago where I accidentally benchmarked to equivalent versions of the compiler (so we know what we see is noise):
https://perf.rust-lang.org/compare.html?start=10e2c729ea2e85cb1e2a08be40564492f49e45ec&end=5280d23d48ea46ca2469a77cc240fbef6767efbb

Except for ctfe-stress it looks the same to me.

@nnethercote
Copy link
Contributor

I agree that the result are probably just noise, with the possible exception of ctfe-stress, but it is one of the most variable benchmarks, which is why it has the '?' markers on its results. So even that could be noise.

@michaelwoerister
Copy link
Member

If you remove the BackingType change, I'm happy to approve the PR. I think that always using u8 as the backing type makes the code a lot easier to reason about.

@Zoxc Zoxc closed this Jan 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants