Discard LLVM modules earlier when performing ThinLTO #56487
Conversation
@bors: try

Whoa, this is a great idea! This seems like the correct strategy for incremental too, although I forget if that takes different paths. This also makes me think that we should link in fat LTO ASAP instead of synchronizing and then linking, as ideally that would allow a bit of pipelining. In any case, that's a patch for another time! I'll take a closer look later, but I'm curious about the perf impact here too; in theory it should both make builds (slightly) faster and decrease peak memory usage.
⌛ Trying commit 94131ebd21ed6d64acdeae6d4766c5669414c488 with merge 360659bd585b2529141e8fc3228fc3bf2e1fa6d1...

☀️ Test successful - status-travis
@rust-timer build 360659bd585b2529141e8fc3228fc3bf2e1fa6d1 |
Success: Queued 360659bd585b2529141e8fc3228fc3bf2e1fa6d1 with parent 0c999ed, comparison URL.
Finished benchmarking try commit 360659bd585b2529141e8fc3228fc3bf2e1fa6d1 |
The max-rss results basically look like noise to me. The wall-time numbers seem to be a minor win for opt builds (a few percent for clean/baseline-incremental). I guess that means that LLVM memory usage is dominated by rustc memory usage, at least for the build types used here (opt w/o debuginfo), so it has no impact on max-rss. Unfortunately I was not able to get massif to run.
I've got massif to run (need to directly call the jemalloc-free rustc, no …).

Before: [massif graph]

After: [massif graph]
The first hump is peak optimization, the second hump is peak LTO. So in this case peak memory usage is moved from the optimization stage to the LTO stage, but it ultimately does not make much of a difference.

Ah, bummer! In any case this looks good to me, and the graphs do confirm a shift in peaks, so this seems good to land. r=me with a rebase!
Instead of only determining whether some form of LTO is necessary, determine whether thin, fat or no LTO is necessary. I've rewritten the conditions in a way that I think is more obvious, i.e. specified LTO type + additional preconditions.
These are going to have different intermediate artifacts, so create separate codepaths for them.
Fat LTO merges into one module, so only return one module.
Force-pushed from 94131eb to 96cc381
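To make the shape of the first commit above concrete, here is a minimal, self-contained sketch of computing an explicit three-way LTO decision rather than a single "is LTO needed?" flag. All names (`LtoMode`, `compute_lto_mode`, the precondition flags) are invented for illustration, and the conditions are heavily simplified compared to the real compiler.

```rust
// Hypothetical, simplified names -- not the actual rustc internals.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LtoMode {
    No,
    Thin,
    Fat,
}

/// "Specified LTO type + additional preconditions": map the requested mode
/// and a couple of (made-up) preconditions to the mode that will actually run.
fn compute_lto_mode(requested: LtoMode, crate_type_allows_lto: bool, single_codegen_unit: bool) -> LtoMode {
    if !crate_type_allows_lto {
        return LtoMode::No;
    }
    match requested {
        LtoMode::Fat => LtoMode::Fat,
        // In this toy model, ThinLTO over a single codegen unit has nothing
        // to link across, so fall back to no LTO.
        LtoMode::Thin if single_codegen_unit => LtoMode::No,
        LtoMode::Thin => LtoMode::Thin,
        LtoMode::No => LtoMode::No,
    }
}

fn main() {
    assert_eq!(compute_lto_mode(LtoMode::Thin, true, false), LtoMode::Thin);
    assert_eq!(compute_lto_mode(LtoMode::Thin, false, false), LtoMode::No);
    assert_eq!(compute_lto_mode(LtoMode::Fat, true, true), LtoMode::Fat);
    println!("all LTO mode decisions behaved as expected");
}
```

Representing the decision as an enum makes the later split between the fat and thin codepaths a straightforward `match` rather than a pair of booleans.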
@bors r=alexcrichton

📌 Commit 96cc381285c8b1d83aea776282232022ed949fd7 has been approved by alexcrichton
Instead of keeping all modules in memory until thin LTO and only serializing them then, serialize the module immediately after it finishes optimizing.
Force-pushed from 96cc381 to 8128d0d
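The commit above is the heart of the PR: serialize each module right after its optimization finishes, instead of holding every module in memory until a synchronized ThinLTO step. The following toy program models that flow with invented types (`OptimizedModule`, `ThinLtoBuffer`); it is a sketch of the scheduling idea, not the actual rustc worker code.

```rust
use std::thread;

// Stand-in for an optimized in-memory LLVM module (deliberately "large").
struct OptimizedModule {
    name: String,
    ir: Vec<u8>,
}

// Stand-in for the compact, serialized ThinLTO buffer.
struct ThinLtoBuffer {
    name: String,
    bytes: Vec<u8>,
}

fn optimize(name: &str) -> OptimizedModule {
    OptimizedModule {
        name: name.to_string(),
        ir: vec![0u8; 1 << 20], // pretend this is a big optimized module
    }
}

fn serialize(module: OptimizedModule) -> ThinLtoBuffer {
    // Pretend serialization produces a much smaller artifact. Because
    // `module` is taken by value, its large `ir` allocation is dropped as
    // soon as this function returns.
    let bytes: Vec<u8> = module.ir.iter().take(64).copied().collect();
    ThinLtoBuffer { name: module.name, bytes }
}

fn main() {
    let names = ["cgu.0", "cgu.1", "cgu.2"];

    // Each worker serializes its module immediately after optimizing it,
    // so only the small buffers survive until the ThinLTO phase.
    let handles: Vec<_> = names
        .iter()
        .map(|&name| {
            thread::spawn(move || {
                let module = optimize(name);
                serialize(module)
            })
        })
        .collect();

    let buffers: Vec<ThinLtoBuffer> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    for buffer in &buffers {
        println!("{}: {} serialized bytes", buffer.name, buffer.bytes.len());
    }
}
```

The point of the restructuring is visible in the worker closure: the full module never escapes the worker, so peak memory during the LTO phase is dominated by buffers rather than in-memory modules.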
@bors r=alexcrichton Rebase mistake with submodules...

📌 Commit 8128d0d has been approved by alexcrichton
Discard LLVM modules earlier when performing ThinLTO

Currently ThinLTO is performed by first compiling all modules (and keeping them in memory), and then serializing them into ThinLTO buffers in a separate, synchronized step. Modules are later read back from ThinLTO buffers when running the ThinLTO optimization pipeline. We can also find the following comment in `lto.rs`:

```rust
// FIXME: right now, like with fat LTO, we serialize all in-memory
// modules before working with them and ThinLTO. We really
// shouldn't do this, however, and instead figure out how to
// extract a summary from an in-memory module and then merge that
// into the global index. It turns out that this loop is by far
// the most expensive portion of this small bit of global
// analysis!
```

I don't think that what is suggested here is the right approach: one of the primary benefits of using ThinLTO over ordinary LTO is that it's not necessary to keep all the modules (merged or not) in memory for the duration of the linking step. However, we currently don't really make use of this (at least for crate-local ThinLTO), because we keep all modules in memory until the start of the LTO step. This PR changes the implementation to instead perform the serialization into ThinLTO buffers directly after the initial optimization step.

Most of the changes here are plumbing to separate out fat and thin LTO handling in `write.rs`, as these now use different intermediate artifacts: for fat LTO this will be in-memory modules, for thin LTO it will be ThinLTO buffers.

r? @alexcrichton
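Since fat and thin LTO now flow different intermediate artifacts through the backend, the split can be pictured as a small sum type. This is only an illustrative sketch with made-up names (`LtoInput`, `InMemoryModule`, `SerializedBuffer`), not the PR's actual types in `write.rs`.

```rust
// Invented types for illustration of the fat/thin split.
struct InMemoryModule {
    name: String,
}

struct SerializedBuffer {
    name: String,
    bytes: Vec<u8>,
}

// The per-module intermediate artifact depends on the kind of LTO: fat LTO
// still needs the whole module so everything can be merged into one, while
// ThinLTO only needs the serialized buffer.
enum LtoInput {
    Fat(InMemoryModule),
    Thin(SerializedBuffer),
}

fn run_lto(inputs: Vec<LtoInput>) {
    let mut fat_modules = Vec::new();
    let mut thin_buffers = Vec::new();
    for input in inputs {
        match input {
            LtoInput::Fat(module) => fat_modules.push(module),
            LtoInput::Thin(buffer) => thin_buffers.push(buffer),
        }
    }
    // Fat LTO would now merge `fat_modules` into a single module; ThinLTO
    // would build its combined index from `thin_buffers` and optimize each
    // buffer independently, without the original modules in memory.
    for module in &fat_modules {
        println!("fat LTO input (in-memory module): {}", module.name);
    }
    for buffer in &thin_buffers {
        println!("thin LTO input (buffer): {} ({} bytes)", buffer.name, buffer.bytes.len());
    }
}

fn main() {
    // A real session uses one LTO kind; both variants appear here only to
    // exercise the two codepaths.
    run_lto(vec![
        LtoInput::Fat(InMemoryModule { name: "cgu.0".into() }),
        LtoInput::Thin(SerializedBuffer { name: "cgu.1".into(), bytes: vec![0; 64] }),
    ]);
}
```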
☀️ Test successful - status-appveyor, status-travis