|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Intro to rustc's self profiler" |
| 4 | +author: Wesley Wiser |
| 5 | +description: "Learn how to use the -Zself-profile rustc flag" |
| 6 | +team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/> |
| 7 | +--- |
| 8 | + |
| 9 | +Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`. |
| 10 | +This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance. |
| 11 | +In this post, we'll look at the tools currently available and use them to profile `rustc` while it compiles an example crate. |
| 12 | + |
| 13 | +## Setup |
| 14 | + |
| 15 | +First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data. |
| 16 | + |
| 17 | +```sh |
| 18 | +$ git clone https://github.com/rust-lang/measureme.git |
| 19 | +$ cd measureme |
| 20 | +$ cargo build --release --all |
| 21 | +``` |
| 22 | + |
| 23 | +Now that we have our tools, let's download an example crate to profile its build. |
| 24 | + |
| 25 | +```sh |
| 26 | +$ cd .. |
| 27 | +$ git clone https://github.com/rust-lang/regex.git |
| 28 | +$ cd regex |
| 29 | +``` |
| 30 | + |
| 31 | +We'll need to use a recent nightly compiler to get access to unstable `-Z` flags. |
| 32 | + |
| 33 | +```sh |
| 34 | +$ rustup override set nightly |
| 35 | +``` |
| 36 | + |
| 37 | +If you haven't installed a nightly compiler before, this will download the nightly compiler for you. |
| 38 | +If you have, then update it to make sure you're on a recent version. |
| 39 | + |
| 40 | +```sh |
| 41 | +$ rustup update nightly |
| 42 | +``` |
| 43 | + |
| 44 | +## Profiling the compiler |
| 45 | + |
| 46 | +Now we can build it and tell `rustc` to profile the build of the `regex` crate. |
| 47 | +This will cause three new files to be created in the working directory which contain the profling data. |
| 48 | + |
| 49 | +```sh |
| 50 | +$ cargo rustc -- -Zself-profile |
| 51 | +$ ls |
| 52 | +CHANGELOG.md LICENSE-APACHE UNICODE.md regex-17088.string_data regex-syntax target |
| 53 | +Cargo.lock LICENSE-MIT bench regex-17088.string_index rustfmt.toml test |
| 54 | +Cargo.toml PERFORMANCE.md examples regex-capi scripts tests |
| 55 | +HACKING.md README.md regex-17088.events regex-debug src |
| 56 | +``` |
| 57 | + |
| 58 | +The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`. |
| 59 | + |
| 60 | +We'll use each of the three main tools to analyze the profiling data: |
| 61 | + |
| 62 | +### `summarize` |
| 63 | + |
| 64 | +As its name suggests, this tool summarizes the data found in the trace files. |
| 65 | +Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode. |
| 66 | + |
| 67 | +Let's run the tool, passing just the file name (but not the extension) for the trace: |
| 68 | + |
| 69 | +```sh |
| 70 | +$ ../measureme/target/release/summarize summarize regex-17088 |
| 71 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 72 | +| Item | Self time | % of total time | Time | Item count | |
| 73 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 74 | +| LLVM_module_codegen_emit_obj | 4.89s | 42.752 | 4.89s | 159 | |
| 75 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 76 | +| codegen_module | 1.25s | 10.967 | 1.37s | 159 | |
| 77 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 78 | +| LLVM_module_optimize_module_passes | 1.15s | 10.022 | 1.15s | 159 | |
| 79 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 80 | +| LLVM_module_codegen_make_bitcode | 786.56ms | 6.875 | 960.73ms | 159 | |
| 81 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 82 | +| typeck_tables_of | 565.18ms | 4.940 | 689.39ms | 848 | |
| 83 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 84 | +| LLVM_module_codegen | 408.01ms | 3.566 | 6.26s | 159 | |
| 85 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 86 | +| mir_borrowck | 224.03ms | 1.958 | 543.33ms | 848 | |
| 87 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 88 | +| LLVM_module_codegen_emit_compressed_bitcode | 174.17ms | 1.522 | 174.17ms | 159 | |
| 89 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 90 | +| optimized_mir | 157.91ms | 1.380 | 205.29ms | 1996 | |
| 91 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 92 | +| evaluate_obligation | 146.50ms | 1.281 | 184.17ms | 8304 | |
| 93 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 94 | +| codegen_crate | 139.48ms | 1.219 | 1.58s | 1 | |
| 95 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 96 | +| mir_built | 123.88ms | 1.083 | 168.01ms | 848 | |
| 97 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 98 | +| metadata_decode_entry | 88.36ms | 0.772 | 117.77ms | 55642 | |
| 99 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 100 | +| incr_comp_copy_cgu_workproducts | 64.21ms | 0.561 | 64.21ms | 1 | |
| 101 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 102 | +| monomorphization_collector_graph_walk | 54.11ms | 0.473 | 344.00ms | 1 | |
| 103 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 104 | +| link_rlib | 43.21ms | 0.378 | 43.21ms | 1 | |
| 105 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 106 | +| check_impl_item_well_formed | 41.36ms | 0.362 | 77.14ms | 736 | |
| 107 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 108 | +| codegen_fulfill_obligation | 40.36ms | 0.353 | 51.56ms | 1759 | |
| 109 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 110 | +| expand_crate | 37.24ms | 0.326 | 48.52ms | 1 | |
| 111 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 112 | +| symbol_name | 36.31ms | 0.317 | 39.06ms | 5513 | |
| 113 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 114 | +| free_global_ctxt | 34.34ms | 0.300 | 34.34ms | 1 | |
| 115 | ++-----------------------------------------------+-----------+-----------------+----------+------------+ |
| 116 | +... |
| 117 | +Total cpu time: 11.440758871s |
| 118 | +``` |
| 119 | + |
| 120 | +The output is sorted by the self time (time spent in the query or activity but not other queries or activities called by itself). |
| 121 | +As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable. |
| 122 | + |
| 123 | +### `flamegraph` |
| 124 | + |
| 125 | +As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data. |
| 126 | +To run the tool, we'll pass just the filename without a file extension like we did for `summarize`: |
| 127 | + |
| 128 | +```sh |
| 129 | +$ ../measureme/target/release/flamegraph regex-17088 |
| 130 | +``` |
| 131 | + |
| 132 | +This will create a file called `rustc.svg` in the working directory: |
| 133 | + |
| 134 | +[![Image of flamegraph output][flame graph img]][flame graph img] |
| 135 | + |
| 136 | +[Click here] to try the interactive svg. |
| 137 | + |
| 138 | +### `crox` |
| 139 | + |
| 140 | +This tool processes self-profiling data into the JSON format that the Chromium profiler understands. |
| 141 | +You can use it to create a graphical timeline showing exactly when various traced events occurred. |
| 142 | + |
| 143 | +In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events. |
| 144 | +Let's get started with the basics! |
| 145 | + |
| 146 | +#### Basic usage |
| 147 | + |
| 148 | +To run the tool, we'll just pass the filename without a file extension like we've done before: |
| 149 | + |
| 150 | +```sh |
| 151 | +$ ../measureme/target/release/crox regex-17088 |
| 152 | +``` |
| 153 | + |
| 154 | +This creates a file called `chrome_profiler.json` in the working directory. |
| 155 | +To open it, we'll use the regular Chromium performance tools you might already be familiar with: |
| 156 | + |
| 157 | +1. Open Chrome |
| 158 | +2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS) |
| 159 | +3. Click the Performance tab at the top of the console. |
| 160 | +4. Click the "Load profile" button which looks like an arrow pointing up. |
| 161 | +5. Select the `chrome_profiler.json` file we created. |
| 162 | + |
| 163 | +You should now see something similar to this: |
| 164 | + |
| 165 | +[![Image of chrome profiler][chrome profiler img1]][chrome profiler img1] |
| 166 | + |
| 167 | +You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline. |
| 168 | + |
| 169 | +#### Filtering short events |
| 170 | + |
| 171 | +If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file. |
| 172 | +One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration: |
| 173 | + |
| 174 | +```sh |
| 175 | +$ ../measureme/target/release/crox --minumum-duration 2 regex-17088 |
| 176 | +``` |
| 177 | + |
| 178 | +Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb. |
| 179 | + |
| 180 | +#### Capturing event arguments |
| 181 | + |
| 182 | +The self-profiler can be configured to record event arguments during compilation. |
| 183 | +For example, queries will include their query key. |
| 184 | +This functionality is turned off by default because it increases the self-profiler overhead. |
| 185 | + |
| 186 | +To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`: |
| 187 | + |
| 188 | +```sh |
| 189 | +$ cargo clean |
| 190 | +$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args |
| 191 | +``` |
| 192 | + |
| 193 | +And then process the new output files: |
| 194 | + |
| 195 | +```sh |
| 196 | +$ ../measureme/target/release/crox regex-23649 |
| 197 | +``` |
| 198 | + |
| 199 | +Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen: |
| 200 | + |
| 201 | +[![Image of Chrome profiler details][chrome profiler img2]][chrome profiler img2] |
| 202 | + |
| 203 | +Which shows this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body. |
| 204 | + |
| 205 | +#### Profiling an entire crate graph |
| 206 | + |
| 207 | +By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's. |
| 208 | +`crox` can then combine all of the profiles together into one output file. |
| 209 | +Since this will create a lot of files, we'll first create a folder to put all the traces in. |
| 210 | + |
| 211 | +```sh |
| 212 | +$ rm regex-17088.* regex-23649.* # clean up the old trace files since we're done with them |
| 213 | +$ mkdir profiles |
| 214 | +$ cargo clean |
| 215 | +$ RUSTFLAGS="-Zself-profile=./profiles -Zself-profile-events=default,args" cargo build |
| 216 | +``` |
| 217 | + |
| 218 | +This creates quite a few trace files in the working directory. |
| 219 | +Now, we'll tell `crox` to combine all of the trace files in the current directory together: |
| 220 | + |
| 221 | +```sh |
| 222 | +$ ../measureme/target/release/crox --dir . |
| 223 | +``` |
| 224 | + |
| 225 | +Opening this file shows all of the crates compiled: |
| 226 | + |
| 227 | +[![Image of Chrome profiler with all crates][chrome profiler img3]][chrome profiler img3] |
| 228 | + |
| 229 | +Clicking on a crate will expand it to show the threads and event data inside it: |
| 230 | + |
| 231 | +[![Image of Chrome profiler with a crate expanded][chrome profiler img4]][chrome profiler img4] |
| 232 | + |
| 233 | +Thanks for reading! |
| 234 | + |
| 235 | +We've been using these tools extensively ourselves over the last few months and they've helped us tremendously in understanding where the compiler spends its time. |
| 236 | +In the future we'll be adding more features and we'll work on making the tooling easier to use. |
| 237 | +If you have questions or would like to get involved with the Self-Profile Working Group, please check out the [measureme repository] or stop by our [Zulip stream]. |
| 238 | + |
| 239 | +[chrome profiler img1]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler1.png |
| 240 | +[chrome profiler img2]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler2.png |
| 241 | +[chrome profiler img3]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler3.png |
| 242 | +[chrome profiler img4]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler4.png |
| 243 | +[Click here]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/rustc.svg |
| 244 | +[flame graph]: http://www.brendangregg.com/flamegraphs.html |
| 245 | +[flame graph img]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/flamegraph_image.png |
| 246 | +[measureme repository]: https://github.com/rust-lang/measureme |
| 247 | +[Self-Profile Working Group]: https://rust-lang.github.io/compiler-team/working-groups/self-profile/ |
| 248 | +[Zulip stream]: https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile |
0 commit comments