Skip to content

Commit 8963969

Browse files
committed
[Inside Rust] Intro to rustc's self-profile feature
1 parent 01d4c93 commit 8963969

File tree

7 files changed

+667
-0
lines changed

7 files changed

+667
-0
lines changed
Lines changed: 248 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,248 @@
1+
---
2+
layout: post
3+
title: "Intro to rustc's self profiler"
4+
author: Wesley Wiser
5+
description: "Learn how to use the -Zself-profile rustc flag"
6+
team: the self-profile working group <https://rust-lang.github.io/compiler-team/working-groups/self-profile/>
7+
---
8+
9+
Over the last year, the [Self-Profile Working Group] has been building tools to profile `rustc`.
10+
This is part of the Compiler Team's ongoing efforts to improve `rustc`'s performance.
11+
In this post, we'll look at the tools currently available and use them to profile `rustc` while it compiles an example crate.
12+
13+
## Setup
14+
15+
First, we'll download and build the `measureme` repository which provides tools to analyze self-profile trace data.
16+
17+
```sh
18+
$ git clone https://github.com/rust-lang/measureme.git
19+
$ cd measureme
20+
$ cargo build --release --all
21+
```
22+
23+
Now that we have our tools, let's download an example crate to profile its build.
24+
25+
```sh
26+
$ cd ..
27+
$ git clone https://github.com/rust-lang/regex.git
28+
$ cd regex
29+
```
30+
31+
We'll need to use a recent nightly compiler to get access to unstable `-Z` flags.
32+
33+
```sh
34+
$ rustup override set nightly
35+
```
36+
37+
If you haven't installed a nightly compiler before, this will download the nightly compiler for you.
38+
If you have, then update it to make sure you're on a recent version.
39+
40+
```sh
41+
$ rustup update nightly
42+
```
43+
44+
## Profiling the compiler
45+
46+
Now we can build it and tell `rustc` to profile the build of the `regex` crate.
47+
This will cause three new files to be created in the working directory which contain the profling data.
48+
49+
```sh
50+
$ cargo rustc -- -Zself-profile
51+
$ ls
52+
CHANGELOG.md LICENSE-APACHE UNICODE.md regex-17088.string_data regex-syntax target
53+
Cargo.lock LICENSE-MIT bench regex-17088.string_index rustfmt.toml test
54+
Cargo.toml PERFORMANCE.md examples regex-capi scripts tests
55+
HACKING.md README.md regex-17088.events regex-debug src
56+
```
57+
58+
The new files follow the format `{crate name}-{rustc process id}.{events,string_data,string_index}`.
59+
60+
We'll use each of the three main tools to analyze the profiling data:
61+
62+
### `summarize`
63+
64+
As its name suggests, this tool summarizes the data found in the trace files.
65+
Additionally, `summarize` can also show a "diff" between two trace files but we won't be using this mode.
66+
67+
Let's run the tool, passing just the file name (but not the extension) for the trace:
68+
69+
```sh
70+
$ ../measureme/target/release/summarize summarize regex-17088
71+
+-----------------------------------------------+-----------+-----------------+----------+------------+
72+
| Item | Self time | % of total time | Time | Item count |
73+
+-----------------------------------------------+-----------+-----------------+----------+------------+
74+
| LLVM_module_codegen_emit_obj | 4.89s | 42.752 | 4.89s | 159 |
75+
+-----------------------------------------------+-----------+-----------------+----------+------------+
76+
| codegen_module | 1.25s | 10.967 | 1.37s | 159 |
77+
+-----------------------------------------------+-----------+-----------------+----------+------------+
78+
| LLVM_module_optimize_module_passes | 1.15s | 10.022 | 1.15s | 159 |
79+
+-----------------------------------------------+-----------+-----------------+----------+------------+
80+
| LLVM_module_codegen_make_bitcode | 786.56ms | 6.875 | 960.73ms | 159 |
81+
+-----------------------------------------------+-----------+-----------------+----------+------------+
82+
| typeck_tables_of | 565.18ms | 4.940 | 689.39ms | 848 |
83+
+-----------------------------------------------+-----------+-----------------+----------+------------+
84+
| LLVM_module_codegen | 408.01ms | 3.566 | 6.26s | 159 |
85+
+-----------------------------------------------+-----------+-----------------+----------+------------+
86+
| mir_borrowck | 224.03ms | 1.958 | 543.33ms | 848 |
87+
+-----------------------------------------------+-----------+-----------------+----------+------------+
88+
| LLVM_module_codegen_emit_compressed_bitcode | 174.17ms | 1.522 | 174.17ms | 159 |
89+
+-----------------------------------------------+-----------+-----------------+----------+------------+
90+
| optimized_mir | 157.91ms | 1.380 | 205.29ms | 1996 |
91+
+-----------------------------------------------+-----------+-----------------+----------+------------+
92+
| evaluate_obligation | 146.50ms | 1.281 | 184.17ms | 8304 |
93+
+-----------------------------------------------+-----------+-----------------+----------+------------+
94+
| codegen_crate | 139.48ms | 1.219 | 1.58s | 1 |
95+
+-----------------------------------------------+-----------+-----------------+----------+------------+
96+
| mir_built | 123.88ms | 1.083 | 168.01ms | 848 |
97+
+-----------------------------------------------+-----------+-----------------+----------+------------+
98+
| metadata_decode_entry | 88.36ms | 0.772 | 117.77ms | 55642 |
99+
+-----------------------------------------------+-----------+-----------------+----------+------------+
100+
| incr_comp_copy_cgu_workproducts | 64.21ms | 0.561 | 64.21ms | 1 |
101+
+-----------------------------------------------+-----------+-----------------+----------+------------+
102+
| monomorphization_collector_graph_walk | 54.11ms | 0.473 | 344.00ms | 1 |
103+
+-----------------------------------------------+-----------+-----------------+----------+------------+
104+
| link_rlib | 43.21ms | 0.378 | 43.21ms | 1 |
105+
+-----------------------------------------------+-----------+-----------------+----------+------------+
106+
| check_impl_item_well_formed | 41.36ms | 0.362 | 77.14ms | 736 |
107+
+-----------------------------------------------+-----------+-----------------+----------+------------+
108+
| codegen_fulfill_obligation | 40.36ms | 0.353 | 51.56ms | 1759 |
109+
+-----------------------------------------------+-----------+-----------------+----------+------------+
110+
| expand_crate | 37.24ms | 0.326 | 48.52ms | 1 |
111+
+-----------------------------------------------+-----------+-----------------+----------+------------+
112+
| symbol_name | 36.31ms | 0.317 | 39.06ms | 5513 |
113+
+-----------------------------------------------+-----------+-----------------+----------+------------+
114+
| free_global_ctxt | 34.34ms | 0.300 | 34.34ms | 1 |
115+
+-----------------------------------------------+-----------+-----------------+----------+------------+
116+
...
117+
Total cpu time: 11.440758871s
118+
```
119+
120+
The output is sorted by the self time (time spent in the query or activity but not other queries or activities called by itself).
121+
As you can see, most of the compilation time is spent in LLVM generating the binary code for the executable.
122+
123+
### `flamegraph`
124+
125+
As you may have guessed, `flamegraph` will produce a [flame graph] of the profiling data.
126+
To run the tool, we'll pass just the filename without a file extension like we did for `summarize`:
127+
128+
```sh
129+
$ ../measureme/target/release/flamegraph regex-17088
130+
```
131+
132+
This will create a file called `rustc.svg` in the working directory:
133+
134+
[![Image of flamegraph output][flame graph img]][flame graph img]
135+
136+
[Click here] to try the interactive svg.
137+
138+
### `crox`
139+
140+
This tool processes self-profiling data into the JSON format that the Chromium profiler understands.
141+
You can use it to create a graphical timeline showing exactly when various traced events occurred.
142+
143+
In this section, we'll cover a few different modes `crox` can run in such as profiling an entire crate compilation including dependencies and filtering out small events.
144+
Let's get started with the basics!
145+
146+
#### Basic usage
147+
148+
To run the tool, we'll just pass the filename without a file extension like we've done before:
149+
150+
```sh
151+
$ ../measureme/target/release/crox regex-17088
152+
```
153+
154+
This creates a file called `chrome_profiler.json` in the working directory.
155+
To open it, we'll use the regular Chromium performance tools you might already be familiar with:
156+
157+
1. Open Chrome
158+
2. Open the Developer Tools console by pressing `Ctrl` + `Shift` + `i` (Windows/Linux) or `Cmd` + `Option` + `i` (macOS)
159+
3. Click the Performance tab at the top of the console.
160+
4. Click the "Load profile" button which looks like an arrow pointing up.
161+
5. Select the `chrome_profiler.json` file we created.
162+
163+
You should now see something similar to this:
164+
165+
[![Image of chrome profiler][chrome profiler img1]][chrome profiler img1]
166+
167+
You can use the scroll wheel on a mouse or the appropriate gesture on a touchpad to zoom in or out of the timeline.
168+
169+
#### Filtering short events
170+
171+
If the `chrome_profiler.json` file gets too large, the normal Chromium performance tools have issues opening the file.
172+
One easy way to deal with this is to tell `crox` to remove events shorter than a chosen duration:
173+
174+
```sh
175+
$ ../measureme/target/release/crox --minumum-duration 2 regex-17088
176+
```
177+
178+
Filtering out events less than 2 microseconds shrinks our `chrome_profiler.js` file from 27mb to 11mb.
179+
180+
#### Capturing event arguments
181+
182+
The self-profiler can be configured to record event arguments during compilation.
183+
For example, queries will include their query key.
184+
This functionality is turned off by default because it increases the self-profiler overhead.
185+
186+
To turn this feature on, we'll need to record a new compilation, passing an additional argument to `rustc`:
187+
188+
```sh
189+
$ cargo clean
190+
$ cargo rustc -- -Zself-profile -Zself-profile-events=default,args
191+
```
192+
193+
And then process the new output files:
194+
195+
```sh
196+
$ ../measureme/target/release/crox regex-23649
197+
```
198+
199+
Now in the Chromium profiler, if you click on a node, you can see additional data about many of the events at the bottom of the screen:
200+
201+
[![Image of Chrome profiler details][chrome profiler img2]][chrome profiler img2]
202+
203+
Which shows this `optimized_mir` query was processing the `regex::compile::{{impl}}::new` function body.
204+
205+
#### Profiling an entire crate graph
206+
207+
By using the `RUSTFLAGS` environment variable, we can profile every `rustc` invocation, not just the final crate's.
208+
`crox` can then combine all of the profiles together into one output file.
209+
Since this will create a lot of files, we'll first create a folder to put all the traces in.
210+
211+
```sh
212+
$ rm regex-17088.* regex-23649.* # clean up the old trace files since we're done with them
213+
$ mkdir profiles
214+
$ cargo clean
215+
$ RUSTFLAGS="-Zself-profile=./profiles -Zself-profile-events=default,args" cargo build
216+
```
217+
218+
This creates quite a few trace files in the working directory.
219+
Now, we'll tell `crox` to combine all of the trace files in the current directory together:
220+
221+
```sh
222+
$ ../measureme/target/release/crox --dir .
223+
```
224+
225+
Opening this file shows all of the crates compiled:
226+
227+
[![Image of Chrome profiler with all crates][chrome profiler img3]][chrome profiler img3]
228+
229+
Clicking on a crate will expand it to show the threads and event data inside it:
230+
231+
[![Image of Chrome profiler with a crate expanded][chrome profiler img4]][chrome profiler img4]
232+
233+
Thanks for reading!
234+
235+
We've been using these tools extensively ourselves over the last few months and they've helped us tremendously in understanding where the compiler spends its time.
236+
In the future we'll be adding more features and we'll work on making the tooling easier to use.
237+
If you have questions or would like to get involved with the Self-Profile Working Group, please check out the [measureme repository] or stop by our [Zulip stream].
238+
239+
[chrome profiler img1]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler1.png
240+
[chrome profiler img2]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler2.png
241+
[chrome profiler img3]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler3.png
242+
[chrome profiler img4]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/chrome_profiler4.png
243+
[Click here]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/rustc.svg
244+
[flame graph]: http://www.brendangregg.com/flamegraphs.html
245+
[flame graph img]: /images/inside-rust/2020-02-14-intro-rustc-self-profile/flamegraph_image.png
246+
[measureme repository]: https://github.com/rust-lang/measureme
247+
[Self-Profile Working Group]: https://rust-lang.github.io/compiler-team/working-groups/self-profile/
248+
[Zulip stream]: https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile
Loading
Loading
Loading
Loading
Loading

0 commit comments

Comments
 (0)