Goals for the first usable iteration of `-Zself-profile` are:
- Make the compiler track query invocations and other important function calls (e.g. LLVM related)
- This means tracking only the query/function name (no query keys or arguments yet)
- Reduce the overhead of tracking and profile generation so that it is not prohibitive
- This means emitting events in an optimized binary format
- Write a post-processing tool that generates an aggregated report from the raw event data
- The aggregated report is a table with one line per query/function and columns for
- total time spent in the query (in milliseconds)
- time spent in the query as percentage of total compile time
- number of query invocations
- percentage of in-memory cache hits
- percentage of incremental cache hits
- total time spent (milliseconds) in loading query results from incremental cache
- total time spent (milliseconds) blocked on concurrent query invocations
- Re-enable self-profiling on perf.rlo, which includes
- running the postprocessing tool to generate the report for each test run
- adding a new comparison view that compares the test runs of a single benchmark and shows changes per query. This view is reachable by clicking on a benchmark in the regular comparison view (i.e. one can "zoom" into a given benchmark)
- Document how self-profiling works in the rustc-guide.
Non-Goals are:
- Supporting self-profiling in 32-bit compilers -- this makes it easier to rely on things like memory-mapped files
- Tracking individual query keys/function arguments
Work packages resulting from this set of goals are:
- Implement a library that takes care of reading and writing the binary event format
- Make the compiler use the library to emit profiling data efficiently
- Initial integration at Use measureme in self profiler #59515
- Implement "event filtering" in order to keep profiling overhead low in the common case. (Implement event filtering for self-profiler. #59915)
- Add an output directory argument to `-Zself-profile` (Allow to specify profiling data output directory as -Zself-profile argument. #61123)
- Add a version header to profiler artifacts (Add versioning to the binary profile format measureme#40)
- Implement a postprocessing tool (using the library) that generates the aggregated report (Implement a summarization tool for profile traces measureme#17)
- Make perf.rlo support self-profile:
- run benchmarks with `-Zself-profile`
- run postprocessing tool
- store aggregated reports
- implement the detailed comparison view
- make the regular comparison view link to detailed views
- Review and make sure that we are tracking everything we are interested in. Things to check:
- Pre-query passes (parsing, macro expansion, name resolution, HIR lowering, ...)
- LLVM optimization passes
- Metadata loading/decoding
- Trait selection (removed from the MVP for now)
- Document how self-profiling works in the rustc-guide.
- Polishing iteration
- Write high-level crate docs for measureme (implemented in Write some crate level documentation measureme#68)
- Detailed view should show "percentage of total time" column (detailed-view: Percentage of total time column missing rustc-perf#523)
- Show total sum line in table for the entire crate (detailed-view: Add a "sum total" line at the top of the table. rustc-perf#525)
- Make sorting more visible/accessible in the results table (detailed views: Make it more visible that columns can be sorted. rustc-perf#526)
- It's unclear what the "invocations" and "cache misses" columns in the detailed view are exactly. (detailed (comparison) view: The `cache misses` column should be removed, `invocations` should be renamed rustc-perf#529)
- Resolve a bug in the sum for the `incr. loading time` column (detailed view: Possible bug in "totals" line for incr. comp. cache loading rustc-perf#527)
- Clean up self-time computation in `summarize` (Event recording & summarize need cleanup with respect to self-time vs incr-load-time vs blocked-time vs total-time. measureme#75)
Possible Problems that might arise are:
- Profiling overhead keeps being too high -- then we need to think about doing separate `self-profile` runs on perf.rlo