Goals for the first usable iteration of `-Zself-profile` are:
- Make the compiler track query invocations and other important function calls (e.g. LLVM related)
- This means tracking only the query/function name (no query keys or arguments yet)
- Reduce the overhead of tracking and profile generation so that it is not prohibitive
- This means emitting events in an optimized binary format
- Write a post-processing tool that generates an aggregated report from the raw event data
- The aggregated report is a table with one line per query/function and columns for
- total time spent in the query (in milliseconds)
- time spent in the query as percentage of total compile time
- number of query invocations
- percentage of in-memory cache hits
- percentage of incremental cache hits
- total time spent (milliseconds) in loading query results from incremental cache
- total time spent (milliseconds) blocked on concurrent query invocations
- Re-enable self-profiling on perf.rlo, which includes
- running the postprocessing tool to generate the report for each test run
- adding a new comparison view that compares the test runs of a single benchmark and shows changes per query. This view is reachable by clicking on a benchmark in the regular comparison view (i.e. one can "zoom" into a given benchmark)
- Document how self-profiling works in the rustc-guide.
Non-Goals are:
- Supporting self-profiling in 32-bit compilers -- this makes it easier to rely on things like memory-mapped files
- Tracking individual query keys/function arguments
Work packages resulting from this set of goals are:
- Implement a library that takes care of reading and writing the binary event format
- Make the compiler use the library to emit profiling data efficiently
- Initial integration at Use measureme in self profiler #59515
- Implement "event filtering" in order to keep profiling overhead low in the common case. (Implement event filtering for self-profiler. #59915)
- Add an output directory argument to `-Zself-profile` (Allow to specify profiling data output directory as -Zself-profile argument. #61123)
- Add a version header to profiler artifacts (Add versioning to the binary profile format measureme#40)
- Implement a postprocessing tool (using the library) that generates the aggregated report (Implement a summarization tool for profile traces measureme#17)
- Make perf.rlo support self-profile:
- run benchmarks with `-Zself-profile`
- run postprocessing tool
- store aggregated reports
- implement the detailed comparison view
- make the regular comparison view link to detailed views
- Review and make sure that we are tracking everything we are interested in. Things to check:
- Pre-query passes (parsing, macro expansion, name resolution, HIR lowering, ...)
- LLVM optimization passes
- Metadata loading/decoding
- Trait selection (removed from the MVP for now)
- Document how self-profiling works in the rustc-guide.
- Polishing iteration
- Write high-level crate docs for measureme (implemented in Write some crate level documentation measureme#68)
- Detailed view should show "percentage of total time" column (detailed-view: Percentage of total time column missing rustc-perf#523)
- Show total sum line in table for the entire crate (detailed-view: Add a "sum total" line at the top of the table. rustc-perf#525)
- Make sorting more visible/accessible in the results table (detailed views: Make it more visible that columns can be sorted. rustc-perf#526)
- It's unclear what the "invocations" and "cache misses" columns in the detailed view are exactly. (detailed (comparison) view: The `cache misses` column should be removed, `invocations` should be renamed rustc-perf#529)
- Resolve a bug in the sum for the `incr. loading time` column (detailed view: Possible bug in "totals" line for incr. comp. cache loading rustc-perf#527)
- Clean up self-time computation in `summarize` (Event recording & summarize need cleanup with respect to self-time vs incr-load-time vs blocked-time vs total-time. measureme#75)
Possible Problems that might arise are:
- Profiling overhead keeps being too high -- then we need to think about doing separate `self-profile` runs on perf.rlo