Commit 3b038ed

attributes: help LLVM understand that some spans are never going to do anything (#1600) (#1605)
## Motivation

Adding `#[instrument(level = "debug")]` attributes to functions in rustc caused a performance regression (in release, where `debug!` is fully optimized out) across all crates: rust-lang/rust#89048 (comment)

While trying to debug this, I noticed that spans don't have the same advantage that events have with respect to how LLVM sees them. Spans (or more precisely, the enter-guard) get dropped at the end of the scope, which throws a spanner into the LLVM optimization pipeline. I am not entirely sure where the problem is, but I am moderately certain that even entering a dummy span is too much code for LLVM to reliably (or at all) optimize out.

## Solution

My hope is that, by trusting the Rust compiler to generate good code when using drop flags, we can essentially generate a drop flag that depends on something we know (due to events working as expected) to be optimizable.

So instead of doing

```rust
let _x = span!();
let _y = _x.enter();
// lotsa code
drop(_y)
```

we do

```rust
let _x;
let _y;
let mut must_drop = false;
if level_enabled!(DEBUG) {
    must_drop = true;
    _x = span!();
    _y = _x.enter();
}
// lotsa code
if must_drop {
    drop(_y)
}
```

I believe this will allow LLVM to properly optimize this again. Testing that right now, but I wanted to open this PR immediately for review.
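The drop-flag idea can be tried out without `tracing` at all. The following is a minimal, std-only sketch: `Guard` and `run_instrumented` are hypothetical stand-ins for the span enter-guard and an instrumented function, and the `enabled` parameter stands in for the `level_enabled!` check. It only demonstrates that a conditionally initialized binding is dropped exactly when it was initialized; it makes no claim about what LLVM does with the real macro expansion.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a span enter-guard: records each drop.
struct Guard<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Hypothetical instrumented function. `enabled` plays the role of
/// the level check; it is an assumption, not the tracing API.
fn run_instrumented(enabled: bool, drops: &Cell<u32>) -> u32 {
    // Declared but deliberately left uninitialized. Because the
    // initialization below is conditional, the compiler tracks a
    // drop flag for this binding.
    let _guard;
    if enabled {
        _guard = Guard { drops };
    }

    // The function body runs either way.
    let result = 1 + 1;

    // At scope end, `_guard` is dropped only if it was initialized.
    result
}

fn main() {
    let drops = Cell::new(0);

    // Disabled: no guard is created, so nothing is dropped.
    run_instrumented(false, &drops);
    assert_eq!(drops.get(), 0);

    // Enabled: the guard is created and dropped exactly once.
    run_instrumented(true, &drops);
    assert_eq!(drops.get(), 1);

    println!("guard drops: {}", drops.get());
}
```

Note that no explicit `must_drop` variable is needed in real Rust: the compiler maintains the flag itself, which is what the generated code in this change relies on.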
1 parent 6720bec commit 3b038ed

1 file changed

+51
-8
lines changed

tracing-attributes/src/lib.rs

Lines changed: 51 additions & 8 deletions
@@ -601,6 +601,8 @@ fn gen_block(
         .map(|name| quote!(#name))
         .unwrap_or_else(|| quote!(#instrumented_function_name));
 
+    let level = args.level();
+
     // generate this inside a closure, so we can return early on errors.
     let span = (|| {
         // Pull out the arguments-to-be-skipped first, so we can filter results
@@ -646,7 +648,6 @@ fn gen_block(
             }
         }
 
-        let level = args.level();
         let target = args.target();
 
         // filter out skipped fields
@@ -713,7 +714,9 @@ fn gen_block(
         if err {
             quote_spanned!(block.span()=>
                 let __tracing_attr_span = #span;
-                tracing::Instrument::instrument(async move {
+                // See comment on the default case at the end of this function
+                // for why we do this a bit roundabout.
+                let fut = async move {
                     match async move { #block }.await {
                         #[allow(clippy::unit_arg)]
                         Ok(x) => Ok(x),
@@ -722,22 +725,46 @@ fn gen_block(
                             Err(e)
                         }
                     }
-                }, __tracing_attr_span).await
+                };
+                if tracing::level_enabled!(#level) {
+                    tracing::Instrument::instrument(
+                        fut,
+                        __tracing_attr_span
+                    )
+                    .await
+                } else {
+                    fut.await
+                }
             )
         } else {
             quote_spanned!(block.span()=>
                 let __tracing_attr_span = #span;
+                // See comment on the default case at the end of this function
+                // for why we do this a bit roundabout.
+                let fut = async move { #block };
+                if tracing::level_enabled!(#level) {
                     tracing::Instrument::instrument(
-                        async move { #block },
+                        fut,
                         __tracing_attr_span
                     )
                     .await
+                } else {
+                    fut.await
+                }
             )
         }
     } else if err {
         quote_spanned!(block.span()=>
-            let __tracing_attr_span = #span;
-            let __tracing_attr_guard = __tracing_attr_span.enter();
+            // See comment on the default case at the end of this function
+            // for why we do this a bit roundabout.
+            let __tracing_attr_span;
+            let __tracing_attr_guard;
+            if tracing::level_enabled!(#level) {
+                __tracing_attr_span = #span;
+                __tracing_attr_guard = __tracing_attr_span.enter();
+            }
+            // pacify clippy::suspicious_else_formatting
+            let _ = ();
             #[allow(clippy::redundant_closure_call)]
             match (move || #block)() {
                 #[allow(clippy::unit_arg)]
@@ -750,8 +777,24 @@ fn gen_block(
         )
     } else {
         quote_spanned!(block.span()=>
-            let __tracing_attr_span = #span;
-            let __tracing_attr_guard = __tracing_attr_span.enter();
+            // These variables are left uninitialized and initialized only
+            // if the tracing level is statically enabled at this point.
+            // While the tracing level is also checked at span creation
+            // time, that will still create a dummy span, and a dummy guard
+            // and drop the dummy guard later. By lazily initializing these
+            // variables, Rust will generate a drop flag for them and thus
+            // only drop the guard if it was created. This creates code that
+            // is very straightforward for LLVM to optimize out if the tracing
+            // level is statically disabled, while not causing any performance
+            // regression in case the level is enabled.
+            let __tracing_attr_span;
+            let __tracing_attr_guard;
+            if tracing::level_enabled!(#level) {
+                __tracing_attr_span = #span;
+                __tracing_attr_guard = __tracing_attr_span.enter();
+            }
+            // pacify clippy::suspicious_else_formatting
+            let _ = ();
             #block
         )
     }
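The async branches of the diff use a different shape from the sync ones: the future is built once, then either wrapped or awaited directly depending on the level check. A rough, std-only sketch of that shape follows. `Instrumented`, `run`, `NoopWake`, and `block_on` are all hypothetical names invented for this illustration (they are not the `tracing` API), and the `enabled` parameter again stands in for `tracing::level_enabled!`. The wrapper simply counts polls so the two paths can be told apart.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an instrumented future: counts polls.
struct Instrumented<'a, F> {
    inner: Pin<Box<F>>,
    polls: &'a Cell<u32>,
}

impl<F: Future> Future for Instrumented<'_, F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Per-poll work that the disabled path should skip entirely.
        self.polls.set(self.polls.get() + 1);
        self.inner.as_mut().poll(cx)
    }
}

/// Build the future once, then choose between the wrapped and the
/// bare path, mirroring the if/else in the generated code.
async fn run(enabled: bool, polls: &Cell<u32>) -> u32 {
    let fut = async move { 40 + 2 };
    if enabled {
        Instrumented { inner: Box::pin(fut), polls }.await
    } else {
        fut.await
    }
}

/// Waker that does nothing; enough for futures that never go pending.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for this sketch only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let polls = Cell::new(0);

    // Disabled: the wrapper is never constructed or polled.
    assert_eq!(block_on(run(false, &polls)), 42);
    assert_eq!(polls.get(), 0);

    // Enabled: the wrapper is polled once before the future completes.
    assert_eq!(block_on(run(true, &polls)), 42);
    assert_eq!(polls.get(), 1);
}
```

Both branches produce the same output type, so the surrounding function body is unaffected by which path is taken.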
