fix(opentelemetry): Fix span & sampling propagation #11092
Merged
OK, this was a tricky one, but I think it now works as expected.
This PR fixes two fundamental issues with sampling & propagation that were uncovered by @Lms24 & myself while trying to use OTEL for remix & sveltekit:
- `continueTrace` updates the propagation context, but if there is an active parent span (even a remote one) this is ignored.
- Sampling inheritance does not distinguish between `sampled=false` (sampled to be not recorded) and `sampled=undefined` (no sampling decision yet).

Update to `continueTrace` & trace propagation
My first instinct was to ensure that in the trace methods, if we have a remote span, we ignore it and look at the propagation context. This has a bunch of problems, though, because it means the two can get out of sync (if this is set from outside, etc.).
So instead, I now provide a custom `continueTrace` method from `@sentry/opentelemetry` & `@sentry/node`, which should be used instead of the core one in meta SDKs. This method will, in addition to updating the propagation context, also create a remote span with the passed-in data and make it the active span in the callback.

Then, I updated the otel start span APIs to always use the active span if it exists (which was already the behavior we had), PLUS added behavior so that if there is no active span at all (not even a remote one), we look at the propagation context.
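To make the lookup order concrete, here is a minimal sketch of the two behaviors described above. It is a toy model, not the SDK's code: the types, the module-level state, and the names `continueTraceSketch` and `resolveParent` are all hypothetical, and the real implementation works with OpenTelemetry contexts and Sentry scopes rather than plain variables.

```typescript
// Illustrative types only; these are not the SDK's real interfaces.
interface SpanContextSketch {
  traceId: string;
  spanId: string;
  isRemote: boolean;
}

interface PropagationContextSketch {
  traceId: string;
  parentSpanId?: string;
}

interface ResolvedParent {
  traceId: string;
  parentSpanId?: string;
  source: 'active-span' | 'propagation-context';
}

// Stand-ins for scope/context state (hypothetical, for this sketch only).
let propagationContext: PropagationContextSketch = { traceId: 'local-trace' };
let activeSpan: SpanContextSketch | undefined = undefined;

// Updates the propagation context AND makes a remote span active for the callback.
function continueTraceSketch<T>(
  incoming: { traceId: string; parentSpanId: string },
  callback: () => T,
): T {
  const previousPropagation = propagationContext;
  const previousSpan = activeSpan;
  propagationContext = { traceId: incoming.traceId, parentSpanId: incoming.parentSpanId };
  activeSpan = { traceId: incoming.traceId, spanId: incoming.parentSpanId, isRemote: true };
  try {
    return callback();
  } finally {
    // Restore the previous state once the callback is done.
    propagationContext = previousPropagation;
    activeSpan = previousSpan;
  }
}

// Parent lookup when starting a span: the active span (even a remote one) wins,
// and the propagation context is only consulted when there is no active span.
function resolveParent(): ResolvedParent {
  if (activeSpan) {
    return { traceId: activeSpan.traceId, parentSpanId: activeSpan.spanId, source: 'active-span' };
  }
  return {
    traceId: propagationContext.traceId,
    parentSpanId: propagationContext.parentSpanId,
    source: 'propagation-context',
  };
}

const outsideCallback = resolveParent();
const insideCallback = continueTraceSketch(
  { traceId: 'incoming-trace', parentSpanId: 'incoming-span' },
  () => resolveParent(),
);
```

In this model, `outsideCallback` is resolved from the propagation context, while `insideCallback` is resolved from the remote span that the sketch made active.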
Update to sampling inheritance
Previously, we basically did the following:
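A minimal sketch of this kind of inheritance, assuming the decision was read straight off the parent's trace flags. The helper name `inheritSampledNaively` and the local types are illustrative, not the SDK's actual code; only the flag values (`NONE = 0`, `SAMPLED = 1`) mirror OpenTelemetry's `TraceFlags`.

```typescript
// Mirrors the OpenTelemetry TraceFlags values (NONE = 0, SAMPLED = 1).
const TraceFlagsSketch = { NONE: 0, SAMPLED: 1 } as const;

// Illustrative subset of a span context; not the real interface.
interface ParentSpanContextSketch {
  traceId: string;
  spanId: string;
  traceFlags: number;
}

// Hypothetical reconstruction: the parent's decision is a plain boolean,
// so "not sampled" and "no decision yet" both come out as false.
function inheritSampledNaively(parent: ParentSpanContextSketch): boolean {
  return parent.traceFlags === TraceFlagsSketch.SAMPLED;
}

const sampledParent = inheritSampledNaively({
  traceId: '12345678901234567890123456789012', // placeholder id
  spanId: '1234567890123456', // placeholder id
  traceFlags: TraceFlagsSketch.SAMPLED,
});
const unsampledParent = inheritSampledNaively({
  traceId: '12345678901234567890123456789012',
  spanId: '1234567890123456',
  traceFlags: TraceFlagsSketch.NONE,
});
```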
Which means that if we create a remote span from a minimal propagation context:
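For illustration, such a remote span context might be built like this. This is a sketch under assumptions: the helper `remoteSpanContextFrom`, the local types, and the placeholder IDs are hypothetical. The point is that trace flags only carry one bit, so an unset `sampled` ends up as `NONE`, exactly like an explicit `false`.

```typescript
// Mirrors the OpenTelemetry TraceFlags values (NONE = 0, SAMPLED = 1).
const TRACE_FLAGS = { NONE: 0, SAMPLED: 1 } as const;

// Illustrative shapes; not the SDK's real interfaces.
interface MinimalPropagationContext {
  traceId: string;
  spanId: string;
  sampled?: boolean;
}

interface RemoteSpanContextSketch {
  traceId: string;
  spanId: string;
  isRemote: true;
  traceFlags: number;
}

// Hypothetical helper: turns a propagation context into a remote span context.
function remoteSpanContextFrom(ctx: MinimalPropagationContext): RemoteSpanContextSketch {
  return {
    traceId: ctx.traceId,
    spanId: ctx.spanId,
    isRemote: true,
    // Both `sampled: false` and `sampled: undefined` collapse to NONE here.
    traceFlags: ctx.sampled ? TRACE_FLAGS.SAMPLED : TRACE_FLAGS.NONE,
  };
}

const undecided = remoteSpanContextFrom({
  traceId: '12345678901234567890123456789012', // placeholder id
  spanId: '1234567890123456', // placeholder id
});
const rejected = remoteSpanContextFrom({
  traceId: '12345678901234567890123456789012',
  spanId: '1234567890123456',
  sampled: false,
});
```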
We would later always get `sampled: false` and inherit this decision for all downstream spans, instead of treating it as `undefined` and going through the sampler, as we actually want it to.

In order to "solve" this, I added a new trace state `SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING`, which we set if we know this is actually `sampled: false`, and not just unset. Then, based on this we can interpret `sampled` as being `false` or `undefined`, respectively.

This is a bit hacky but should work. It means that if we get a not-sampled decision from outside (without this trace state) we'll treat it as `undefined`, which is OK I would say. Our own sampler will set this correctly so we inherit correctly as well, and our propagator does so too.
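The interpretation described above could be sketched roughly as follows. This is a minimal sketch, not the SDK's implementation: the function name `interpretSampled`, the key string assigned to the constant, and the `'1'` marker value are assumptions, and a plain object stands in for the OpenTelemetry trace state.

```typescript
// Mirrors the OpenTelemetry TraceFlags values (NONE = 0, SAMPLED = 1).
const TRACE_FLAG_NONE = 0;
const TRACE_FLAG_SAMPLED = 1;

// The constant name comes from the PR; the string value is an assumption.
const SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING = 'sentry.sampled_not_recording';

// Illustrative span context; a plain record stands in for the real trace state.
interface SpanContextWithState {
  traceFlags: number;
  traceState?: Record<string, string>;
}

// Returns true (sampled), false (explicitly not recorded), or undefined (no decision yet).
function interpretSampled(ctx: SpanContextWithState): boolean | undefined {
  if (ctx.traceFlags === TRACE_FLAG_SAMPLED) {
    return true;
  }
  const markedNotRecording = ctx.traceState
    ? ctx.traceState[SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING] === '1'
    : false;
  // Without the marker, an unset flag is treated as "undecided", so the sampler runs.
  return markedNotRecording ? false : undefined;
}

const sampledDecision = interpretSampled({ traceFlags: TRACE_FLAG_SAMPLED });
const notRecordingDecision = interpretSampled({
  traceFlags: TRACE_FLAG_NONE,
  traceState: { [SENTRY_TRACE_STATE_SAMPLED_NOT_RECORDING]: '1' },
});
const undecidedDecision = interpretSampled({ traceFlags: TRACE_FLAG_NONE });
```

Under these assumptions, a span context coming from a third-party propagator with the sampled flag unset and no marker yields `undefined`, which matches the trade-off accepted in the description.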