feat: span processor api refactor #2962

paullegranddc · 2025-05-13T16:00:39Z

Design discussion issue (if applicable)

Changes

This PR refactors the SpanProcessor API.

The API currently has the following issues:

It's not possible to read a span fields in a processor without calling export_data, which completely copies the data
The SpanProcessor::on_end API means owned span data has to be passed to each span processor, even if it does nothing with it.

One way that was explored to fix number 2. was to pass a &mut Span to on_end do processor can use std::mem::take to consume data without copying, but this is not ideal for at least two reasons:

The otel sdk spec specifically forbids mutation of the span in on_end, and modifications done in one processor should not impact data passed to other processors
If there are multiple processors, and any processor but the last one grabs the data and leaves the span empty, this will cause bugs dependent on span processor ordering...

This PR thus proposes the following changes changes to the API:

Introduce a `ReadableSpan` trait, implemented for SdkSpan.

This trait define getters for span fields so we can read the span without copying or consuming data. This is inspired by other Otel libraries such as Java which has a similar interface.

Modify `on_end` to pass it a `FinishedSpan`.

FinishedSpan is a new abstraction around a span that has been closed. It implements the ReadableSpan trait so we can read from it without consuming the data.
if a span processor needs to get an owned SpanData on it. This will do one of two things:

if this is the last span processor that will be invoked, it gives ownership of the span data without copying
otherwise it copies the span data

This design

allows not copying in span processors that don't need owned data during on_end
doesn't expose mutable spans in on_end
eludes the span data copy if the exporter is the last span processor (which is usually the case).

Benchmarking

A benchmark running creating a span and dropping it, invoking the on_start, on_ending and on_end methods for a varying number of span processors has been added.

Baseline - main branch:

SpanProcessorApi/0_processors
    time:   [339.66 ns 340.56 ns 341.47 ns]
SpanProcessorApi/1_processors
    time:   [373.10 ns 374.36 ns 375.60 ns]
SpanProcessorApi/2_processors
    time:   [803.10 ns 804.99 ns 807.03 ns]
SpanProcessorApi/4_processors
    time:   [1.2096 µs 1.2137 µs 1.2179 µs]

Candidate - paullegranddc:paullgdc/sdk/span_processor_api_refactor:

SpanProcessorApi/0_processors
    time:   [385.15 ns 386.14 ns 387.25 ns]
SpanProcessorApi/1_processors
    time:   [385.73 ns 387.17 ns 388.85 ns]
SpanProcessorApi/2_processors
    time:   [384.84 ns 385.66 ns 386.50 ns]
SpanProcessorApi/4_processors
    time:   [386.78 ns 388.17 ns 389.58 ns]

On main, the cost of a span lifecycle is proportional to the number of span processors, because the Clone operation on span data is as expensive as creating the span.

After this PR, since span processors don't clone data unless they need it, the cost is constant regardless of the number of span processors.

Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

Currently there is no way to read the span data without cloning it. This causes performance issues when the span for span processors that need to read the span data. This commit introduces a new trait `ReadableSpan` in the sdk implemented by SDK Spans that allows to read the span data without cloning it.

This API allows to mutations of the span when it is ending. It's marked as on development in the spec, but it is useful for span obfuscation for example, which needs to done after attributes can added to the span anymore.

codecov · 2025-05-13T20:13:15Z

Codecov Report

Attention: Patch coverage is 84.36578% with 53 lines in your changes missing coverage. Please review.

Project coverage is 81.1%. Comparing base (8882c31) to head (343aba2).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
opentelemetry-sdk/src/trace/span.rs	83.0%	53 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##            main   #2962    +/-   ##
======================================
  Coverage   81.0%   81.1%            
======================================
  Files        126     126            
  Lines      24753   25039   +286     
======================================
+ Hits       20070   20323   +253     
- Misses      4683    4716    +33

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

scottgerring

Hey @paullegranddc thanks for opening the PR!
I've had a quick zoom through now and have some comments and queries.

This feels like a sensible way of addressing the problems you mentioned (and highlighted in the examples 👍 ). Of course it will break the existing on_end, but as we're not stable here, yet and there's a clear upshot, from my perspective that seems reasonable. I think @cijothomas will also have helpful opinions here - Cijo?

Adding on_ending seems like a no brainer.

It would be great to look at some big downstream projects (tracing-opentelemetry comes to mind), and check if they are impacted. It would also be great to have some perf numbers from benchmarks touching the path through ensure_ended_and_exported which you may already have handy?

opentelemetry-sdk/src/trace/span_processor.rs

scottgerring · 2025-05-20T10:56:56Z

opentelemetry-sdk/src/trace/span_processor.rs

    /// `on_end` is called after a `Span` is ended (i.e., the end timestamp is
    /// already set). This method is called synchronously within the `Span::end`
    /// API, therefore it should not block or throw an exception.
-    /// TODO - This method should take reference to `SpanData`
-    fn on_end(&self, span: SpanData);
+    fn on_end(&self, span: &mut FinishedSpan);


Noting this is a breaking change

opentelemetry-sdk/src/trace/span.rs

examples/tracing-http-propagator/src/server.rs

scottgerring · 2025-05-20T11:39:47Z

opentelemetry-sdk/benches/batch_span_processor.rs

@@ -4,10 +4,10 @@ use opentelemetry::trace::{
    SpanContext, SpanId, SpanKind, Status, TraceFlags, TraceId, TraceState,


Have you run the benchmark suite to do a performance regression against main ? Would be great to include.

Yes, I've run the existing BatchSpanProcessor benchmarks.
With experimental_span_processor_on_ending not enabled, the benchmark show no detectable changes in performance.

If the experimental_span_processor_on_ending feature is enabled, there is a slight cost (5%) which I believe comes from the fact we have to clone the tracer field, because we have to iterate over the span processor list through a shared reference, while passing the span mutably. Thus we have to clone the tracer associated with the span so we can split the ownership (it's in an Arc cell so this is not that expensive)

Are we missing a benchmark? It seems like some of your changes should improve performance -->

https://github.com/open-telemetry/opentelemetry-rust/pull/2962/files/1d3e6d89637daebad09a9f792ae095f63be48e1a#r2097787712

Yes, some benchmarks running multiple span processors were missing, I added them un this PR. The comparison between the main branch and this branch for the benchmarks is in this PR description.

Baseline - main branch: SpanProcessorApi/0_processors time: [339.66 ns 340.56 ns 341.47 ns] SpanProcessorApi/1_processors time: [373.10 ns 374.36 ns 375.60 ns] SpanProcessorApi/2_processors time: [803.10 ns 804.99 ns 807.03 ns] SpanProcessorApi/4_processors time: [1.2096 µs 1.2137 µs 1.2179 µs] Candidate - paullegranddc:paullgdc/sdk/span_processor_api_refactor: SpanProcessorApi/0_processors time: [385.15 ns 386.14 ns 387.25 ns] SpanProcessorApi/1_processors time: [385.73 ns 387.17 ns 388.85 ns] SpanProcessorApi/2_processors time: [384.84 ns 385.66 ns 386.50 ns] SpanProcessorApi/4_processors time: [386.78 ns 388.17 ns 389.58 ns]

Nice! This is exactly what I was hoping to see :D

opentelemetry-sdk/src/trace/span.rs

Adds a benchmark creating a span and dropping it, without an exporter. The goal is to estimate the cost of the SDK running without the exporter

scottgerring

This LGTM modulo the current builds passing. @cijothomas wdyt?

TommyCpp · 2025-05-27T16:37:59Z

opentelemetry-sdk/src/trace/span.rs

+
+impl FinishedSpan {
+    /// Creates a new `FinishedSpan` with the given span data.
+    pub fn new(span_data: crate::trace::SpanData) -> Self {


Should we implement a From between those two?

TommyCpp · 2025-05-27T16:47:44Z

opentelemetry-sdk/src/trace/span.rs

+            opentelemetry::otel_error!(name: "FinishedSpan.ConsumeTwice", message = "consume called twice on FinishedSpan in the same span processor");
+        }
+        self.try_consume()
+            .expect("Span data has already been consumed")


Could we just try_consume and throw error if it's already consumed

Should we be calling expect at all? This feels right on the line with "let's not panic in the library code", as it highlights an issue with a span processor

My philosophy on this is that in the end, if this panics it's because of an API misuse. Not because of wrong assumptions on the data manipulated by the program which I understand coding defensively against.
As someone implementing a span processor, I'd rather crash immediately if I misuse the API which is surely going to be caught in my tests.

Having a try_consume method gives you the freedom to handle the misuse if you want, but in practice as someone writing code against this API I'd rather not handling this mode of failure makes sense.

paullegranddc added 2 commits May 13, 2025 17:26

feat: add an experimental on_ending api

a0a8c23

This API allows to mutations of the span when it is ending. It's marked as on development in the spec, but it is useful for span obfuscation for example, which needs to done after attributes can added to the span anymore.

paullegranddc requested a review from a team as a code owner May 13, 2025 16:00

paullegranddc added 3 commits May 13, 2025 18:09

fix: formatting and clippy

54fa6f2

Merge branch 'main' into paullgdc/sdk/span_processor_api_refactor

445aa66

fix: some SpanProcessor calls were not finished

b72b031

paullegranddc added 3 commits May 14, 2025 11:58

fix: lint

afb4197

fix: lint clippy

4954b0d

Merge branch 'main' into paullgdc/sdk/span_processor_api_refactor

1d3e6d8

scottgerring added the performance label May 20, 2025

scottgerring closed this May 20, 2025

scottgerring reopened this May 20, 2025

scottgerring reviewed May 20, 2025

View reviewed changes

scottgerring closed this May 20, 2025

scottgerring reopened this May 20, 2025

feat: add span processor benchmarks

224a7ed

Adds a benchmark creating a span and dropping it, without an exporter. The goal is to estimate the cost of the SDK running without the exporter

paullegranddc mentioned this pull request May 20, 2025

feat: add span processor benchmarks #2980

Closed

4 tasks

paullegranddc added 12 commits May 20, 2025 17:18

fix: remove unised import

a62a9bd

fix: typos, so many typos...

5ce1ba4

fix: insert count as number instead of string in example

5dae8a7

fix: make on_ending contract more explicit

a28a461

fix: on_ending comment

edb290a

nit: move span endtime logic to callers of ensure_ended_and_exported

07840d8

fix: remove panic in FinishedSpan

9a2d119

fix: add tests and fix clippy

48a3edf

Merge branch 'main' into paullgdc/sdk/span_processor_api_refactor

e2edadb

nit: use None rather than empty string when no recording

4f9b012

nit: add test coverage for ReadableSpan implementation

ddcf5f9

fix: add non panicking version of FinishedSpan::consume

28dc62b

paullegranddc added 6 commits May 27, 2025 14:46

fix: test non panicking behavior

b80d612

fix: remove panics in finished span reads

8601bd1

fix: remove no longer relevant comment

86fd3da

nit: add comment explaining why we grab the data from the span mutably

2dd7ccf

fix: rename ensure_ended_and_exported to end_and_export

79270ce

fix: example return instead of panic on posined lock

d8f2f80

scottgerring approved these changes May 27, 2025

View reviewed changes

TommyCpp reviewed May 27, 2025

View reviewed changes

revert: remove on_ending API

343aba2

paullegranddc mentioned this pull request May 30, 2025

feat(span_processor): add on_ending callback #3010

Open

4 tasks

		@@ -4,10 +4,10 @@ use opentelemetry::trace::{
		SpanContext, SpanId, SpanKind, Status, TraceFlags, TraceId, TraceState,

feat: span processor api refactor #2962

Are you sure you want to change the base?

feat: span processor api refactor #2962

Conversation

paullegranddc commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design discussion issue (if applicable)

Changes

Introduce a ReadableSpan trait, implemented for SdkSpan.

Modify on_end to pass it a FinishedSpan.

Benchmarking

Merge requirement checklist

Uh oh!

codecov bot commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

scottgerring left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

scottgerring left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

paullegranddc commented May 13, 2025 •

edited

Loading

Introduce a `ReadableSpan` trait, implemented for SdkSpan.

Modify `on_end` to pass it a `FinishedSpan`.

codecov bot commented May 13, 2025 •

edited

Loading

scottgerring left a comment •

edited

Loading

scottgerring left a comment •

edited

Loading