Description
Problem Statement
Our Browser SDKs by default create `op: pageload` and `op: navigation` idle spans to record spans and browser metrics during (hard) page loads and (soft) navigations (most commonly via client-side SPA routers). In most of our `browserTracingIntegration`s, including the default one, we cancel (i.e. end) an ongoing idle span when a new navigation starts while that span is still active. While this makes sense if you assume the new navigation is intentionally triggered (e.g. by a user clicking a link/button), it falls apart for automatic redirections. Such redirects are fairly common shortly after the pageload. A popular example is users opening a page on `/`, which causes the router to check if they're authenticated. If yes, they're redirected to a `/dashboard` page and otherwise to a `/login` page.
For redirects (or more generally for "non-user-triggered navigations"), this cancellation behavior has a variety of problems:
- Web vitals are currently only added to a pageload span.
  - If this span is cancelled before the vitals are emitted, users end up without any web vitals.
  - Likewise, the reported value of LCP or CLS is likely only the initial value and might miss important updates after a redirection.
- Semantically, the cancellation of the prior idle span splits the ongoing action into two distinct traces, where the separation might even seem arbitrary.
  - As of today, users have no way to connect the previous and current/next idle span. This will be addressed by trace links, but one can argue that this separation should not exist at all for redirects.
- It's worth noting that this doesn't only concern redirects after an initial pageload but potentially also redirects from a user-triggered navigation. For example, a user clicks on a link but lacks the authorization/role to access the page and hence gets redirected to a "request authorization" page (looking at you Google Docs 👀)
Goal
We want to find a way to distinguish such automatic redirects from user-triggered navigations.
- In case of a redirect:
  - Do not start a new trace (as we do today)
  - Do not start a new idle span (as we do today)
  - Instead, start a child navigation span of the ongoing idle span
    - [TBD] We probably don't want to start a new root span here, to (1) avoid race conditions over which root span (old idle vs. new root) receives resource and performance spans and (2) keep a linear chain of traces
- In case of a user-triggered navigation:
  - Continue cancelling the ongoing idle span (i.e. today's behaviour)
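The intended behaviour can be sketched as a small decision function. This is only an illustration of the goal, not SDK code: `SpanLike`, `startIdleSpan` and `handleNavigation` are hypothetical names, and the span model is reduced to the few fields needed to show the difference between the two cases.

```typescript
// Hypothetical, simplified model. None of these names are part of the SDK's API.
type NavigationKind = 'redirect' | 'user-triggered';

interface SpanLike {
  name: string;
  op: string;
  traceId: string;
  children: SpanLike[];
  ended: boolean;
}

let traceCounter = 0;

// Starts a new root idle span in a fresh trace.
function startIdleSpan(name: string, op: string): SpanLike {
  traceCounter += 1;
  return { name, op, traceId: `trace-${traceCounter}`, children: [], ended: false };
}

// Returns the idle span that is active after the navigation was handled.
function handleNavigation(
  active: SpanLike | undefined,
  kind: NavigationKind,
  target: string,
): SpanLike {
  if (active && !active.ended && kind === 'redirect') {
    // Redirect: keep the trace and the idle span, record the navigation as a child span.
    active.children.push({
      name: target,
      op: 'navigation',
      traceId: active.traceId,
      children: [],
      ended: false,
    });
    return active;
  }

  // User-triggered navigation: today's behaviour. End the ongoing idle span
  // and start a new navigation idle span in a new trace.
  if (active) {
    active.ended = true;
  }
  return startIdleSpan(target, 'navigation');
}
```

With this model, a `/` pageload followed by a redirect to `/dashboard` stays one trace with one child navigation span, while a later click-driven navigation ends the pageload span and opens a new trace.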
Options Considered
1. Distinguishing based on Heuristics
"User-triggered" navigation implies a click. We can listen globally to a click event and treat every navigation before the first click as a redirect/application-triggered navigation.
- navigations before the first click are considered application-triggered
- navigations afterwards user-triggered
- We probably need an upper bound for how long we consider a navigation application-triggered. Not all applications require a user interaction in the classic sense to trigger a user-intended or -perceived navigation (e.g. websites running on monitors that cycle through different pages)
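A minimal sketch of this heuristic, assuming a single configurable upper bound. `FirstClickHeuristic`, `maxRedirectWindowMs` and the injectable `now` clock are hypothetical names chosen for illustration; in a browser, `recordClick` would be wired to a global click listener.

```typescript
type NavigationKind = 'redirect' | 'user-triggered';

// Hypothetical helper, not part of the SDK.
class FirstClickHeuristic {
  private hasClicked = false;
  private readonly startedAt: number;
  private readonly maxRedirectWindowMs: number;
  private readonly now: () => number;

  constructor(maxRedirectWindowMs: number, now: () => number = () => Date.now()) {
    this.maxRedirectWindowMs = maxRedirectWindowMs;
    this.now = now;
    this.startedAt = now();
  }

  // In a browser this could be registered as, for example:
  // window.addEventListener('click', () => heuristic.recordClick(), { capture: true });
  recordClick(): void {
    this.hasClicked = true;
  }

  // A navigation counts as a redirect only before the first click
  // and only while the upper bound has not elapsed.
  classify(): NavigationKind {
    const withinWindow = this.now() - this.startedAt <= this.maxRedirectWindowMs;
    return !this.hasClicked && withinWindow ? 'redirect' : 'user-triggered';
  }
}
```

The upper bound covers pages that navigate without any click, such as a dashboard cycling through views on a wall monitor: once the window has elapsed, those navigations are treated like user-triggered ones.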
Pros:
- relatively easy to implement
- solves the initial classic "check for auth and redirect to dashboard/login" case
Cons:
- Does not solve the redirect-after-user-triggered-navigation case
- Might need custom implementation for specific routing instrumentations that don't call `startBrowserTracingNavigationSpan` (?)
1.1 Reset heuristic on every user-initiated navigation
Same as above but reset the click listener and upper bound after each user-initiated navigation
Additional Pros:
- Also solves the redirect-after-user-triggered-navigation case
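One way the reset could look, as a sketch under the same assumptions as option 1: every navigation classified as user-initiated re-arms the click flag and restarts the time window. `ResettableClickHeuristic` and `classifyAndUpdate` are hypothetical names.

```typescript
type NavigationKind = 'redirect' | 'user-triggered';

// Hypothetical helper, not part of the SDK.
class ResettableClickHeuristic {
  private hasClicked = false;
  private windowStart: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
    this.windowStart = now();
  }

  // To be called from a global click listener.
  recordClick(): void {
    this.hasClicked = true;
  }

  // Classifies the current navigation and, if it is user-initiated,
  // resets the click flag and the window for follow-up redirects.
  classifyAndUpdate(): NavigationKind {
    const current = this.now();
    const withinWindow = current - this.windowStart <= this.windowMs;
    if (!this.hasClicked && withinWindow) {
      return 'redirect';
    }
    this.hasClicked = false;
    this.windowStart = current;
    return 'user-triggered';
  }
}
```

A side effect of this design is that a navigation arriving shortly after a window-expired (click-less) navigation is also treated as a redirect, so the window length needs to be chosen with care.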
2. Leverage the framework/router
Some routers provide our instrumentation with enough information to distinguish between application- vs. user-initiated navigations (e.g. Ember, Angular, more?)
- We solve this on a best-effort basis for the routers that give us this information
- We accept that this does not solve the problem for specific routers or the default instrumentation
Pros:
- less risk of false positives/negatives as we don't rely on a heuristic
Cons:
- Not applicable to all framework routers
- Not applicable to the default instrumentation
- Developers might not use the router-provided redirection mechanism but instead redirect as if users initiated the redirect
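A sketch of what "best effort" could mean in practice. The `RouterNavigationEvent` shape and its `isRedirect` field are hypothetical and do not correspond to any specific router's API; the point is only that the instrumentation returns no classification when the router gives no signal, so the caller can fall back to another strategy or to today's behaviour.

```typescript
type NavigationKind = 'redirect' | 'user-triggered';

// Hypothetical event shape. Real routers expose this information differently, if at all.
interface RouterNavigationEvent {
  to: string;
  isRedirect?: boolean;
}

// Returns undefined when the router provides no redirect information.
function classifyFromRouter(event: RouterNavigationEvent): NavigationKind | undefined {
  if (event.isRedirect === undefined) {
    return undefined;
  }
  return event.isRedirect ? 'redirect' : 'user-triggered';
}

// Falls back to treating the navigation as user-triggered (today's behaviour).
function classifyWithFallback(event: RouterNavigationEvent): NavigationKind {
  return classifyFromRouter(event) ?? 'user-triggered';
}
```

Note that this cannot detect the last con above: if application code redirects through the same API a link click would use, the router reports a regular navigation.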
3. Provide users a manual pageload end reporting API.
Proposed in #14810.
Pro:
- full control for users, no idle mechanism, no heuristics
Cons:
- users need to handle a lot of "pageload ended" points on their own. For example, report pageload end on each page the router redirects to, whenever an error occurs, whenever users start a navigation, etc. A LOT of room for error and missed cases.
- Given the above, this can never become SDK default behaviour and should only be used in special cases. To an extent, users can already do this today anyway, as pointed out in Programatic way to indicate that a pageload / navigation transaction has completed #14810 (comment).
4. Do nothing / null option
- With the introduction of [trace links], we can connect multiple traces in a chain, meaning the cancelled trace/idle span would be the previous trace of the navigation. We can build a UX in the product that identifies likely redirection trace pairs, based on an end/start timestamp window in combination with span status.
- With our effort to send some web vitals as standalone spans (Send LCP & CLS as standalone spans #12714) we avoid depending on the pageload span running until we have the "final" web vital values.
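To illustrate the product-side pairing mentioned above, here is a rough sketch of such a check. Everything in it is an assumption for illustration: the `RootSpanSummary` shape, the `previousTraceId` field standing in for a trace link, the `'cancelled'` status value and the default window of 100 ms.

```typescript
// Hypothetical summary of a root span as the product might see it.
interface RootSpanSummary {
  traceId: string;
  op: string;
  status: string;
  startTimestampMs: number;
  endTimestampMs: number;
  // Stand-in for a "previous trace" link.
  previousTraceId?: string;
}

// Flags two linked root spans as a likely redirect pair when the earlier one was
// cancelled and the later navigation started within a short window of that end.
function isLikelyRedirectPair(
  prev: RootSpanSummary,
  next: RootSpanSummary,
  maxGapMs: number = 100,
): boolean {
  const linked = next.previousTraceId === prev.traceId;
  const cancelled = prev.status === 'cancelled'; // assumed status value
  const gapMs = next.startTimestampMs - prev.endTimestampMs;
  return linked && cancelled && next.op === 'navigation' && gapMs >= 0 && gapMs <= maxGapMs;
}
```

This remains a heuristic on already-split data: it can group traces for display, but it does not change the fact that the redirect was recorded as two traces.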
Pros:
- less code/bundle size as no need for a heuristic or any other special treatment
Cons:
- Both projects aren't completed and will not ship tomorrow.
- The semantic concern of splitting the trace (see above) is not addressed fully. No matter how/if we adjust the product to deal with this, data-wise it is still confusing.