Skip to content

fix(tracing): respect idleTimeout after last finished activity #4442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

lighta971
Copy link

@lighta971 lighta971 commented Jan 21, 2022

  • If you've added code that should be tested, please add tests.
  • Ensure your code lints and the test suite passes (yarn lint) & (yarn test).

@AbhiPrasad
Copy link
Member

Thanks for opening a PR!

In essence this is a quiescence timeout - and def the approach we want to take.

Holding off on merging this, because we'll need a wrapping final timeout in addition to a heurisitic to prevent polling (see comment in: https://github.com/getsentry/sentry-javascript/pull/2672/files#r440770700). We are currently exploring alternate algorithms.

In addition, there is probably a way we can relate LCP to extend/cancel timeouts accordingly - but that might not scale to other platforms that use idle transaction, like react native.

@alquerci
Copy link

alquerci commented Jan 25, 2022

Hello there,

Here there is another issue about the documentation that is not enough explicit, and adds confusion.
As we took with @lighta971 a long time to understand why transaction cut.

It seems that the transaction will end when

  • the first finished span timestamp + idleTimeout is reached
  • and as soon as there are no activities.

Does this resume the current behaviour, or something is missing ?


About the 0 -> 1 -> 0 -> 1 problem.
Does the maxTransactionDuration configuration will handle it?

The risk is that all transaction duration become equals to maxTransactionDuration.
To mitigate this, idleTimeout must be less than the polling internal.
Then need to detect with an algorithm this interval or let it configured manually with idleTimeout.
So need a dynamic or static idleTimeout.

Perhaps to fix it, a BC break is required. Except, with a feature flag.

Yeah, the issue is not so easy to tackle as it depends on the application, use case, the product goal of the measure.

@AbhiPrasad
Copy link
Member

Thanks for your interest in help addressing this!

At the current moment, a transaction will trigger it's idle timeout the first moment it hits 0 activities (0 active child spans) - and will not reset this idleTimeout. This means that even if more activities are added afterwards (spans get started) after the activities count goes to 0, it'll still finish after the idletimeout is over, regardless if the spans were still in progress.

The risk is that all transaction duration become equals to maxTransactionDuration.
To mitigate this, idleTimeout must be less than the polling internal.

The polling interval is arbitrary here - it's dependent when child spans start/stop, because that's when we have logic around triggering the various timeouts.

We probably want to phase out maxTransactionDuration here in favour of a double timeout system (idleTimeout and finalTimeout - where we can adjust the finalTimeout heuristically based on what is happening with the idleTimeout) - we can also maybe then eliminate our heartbeat system.

Here's what I'm thinking on a high level (this will change when we implement though!):

  • transaction is started on pageload/navigation
  • start an initial idleTimeout (1s)
  • start an initial finalTimeout (FINAL_TIMEOUT_DEFAULT = 10s)
  • fetch/xhr/other spans start, which cause the activities count to increase
  • after arbitrary time, these activities go to 0 -> triggers idleTimeout
  • response from fetch/xhr starts new spans (either component renders or other fetch/xhr requests)
  • New span = new activities -> clears idleTimeout. Increment counter that says idleTimeout was cleared.
  • If idleTimeout was cleared 3 times, we can say that we've hit 0 activities 3 times, heuristically we can say that we might be in a polling situation so we should not clear the timeout this time.
  • if we are over FINAL_TIMEOUT_DEFAULT seconds due to idleTimeout, clear and restart final timeout by adjusting it by the additional time taken by idleTimeout.

This means we can remove the heartbeat, and maxTransactionDuration, and rely on our finalTimeout to accomplish all of that (maybe even save some bundle size).

Perhaps to fix it, a BC break is required. Expect, with a feature flag.

Don't think we need a breaking change - I think if we move to a more correct behaviour, we can accept slight behaviour differences in terms of data that show up.

Another complexity here is that our React Native SDK uses IdleTransactions for it's performance monitoring, so we have to make sure the experience makes sense for that SDK as well!

@alquerci
Copy link

LGTM

Replying to your high level thinking by trying to find questions.

transaction is started on pageload/navigation

When the transaction should end?

  • At the same time as another pageload or navigation is triggered?

after arbitrary time, these activities go to 0 -> triggers idleTimeout

Should inactivity detection be effective during 100ms or less? (e.g. use case synchronous spans)

finishReason: idleTimeout
image

Hum, timing concern.

If idleTimeout was cleared 3 times, we can say that we've hit 0 activities 3 times, heuristically we can say that we might be in a polling situation so we should not clear the timeout this time.

When the maxClearCount is 0, is it the as current behaviour?

[Delirium] Does this heuristic may be closed to the exponential back off but reversed (e.g. 1s, 800m, 400m, 0)?

if we are over FINAL_TIMEOUT_DEFAULT seconds due to idleTimeout, clear and restart final timeout by adjusting it by the additional time taken by idleTimeout.

Can you give an example of the adjustment?
How to avoid unexpected long transaction as the final timeout can be cleared and restart?

finishReason: heartbeatFailed
image

Wrapping up

There are 3 topics

  1. 0 activity detection (timing concern)
  2. idle detection (idleTimeout)
  3. handle of maximum transaction duration (heatbeat, finalTimeout)

@AbhiPrasad
Copy link
Member

Should be addressed by #4531

@AbhiPrasad AbhiPrasad closed this Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants