feat(replay): Lower the flush max delay from 15 seconds to 5 seconds #6761
Conversation
Should we just simplify our debounce implementation and get rid of this config, then? (I.e. make it have a single flush time option instead of two?)
Hmm, but at the same time, this will increase the chance of running into rate limits, right?
size-limit report 📦
Yeah, that's true.
Yeah, I think we should first see how this performs with regard to missing segments (and maybe see whether rate limiting increases?)
I thought we don't yet have a view into rate-limiting client reports (thinking of our _admin discussion and how it only shows errors and transactions)? I mean, we can make this change, sure, but the question is whether we can actually analyze if it improved something.
Yes, and if we get customers running into this due to higher load, we can ask them to use this option to 'reduce load'.
> yes and if we get customers running into this due to higher loads, we can ask them to use this option to 'reduce load'
Fair enough, let's try this
This means that we will only wait 5 seconds before flushing
0a97be7 to 94fed11 (Compare)
Replay SDK metrics 🚀
Latest data for: 1f8f2bf
*) pp - percentage points - an absolute difference between two percentages.
Last updated: Tue, 17 Jan 2023 09:07:08 GMT
So both CPU and LCP do seem a bit better (compared to #6611):
I wonder how stable that is; I'll rerun the overhead job to be sure. If that's correct, it would be nice, but I wonder if such a large CPU reduction is realistic for this PR? 🤔

Edit: Hmm, OK, so a second run gives very different results 😅 156ms LCP & 42.50% CPU. So overall we got for this PR:
@vaind any idea how to get more consistent results? Should we increase the number of measurement runs to get more consistent averages?
It's currently collected as an average over 10 subsequent runs, and there are checks on a (fairly low) standard deviation between runs. Locally the results are pretty stable regardless of when they're collected, so the issue here may be caused by running on Linux VMs, which are a shared resource. I think all the currently available public runners on GitHub Actions are VM-based, even the macOS ones, though I can give it a go and see if it's more stable throughout the day. On mobile, we ran these on real devices, so while there's still variance, it may be a bit lower.

Also, these need to be treated as a rough indicator of whether there may be something wrong that needs closer inspection, e.g. by manually profiling and verifying the changes. It's not like measuring bundle size, which will be super stable all the time.

Also, it may be better to compare the "Ratio" field, which compares against the current baseline, e.g. for CPU usage. You'll notice it doesn't change nearly as much as the absolute values (which are more influenced by parallel load on the VM).

BTW, the table currently shows diff & ratio relative to the previous "scenario", so "+Replay" shows diff & ratio relative to "+Sentry". Looking at it now, maybe we should change that so that both "+Replay" and "+Sentry" are compared directly to "Plain"; that would make it easier to read, I think.
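The averaging-and-ratio approach described above can be sketched as follows. These are illustrative helpers, not the actual metrics tooling; the function names are hypothetical:

```typescript
// Mean of a series of benchmark runs.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation, used as a sanity check that the
// runs were stable enough to be worth reporting.
function stddev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - m) ** 2)));
}

// Ratio of candidate to baseline means. Shared-VM load tends to inflate
// both series at once, so the ratio is steadier than absolute values.
function ratio(candidate: number[], baseline: number[]): number {
  return mean(candidate) / mean(baseline);
}
```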
Agreed, would you mind opening a PR for that?
This means that we will only wait 5 seconds before flushing. This should help reduce potential lost segments due to page reloads/closed tabs.
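If customers do run into rate limits because of the more frequent flushes, the flush timing could in principle be raised again via configuration, as discussed above. The snippet below is only a sketch: the `flushMinDelay`/`flushMaxDelay` option names and the `@sentry/replay` import path are assumptions based on the Replay integration around the time of this PR, not verified against a specific SDK version.

```typescript
// Sketch only: option names and import path are assumptions.
import * as Sentry from '@sentry/browser';
import { Replay } from '@sentry/replay';

Sentry.init({
  dsn: '__YOUR_DSN__',
  integrations: [
    new Replay({
      flushMinDelay: 5_000, // wait at least 5s between flushes
      flushMaxDelay: 5_000, // flush no later than 5s after new data (this PR's new default)
    }),
  ],
});
```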