feat(replay): Lower the flush max delay from 15 seconds to 5 seconds #6761
Conversation
Should we just simplify our debounce implementation and get rid of this config, then? (I.e. make it have a single flush time option instead of two?)
Hmm, but at the same time, this will increase the chance of running into rate limits, right?
size-limit report 📦
Yeah, that's true.
Yeah, I think we should first see how this performs with regard to missing segments (and maybe see whether rate limiting increases?)
I thought we don't yet have a view into rate-limiting client reports (thinking of our _admin discussion and how it only shows errors and transactions)? I mean, we can make this change, sure, but the question is whether we can actually analyze if it improved something.
Yes, and if we get customers running into this due to higher load, we can ask them to use this option to 'reduce load'.
> yes and if we get customers running into this due to higher loads, we can ask them to use this option to 'reduce load'
Fair enough, let's try this
This means that we will only wait 5 seconds before flushing
0a97be7 to 94fed11 (Compare)
Replay SDK metrics 🚀
Latest data for: 1f8f2bf
*) pp - percentage points - an absolute difference between two percentages.
Last updated: Tue, 17 Jan 2023 09:07:08 GMT
So both CPU and LCP do seem a bit better (compared to #6611):
I wonder how stable that is; I'll rerun the overhead job to be sure. If that's correct, it would be nice, but I wonder if such a large CPU reduction is realistic for this PR? 🤔

Edit: Hmm, OK, so a second run gives very different results 😅 156ms LCP & 42.50% CPU. So overall we got for this PR:
@vaind any idea how to get more consistent results? Should we increase the number of measurement runs to get more consistent averages?
It's currently collected as an average over 10 subsequent runs, and there are checks on a (fairly low) standard deviation between runs. Locally the results are pretty stable regardless of when they're collected, so the issue here may be caused by running on Linux VMs, which are a shared resource. I think all the currently available public runners on GitHub Actions are VM-based, even the macOS ones, though I can give it a go and see if it's more stable throughout the day. On mobile, we ran these on real devices, so while there's still variance, it may be a bit lower.

Also, these need to be treated as a rough indicator of whether there may be something wrong that needs closer inspection, e.g. by manually profiling and verifying the changes. It's not like measuring bundle size, which will be super stable all the time.

Also, it may be better to compare the "Ratio" field, which compares against the current baseline, e.g. for CPU usage. You'll notice it doesn't change nearly as much as the absolute values (which are more influenced by parallel load on the VM).

BTW, the table currently shows diff & ratio relative to the previous "scenario", so "+Replay" shows diff & ratio relative to "+Sentry". Looking at it now, maybe we should change that so that both "+Replay" and "+Sentry" are compared directly to "Plain"; that would make it easier to read, I think.
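The averaging-and-ratio approach described above can be sketched as follows. These are illustrative helpers, not the actual metrics tooling; the function names are hypothetical:

```typescript
// Mean of a series of benchmark runs.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation, used as a sanity check that the
// runs were stable enough to be worth reporting.
function stddev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - m) ** 2)));
}

// Ratio of candidate to baseline means. Shared-VM load tends to inflate
// both series at once, so the ratio is steadier than absolute values.
function ratio(candidate: number[], baseline: number[]): number {
  return mean(candidate) / mean(baseline);
}
```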
Agreed, would you mind opening a PR for that?
This means that we will only wait 5 seconds before flushing. This should help reduce potential lost segments due to page reloads/closed tabs.
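If customers do run into rate limits because of the more frequent flushes, the flush timing could in principle be raised again via configuration, as discussed above. The snippet below is only a sketch: the `flushMinDelay`/`flushMaxDelay` option names and the `@sentry/replay` import path are assumptions based on the Replay integration around the time of this PR, not verified against a specific SDK version.

```typescript
// Sketch only: option names and import path are assumptions.
import * as Sentry from '@sentry/browser';
import { Replay } from '@sentry/replay';

Sentry.init({
  dsn: '__YOUR_DSN__',
  integrations: [
    new Replay({
      flushMinDelay: 5_000, // wait at least 5s between flushes
      flushMaxDelay: 5_000, // flush no later than 5s after new data (this PR's new default)
    }),
  ],
});
```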