-
Notifications
You must be signed in to change notification settings - Fork 236
feat!: metrics refactor #3262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: metrics refactor #3262
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3262/docs/iroh/ Last updated: 2025-05-05T17:49:33Z |
@@ -99,10 +105,10 @@ impl Stream for MappedStream { | |||
if !self.was_direct_before && matches!(new_conn_type, ConnectionType::Direct(_)) | |||
{ | |||
self.was_direct_before = true; | |||
inc!(MagicsockMetrics, connection_became_direct); | |||
became_direct = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I don't think you want to change this logic. You can totally become direct more than once for the same connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is best to keep the refactor faithull to the current behaviour, including all the bugs. and do the bug fixes in another PR(s). That will make it much easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this changes the logic.
Nor do I think the logic had bugs. It executes the body of this if
only once over the course of was_direct_before
's lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was not to fix any bugs or change existing behavior, only to not clone the the metrics once more into each connection type stream, but instead handle the metrics update in the top level actor. So if I changed anything of substance, it was my bad and I will revert.
In my reading: Before - increment connection_became_direct
if the if-statement in line 99/105 triggers. After: pass became_direct: true
to the outer actor if the if-statement in line 99/105 triggers, and then increment connection_became_direct
in the outer actor.
iroh-relay/src/server/http_server.rs
Outdated
@@ -869,7 +876,7 @@ mod tests { | |||
let tls_config = make_tls_config(); | |||
|
|||
// start server | |||
let mut server = ServerBuilder::new("127.0.0.1:0".parse().unwrap()) | |||
let mut server = ServerBuilder::new("127.0.0.1:0".parse().unwrap(), Default::default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nicer to make the metrics be a call
eg
// default
ServerBuilder::new("127.0.0.1:0".parse().unwrap())
.spawn();
// explicit
ServerBuilder::new("127.0.0.1:0".parse().unwrap())
.metrics(metrics)
.spawn();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would mean making it optional though, no? Currently metrics are required, which is why I added this on the constructor.
I can make them optional, it'll be if let Some(metrics) = self.metrics { metrics.foo.inc(); }
then at each place where metrics are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to make them optional, just construct them as default
if none are passed in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed that to use the builder method instead of Default in the constructor.
/// | ||
/// See [`crate::endpoint::Endpoint::metrics`] for details. | ||
#[derive(Default, Debug, Clone)] | ||
pub struct EndpointMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably should mark all the metrics structs as non_exhaustive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -285,7 +288,9 @@ impl ActiveRelayActor { | |||
/// | |||
/// Primarily switches between the dialing and connected states. | |||
async fn run(mut self) -> Result<()> { | |||
inc!(MagicsockMetrics, num_relay_conns_added); | |||
// TODO(frando): decide what this metric means, it's either wrong here or in node_state.rs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make an issue for this, otherwise I fear this will get lost very quickly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// we potentially have a relay connection to the node | ||
inc!(MagicsockMetrics, num_relay_conns_added); | ||
} | ||
// TODO(frando): I don't think we need to track the `num_relay_conns_added` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Description Needed for n0-computer/iroh#3262 Updates `iroh-metrics` to 0.34, and makes the metrics collection non-global. Metrics are now collected per portmapper service. ## Breaking Changes * `portmapper::metrics::Metrics` now implements `MetricsGroup` from `[email protected]`, and no longer implements `Metric` from `[email protected]`. ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented.
## Description Adapts iroh-metrics for n0-computer/iroh-metrics#15 and n0-computer/iroh#3262 Depends on n0-computer/iroh#3262 ## Breaking Changes * `metrics::Metrics` now implements `MetricsGroup` from the ì[email protected]` * Metrics are no longer tracked into the `static_core` from `iroh-metrics`, but instead are tracked per `Gossip` and exposed via `Gossip::metrics` * `proto::state::State::handle` now takes `Option<&Metrics>` as new 4th parameter ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented.
…ics collection (#85) ## Description Updates metrics tracking to the new non-global tracking. Metrics are tracked per downloader. We already only track metrics within the downloader, so this was rather straightforward. The tracking of request stats was changed to be tracked once the download task completes, as this seemed easier to me. Needs a careful review, but I think the change is sound (i.e. no change). Depends on n0-computer/iroh#3262 ## Breaking Changes * `metrics::Metrics` now implements `MetricsGroup` from the recent ìroh-metrics` release (TODO: fill in version after release) * Metrics are no longer tracked into the `static_core` from `iroh-metrics`, but instead are tracked per `Downloder` and exposed via `Downloader::metrics` ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented. --------- Co-authored-by: Ruediger Klaehn <[email protected]>
## Description Updates metrics tracking to the new non-global tracking. Metrics are tracked per sync actor. We previously tracked a few metrics within the replica, this is removed and instead tracked from the sync actor now. Depends on n0-computer/iroh#3262 Depends on n0-computer/iroh-blobs#85 Depends on n0-computer/iroh-gossip#58 ## Breaking Changes * `metrics::Metrics` now implements `MetricsGroup` from the recent ìroh-metrics` release (TODO: fill in version after release) * Metrics are no longer tracked into the `static_core` from `iroh-metrics`, but instead are tracked per `Engine` and exposed via `Engine::metrics` ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented.
Description
Depends on n0-computer/net-tools#20
Until now, we were using a superglobal static
iroh_metrics::core::Core
struct to collect metrics into. This allowed us to use macros to track metrics from anywhere in the codebase. However, this also made it impossible to collect metrics per endpoint, which is what you want usually as soon as you have more than one endpoint in your app.This PR builds on n0-computer/iroh-metrics#15, n0-computer/iroh-metrics#22, and n0-computer/iroh-metrics#23. It removes the global metrics collection from all crates in the iroh repository. Instead, we now create and pass metrics collector structs to all places where we need to collect metrics.
This PR disables the
static_core
feature from iroh-metrics, which means the macros for superglobal metrics collection are not available anymore. This is good, because otherwise we could easily miss metrics not tracked onto the proper metrics collector.This PR also updates iroh-dns-server and iroh-relay to use manual metrics collection.
While this means that we have to pass our metrics structs to more places, it also makes metrics collection more visible, and we can now also split the metrics structs further easily if we want to separate concerns more.
This PR should not change anything apart from metrics collection. Most places are straightforward conversions from the macros to methods on the metrics collectors. At a few places, logic was changed slightly to move metrics collection a layer up to save a few clones.
Breaking Changes
iroh::metrics::{MagicsockMetrics, PortmapMetrics, NetReportMetrics}
) now implementMetricsGroup
from the new version0.34
ofiroh-metrics
and no longer implement traits from[email protected]
.Core
.iroh
does not usestatic_core
feature ofiroh-metrics
. Metrics are now exposed from the subsystems that track them, see e.g.Endpoint::metrics
.Several methods now take a
Metrics
argument. You can always passDefault::default
if you don't want to unify metrics tracking with other sections.iroh
iroh::metrics::{MagicsockMetrics, NetReportMetrics, PortmapMetrics}
all are now markednon_exhaustive
, and implementiroh_metrics::MetricsGroup
from[email protected]
and no longer implementiroh_metrics::Metric
from[email protected]
. They also no longer implementClone
(put them into anArc
for cloning instead).iroh::net_report::Client::new
now takesiroh::net_report::metrics::Metrics
as forth argumentiroh-dns-server
iroh_dns_server::server::Server::spawn
now takesMetrics
as third argumentiroh_dns_server::ZoneStore::persistent
now takesMetrics
as third argumentiroh_dns_server::ZoneStore::in_memory
now takesMetrics
as third argumentiroh_dns_server::ZoneStore::new
now takesMetrics
as third argumentiroh_dns_server::state::AppState
now has a publicmetrics: Metrics
fieldiroh_dns_server::dns::DnsHandler::new
now takesMetrics
as third argumentiroh_dns_server::metrics::init_metrics
is removediroh-relay
iroh_relay::metrics::{StunMetrics, Metrics}
all are now markednon_exhaustive
, and implementiroh_metrics::MetricsGroup
from[email protected]
and no longer implementiroh_metrics::Metric
from[email protected]
. They also no longer implementClone
(put them into anArc
for cloning instead).Notes & open questions
Change checklist
iroh-gossip
iroh-blobs
iroh-docs