⚠️ Disable the `rest_client_request_latency_seconds` metric by default #1587
Conversation
Hi @2uasimojo. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
What is this? This isn't really making it optional... You should have a command-line flag which allows you to enable this metric, and plumb the value of that flag through to the metric registration function.
No. We are a library. We are not going to register a flag. How and if controller authors expose options in their controllers to end-users is up to them. |
component-base is also a library; it includes flags which can be used by consuming binaries, including the ones for metrics. |
...but I'm with you on the plumbing part. If there was a way we could set a package variable such that it's present by the time the import's `init()` runs, that would work. |
yes it is possible, we literally do this in upstream kubernetes. |
obviously this means you can't use the init for these though. |
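A minimal sketch of what that flag plumbing could look like in a consuming binary rather than in controller-runtime itself (the flag name and the histogram definition here are illustrative, not part of this PR):

```go
package main

import (
	"flag"

	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Illustrative flag owned by the controller binary, not by controller-runtime.
var enableRESTLatencyMetric = flag.Bool("enable-rest-client-latency-metric", false,
	"Register the deprecated rest_client_request_latency_seconds histogram.")

func main() {
	flag.Parse()

	if *enableRESTLatencyMetric {
		// Hypothetical stand-in for the collector this PR exports;
		// registration happens explicitly here instead of in an init().
		restLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name:    "rest_client_request_latency_seconds",
			Help:    "Request latency in seconds, broken down by verb and URL.",
			Buckets: prometheus.ExponentialBuckets(0.001, 2, 10),
		}, []string{"verb", "url"})
		metrics.Registry.MustRegister(restLatency)
	}

	// ... build and start the manager as usual ...
}
```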
Okay; I was looking for a way to enable opt-out so the change could be non-breaking. But if we can't use `init` to register it, then I'm not sure how that would work. |
Kubelet is a binary, we are a library. I am going to veto any change that registers a flag. The one case where we do this creates enough trouble already: #1396 |
I think it's okay to make it opt-in, it's more friendly than just deleting it. And the approach of manually invoking Register is now an option. |
That's fine, I don't have any skin in the game here, I am just saying that it's not friendly to outright break people. What you guys want to do with your library and what your priorities are is 🤷 to me... I'm stating my opinion about deleting metrics and breaking people... I've had it happen to me and it sucks. |
Actually, as a library, it would make sense to make all your metrics opt-in... |
So if we don't want to register a flag, and we want to make this metric optional without breaking consumers, then is the approach in this PR acceptable? Or would it be better to wrap the registration of this metric in its own exported function? Alternatively, I would be happy to propose making all the metrics opt-in. |
Yeah, I would probably make all the metrics public and auto-register them for a release, and have a release note saying that in the next release these will all be opt-in. That way, people have some sort of warning, rather than just randomly having everything break. You have several options, as I see it. |
I'm happy to do anything in this range, from outright removal to immediate opt-in (this PR) to a multi-release no-change-deprecation => opt-in dance. @alvaroaleman what is your wish? |
This change modulo my comment https://github.com/kubernetes-sigs/controller-runtime/pull/1587/files#r667037223 is IMHO fine. I don't think we should require opt-in for every single metric but I am happy to discuss if, which and how we register metrics by default in a new issue. |
This metric has been deprecated since kube 1.14 and [turned off in kube 1.17](kubernetes/kubernetes#83836). Stop registering it by default, but export the appropriate variables so consumers can register it explicitly if they wish.
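For consumers that do want the metric back, opt-in could look roughly like the following in their own package. This is a sketch based on the description above; `RequestLatency` and `LatencyAdapter` are assumed to be the identifiers this change exports from `sigs.k8s.io/controller-runtime/pkg/metrics`, so check the package for the real names:

```go
package main

import (
	clientmetrics "k8s.io/client-go/tools/metrics"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

func init() {
	// Register the exported histogram with controller-runtime's registry
	// (metrics.RequestLatency is an assumed name for the exported variable)...
	metrics.Registry.MustRegister(metrics.RequestLatency)
	// ...and hand it back to client-go so requests are actually observed
	// (metrics.LatencyAdapter is likewise assumed).
	clientmetrics.RequestLatency = metrics.LatencyAdapter{Metric: metrics.RequestLatency}
}
```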
See #1589 for one approach: opt in gets you all by default, but you have the ability to opt out of individual metrics. |
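Roughly, the pattern being proposed there looks like this; a sketch of the idea only, with stand-in collectors, and not the actual API in #1589:

```go
package clientmetrics

import "github.com/prometheus/client_golang/prometheus"

// Illustrative collectors standing in for the client-go adapter's metrics.
var (
	requestResult = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "rest_client_requests_total",
		Help: "Number of HTTP requests, partitioned by status code, method, and host.",
	}, []string{"code", "method", "host"})

	requestLatency = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "rest_client_request_latency_seconds",
		Help: "Request latency in seconds, broken down by verb and URL.",
	}, []string{"verb", "url"})
)

// RegisterClientMetrics registers every metric except those the caller opts
// out of by name: "all by default, with per-metric opt-out".
func RegisterClientMetrics(reg prometheus.Registerer, skip ...string) {
	collectors := map[string]prometheus.Collector{
		"rest_client_requests_total":          requestResult,
		"rest_client_request_latency_seconds": requestLatency,
	}
	for _, name := range skip {
		delete(collectors, name)
	}
	for _, c := range collectors {
		reg.MustRegister(c)
	}
}
```

A consumer could then call something like `RegisterClientMetrics(metrics.Registry, "rest_client_request_latency_seconds")` to keep everything except the latency histogram.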
I am going to retitle this to make sure that it's properly understood when it ends up in a release note: "⚠️ Disable the `rest_client_request_latency_seconds` metric by default" |
thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: 2uasimojo, alvaroaleman. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
Okay, so what's the process for getting this into a tagged release? |
@2uasimojo we deliver breaking changes with a new minor release and do a new minor release whenever there is a new kube minor release. So this will take some time. |
We were using a [controller-runtime fork](github.com/openshift-hive/controller-runtime) so we could [disable a metric](openshift-hive/controller-runtime@e847e4b) that was causing a cardinality explosion and memory problems. Upstream controller-runtime has now [disabled that metric by default](kubernetes-sigs/controller-runtime#1587); so this commit switches back to using upstream. NOTE: We're using an unreleased upstream version.
* adapt to metrics name change from k8s 1.17, see kubernetes-sigs/controller-runtime#1587
* remove dependency on github.com/grafana/grafana just for the RoleType
* update to k8s 1.22.1 and controller-runtime 0.10
* update codegen
This commit adds two simple prometheus metrics to the http client that is being used by the backend: "requests_total" and "requests_duration_histogram_seconds". With that we should get some initial visibility into backend failures, response times, and client requests per second as well. I decided to register everything in an `init` function to the `metrics.Gatherer`. Not perfect, but simple and probably good enough for a long time. I got to that `metrics.Gatherer` type by following the metrics code of `controller-runtime`; I'm not sure if there's a better way to register metrics to that Registry, or if using a different registry would be fine as well and they'd simply get interlaced?

Additionally, controller-runtime also has a [`rest_client_requests_total` metric](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/metrics#pkg-constants) that it registers. As far as I can tell, this is for the default `http.Client` and comes from `client-go`. We could probably also make use of that, but we would be missing a latency histogram. That latency histogram also exists, but is [disabled by default](kubernetes-sigs/controller-runtime#1587) because it created a cardinality explosion for some users, so I'm wary of enabling it as well.

By using a completely separate code path and metrics handler, we get metrics for only our backend, instead of them being interlaced with potential metrics from `client-go`. Additionally, we can start off with both latency and count metrics, as I don't think we'll have issues with cardinality (we're only registering two labels; `client-go` also registered a "url" label, which is not optimal).
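A sketch of how such an instrumented client might be wired up, registering against controller-runtime's registry from an `init` as the commit describes. The metric names and the two-label set are taken from the commit message; the package name and helper are illustrative:

```go
package backendmetrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// Only two labels ("code", "method") to keep cardinality low.
	requestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "requests_total",
		Help: "Total number of requests sent to the backend.",
	}, []string{"code", "method"})

	requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "requests_duration_histogram_seconds",
		Help: "Latency of backend requests in seconds.",
	}, []string{"code", "method"})
)

func init() {
	// Register with controller-runtime's registry so the metrics show up
	// on the manager's /metrics endpoint alongside the built-in ones.
	metrics.Registry.MustRegister(requestsTotal, requestDuration)
}

// InstrumentedClient wraps a transport so every backend request is counted
// and its duration observed.
func InstrumentedClient(base http.RoundTripper) *http.Client {
	if base == nil {
		base = http.DefaultTransport
	}
	rt := promhttp.InstrumentRoundTripperCounter(requestsTotal,
		promhttp.InstrumentRoundTripperDuration(requestDuration, base))
	return &http.Client{Transport: rt}
}
```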
This metric has been deprecated since kube 1.14 and turned off in kube 1.17. Stop registering it by default, but export the appropriate variables so consumers can register it explicitly if they wish.
fixes #1423