⚠️ Disable the `rest_client_request_latency_seconds` metric by default #1587
Conversation
Hi @2uasimojo. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
What is this? This isn't really making it optional... You should have a command-line flag which allows you to enable this metric, and plumb the value of that flag through to the metric registration function.
No. We are a library. We are not going to register a flag. How and if controller authors expose options in their controllers to end-users is up to them. |
component-base is also a library; it includes flags which can be used by consuming binaries, including the ones for metrics. |
...but I'm with you on the plumbing part. If there was a way we could set a package variable such that it's present by the time the import's `init()` runs, that would work. |
yes it is possible, we literally do this in upstream kubernetes. |
obviously this means you can't use the init for these though. |
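A minimal sketch of what that flag plumbing could look like in a consuming binary rather than in controller-runtime itself (the flag name and the histogram definition here are illustrative, not part of this PR):

```go
package main

import (
	"flag"

	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Illustrative flag owned by the controller binary, not by controller-runtime.
var enableRESTLatencyMetric = flag.Bool("enable-rest-client-latency-metric", false,
	"Register the deprecated rest_client_request_latency_seconds histogram.")

func main() {
	flag.Parse()

	if *enableRESTLatencyMetric {
		// Hypothetical stand-in for the collector this PR exports;
		// registration happens explicitly here instead of in an init().
		restLatency := prometheus.NewHistogramVec(prometheus.HistogramOpts{
			Name:    "rest_client_request_latency_seconds",
			Help:    "Request latency in seconds, broken down by verb and URL.",
			Buckets: prometheus.ExponentialBuckets(0.001, 2, 10),
		}, []string{"verb", "url"})
		metrics.Registry.MustRegister(restLatency)
	}

	// ... build and start the manager as usual ...
}
```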
Okay; I was looking for a way to enable opt-out so the change could be non-breaking. But if we can't use `init` to register it, then I'm not sure how that would work. |
Kubelet is a binary, we are a library. I am going to veto any change that registers a flag. The one case where we do this creates enough trouble already: #1396 |
I think it's okay to make it opt-in, it's more friendly than just deleting it. And the approach of manually invoking Register is now an option. |
That's fine, I don't have any skin in the game here, I am just saying that it's not friendly to outright break people. What you guys want to do with your library and what your priorities are is 🤷 to me... I'm stating my opinion about deleting metrics and breaking people... I've had it happen to me and it sucks. |
Actually, as a library, it would make sense to make all your metrics opt-in... |
So if we don't want to register a flag, and we want to make this metric optional without breaking consumers, then is the approach in this PR acceptable? Or would it be better to wrap the registration of this metric in its own exported function? Alternatively, I would be happy to propose making all the metrics opt-in. |
Yeah, I would probably make all the metrics public and auto-register them for a release, and have a release note saying that in the next release these will all be opt-in. That way, people have some sort of warning, rather than just randomly having everything break. You have several options, as I see it. |
I'm happy to do anything in this range, from outright removal to immediate opt-in (this PR) to a multi-release no-change-deprecation => opt-in dance. @alvaroaleman what is your wish? |
This change modulo my comment https://github.com/kubernetes-sigs/controller-runtime/pull/1587/files#r667037223 is IMHO fine. I don't think we should require opt-in for every single metric but I am happy to discuss if, which and how we register metrics by default in a new issue. |
This metric has been deprecated since kube 1.14 and [turned off in kube 1.17](kubernetes/kubernetes#83836). Stop registering it by default, but export the appropriate variables so consumers can register it explicitly if they wish.
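For consumers that do want the metric back, opt-in could look roughly like the following in their own package. This is a sketch based on the description above; `RequestLatency` and `LatencyAdapter` are assumed to be the identifiers this change exports from `sigs.k8s.io/controller-runtime/pkg/metrics`, so check the package for the real names:

```go
package main

import (
	clientmetrics "k8s.io/client-go/tools/metrics"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

func init() {
	// Register the exported histogram with controller-runtime's registry
	// (metrics.RequestLatency is an assumed name for the exported variable)...
	metrics.Registry.MustRegister(metrics.RequestLatency)
	// ...and hand it back to client-go so requests are actually observed
	// (metrics.LatencyAdapter is likewise assumed).
	clientmetrics.RequestLatency = metrics.LatencyAdapter{Metric: metrics.RequestLatency}
}
```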
See #1589 for one approach: opt in gets you all by default, but you have the ability to opt out of individual metrics. |
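Roughly, the pattern being proposed there looks like this; a sketch of the idea only, with stand-in collectors, and not the actual API in #1589:

```go
package clientmetrics

import "github.com/prometheus/client_golang/prometheus"

// Illustrative collectors standing in for the client-go adapter's metrics.
var (
	requestResult = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "rest_client_requests_total",
		Help: "Number of HTTP requests, partitioned by status code, method, and host.",
	}, []string{"code", "method", "host"})

	requestLatency = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "rest_client_request_latency_seconds",
		Help: "Request latency in seconds, broken down by verb and URL.",
	}, []string{"verb", "url"})
)

// RegisterClientMetrics registers every metric except those the caller opts
// out of by name: "all by default, with per-metric opt-out".
func RegisterClientMetrics(reg prometheus.Registerer, skip ...string) {
	collectors := map[string]prometheus.Collector{
		"rest_client_requests_total":          requestResult,
		"rest_client_request_latency_seconds": requestLatency,
	}
	for _, name := range skip {
		delete(collectors, name)
	}
	for _, c := range collectors {
		reg.MustRegister(c)
	}
}
```

A consumer could then call something like `RegisterClientMetrics(metrics.Registry, "rest_client_request_latency_seconds")` to keep everything except the latency histogram.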
I am going to retitle this to make sure that it's properly understood when it ends up in a release note: "⚠️ Disable the `rest_client_request_latency_seconds` metric by default" |
thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: 2uasimojo, alvaroaleman. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment. |
Okay, so what's the process for getting this into a tagged release? |
@2uasimojo we deliver breaking changes with a new minor release and do a new minor release whenever there is a new kube minor release. So this will take some time. |
We were using a [controller-runtime fork](github.com/openshift-hive/controller-runtime) so we could [disable a metric](openshift-hive/controller-runtime@e847e4b) that was causing a cardinality explosion and memory problems. Upstream controller-runtime has now [disabled that metric by default](kubernetes-sigs/controller-runtime#1587); so this commit switches back to using upstream. NOTE: We're using an unreleased upstream version.
* adapt to metrics name change from k8s 1.17, see kubernetes-sigs/controller-runtime#1587
* remove dependency on github.com/grafana/grafana just for the RoleType
* update to k8s 1.22.1 and controller-runtime 0.10
* update codegen
This commit adds two simple prometheus metrics to the http client that is being used by the backend: "requests_total" and "requests_duration_histogram_seconds". With that we should get some initial visibility into backend failures, response times, and client requests per second as well. I decided to register everything in an `init` function to the `metrics.Gatherer`. Not perfect, but simple and probably good enough for a long time. I got to that `metrics.Gatherer` type by following the metrics code of `controller-runtime`; I'm not sure if there's a better way to register metrics to that Registry, or if using a different registry would be fine as well and they'd simply get interlaced?

Additionally, controller-runtime also has a [`rest_client_requests_total` metric](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/metrics#pkg-constants) that it registers. As far as I can tell, this is for the default `http.Client` and comes from `client-go`. We could probably also make use of that, but we would be missing a latency histogram. That latency histogram also exists, but is [disabled by default](kubernetes-sigs/controller-runtime#1587) because it created a cardinality explosion for some users, so I'm wary of enabling it as well.

By using a completely separate code path and metrics handler, we get metrics for only our backend, instead of them being interlaced with potential metrics from `client-go`. Additionally, we can start off with both latency and count metrics, as I don't think we'll have issues with cardinality (we're only registering two labels; `client-go` also registered a "url" label, which is not optimal).
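A sketch of how such an instrumented client might be wired up, registering against controller-runtime's registry from an `init` as the commit describes. The metric names and the two-label set are taken from the commit message; the package name and helper are illustrative:

```go
package backendmetrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// Only two labels ("code", "method") to keep cardinality low.
	requestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "requests_total",
		Help: "Total number of requests sent to the backend.",
	}, []string{"code", "method"})

	requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "requests_duration_histogram_seconds",
		Help: "Latency of backend requests in seconds.",
	}, []string{"code", "method"})
)

func init() {
	// Register with controller-runtime's registry so the metrics show up
	// on the manager's /metrics endpoint alongside the built-in ones.
	metrics.Registry.MustRegister(requestsTotal, requestDuration)
}

// InstrumentedClient wraps a transport so every backend request is counted
// and its duration observed.
func InstrumentedClient(base http.RoundTripper) *http.Client {
	if base == nil {
		base = http.DefaultTransport
	}
	rt := promhttp.InstrumentRoundTripperCounter(requestsTotal,
		promhttp.InstrumentRoundTripperDuration(requestDuration, base))
	return &http.Client{Transport: rt}
}
```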
This metric has been deprecated since kube 1.14 and turned off in kube 1.17. Stop registering it by default, but export the appropriate variables so consumers can register it explicitly if they wish.
fixes #1423