Proposal: Expose all client-go metrics by default #3202

Open
@ahmetb

Summary

Expose more of the client-side metrics offered by client-go in the controller process by default, similar to what the Kubernetes built-in controllers and apiserver do.

Time and time again, the lack of these metrics on our internal controllers has prevented us from monitoring how long we are stuck in the client-side rate limiter, or what latency the controller's REST client requests actually observe (without writing our own instrumented REST transport wrapper).
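For context, the kind of hand-written instrumented transport wrapper mentioned above might look like the following minimal sketch. It uses only the Go standard library; `timingTransport`, `sampleCount`, and `probe` are hypothetical names for illustration, and a real controller would feed a Prometheus histogram rather than an in-memory map.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// timingTransport is a hypothetical http.RoundTripper wrapper that records
// how long each round trip takes, keyed by "METHOD host".
type timingTransport struct {
	next     http.RoundTripper
	mu       sync.Mutex
	observed map[string][]time.Duration
}

func newTimingTransport(next http.RoundTripper) *timingTransport {
	return &timingTransport{next: next, observed: make(map[string][]time.Duration)}
}

// RoundTrip delegates to the wrapped transport and stores the elapsed time.
func (t *timingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	elapsed := time.Since(start)

	key := req.Method + " " + req.URL.Host
	t.mu.Lock()
	t.observed[key] = append(t.observed[key], elapsed)
	t.mu.Unlock()
	return resp, err
}

// sampleCount returns the total number of recorded round trips.
func (t *timingTransport) sampleCount() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	total := 0
	for _, samples := range t.observed {
		total += len(samples)
	}
	return total
}

// probe starts a throwaway local server, sends n GET requests through the
// wrapper, and returns how many samples the wrapper recorded.
func probe(n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	tt := newTimingTransport(http.DefaultTransport)
	client := &http.Client{Transport: tt}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
	}
	return tt.sampleCount(), nil
}

func main() {
	n, err := probe(3)
	if err != nil {
		panic(err)
	}
	fmt.Println("recorded samples:", n)
}
```

A wrapper at this layer only sees the HTTP round trip. Assuming client-go applies its client-side rate limiter before the request reaches the transport, time spent waiting in the limiter would not show up here, which is one reason the built-in hooks are more attractive than a custom wrapper.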

Details

client-go currently exposes the following hooks that a metrics collector can register for (https://github.com/kubernetes/client-go/blob/v0.33.0/tools/metrics/metrics.go#L114-L127):

| Metric Name | Type | Dimensions | Description |
| --- | --- | --- | --- |
| rest_client_request_duration_seconds | Histogram | verb, host | Request latency in seconds. Buckets: [0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 30.0, 60.0] |
| rest_client_dns_resolution_duration_seconds | Histogram | host | DNS resolver latency in seconds. Buckets: [0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 30.0] |
| rest_client_request_size_bytes | Histogram | verb, host | Request size in bytes. Buckets: [64, 256, 512, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216] |
| rest_client_response_size_bytes | Histogram | verb, host | Response size in bytes. Buckets: [64, 256, 512, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216] |
| rest_client_rate_limiter_duration_seconds | Histogram | verb, host | Client-side rate limiter latency in seconds. Buckets: [0.005, 0.025, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0, 30.0, 60.0] |
| rest_client_requests_total | Counter | code, method, host | Number of HTTP requests. |
| rest_client_request_retries_total | Counter | code, verb, host | Number of request retries. |
| rest_client_transport_cache_entries | Gauge | (none) | Number of transport entries in the internal cache. |
| rest_client_transport_create_calls_total | Counter | result | Number of calls to get a new transport, partitioned by the result of the operation. |

Among these, the only metric currently exposed by controller-runtime is rest_client_requests_total. Some of the other metrics were previously removed (#1587) due to unbounded label cardinality; however, with the recent overhauls to the metrics, the highest-cardinality dimension left is host (which is presumably just however many apiserver host:ports you have).
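To illustrate why the host dimension stays bounded, here is a minimal sketch of a latency hook adapter that labels observations by verb and host only. The `latencyMetric` interface is declared locally and is assumed to mirror the shape of client-go's latency hook; `hostLatency`, `recordRequest`, and `seriesKeys` are hypothetical helpers, and the in-memory map stands in for a real histogram.

```go
package main

import (
	"context"
	"fmt"
	"net/url"
	"sort"
	"sync"
	"time"
)

// latencyMetric is a local stand-in assumed to match the shape of the
// client-go latency hook, so the sketch compiles with the standard library.
type latencyMetric interface {
	Observe(ctx context.Context, verb string, u url.URL, latency time.Duration)
}

// labelKey is the (verb, host) label pair from the table above.
type labelKey struct{ verb, host string }

// hostLatency is a hypothetical in-memory recorder.
type hostLatency struct {
	mu      sync.Mutex
	samples map[labelKey][]time.Duration
}

var _ latencyMetric = (*hostLatency)(nil)

func newHostLatency() *hostLatency {
	return &hostLatency{samples: make(map[labelKey][]time.Duration)}
}

// Observe keeps only u.Host and ignores the path, so URLs for different
// objects collapse into one series per (verb, host).
func (h *hostLatency) Observe(_ context.Context, verb string, u url.URL, latency time.Duration) {
	k := labelKey{verb: verb, host: u.Host}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.samples[k] = append(h.samples[k], latency)
}

// seriesKeys returns the distinct label sets as sorted "verb host" strings.
func (h *hostLatency) seriesKeys() []string {
	h.mu.Lock()
	defer h.mu.Unlock()
	keys := make([]string, 0, len(h.samples))
	for k := range h.samples {
		keys = append(keys, k.verb+" "+k.host)
	}
	sort.Strings(keys)
	return keys
}

// recordRequest parses rawURL and feeds one observation into the recorder.
func recordRequest(h *hostLatency, verb, rawURL string, latency time.Duration) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	h.Observe(context.Background(), verb, *u, latency)
	return nil
}

func main() {
	rec := newHostLatency()
	for _, raw := range []string{
		"https://10.0.0.1:6443/api/v1/namespaces/a/pods/x",
		"https://10.0.0.1:6443/api/v1/namespaces/b/pods/y",
	} {
		if err := recordRequest(rec, "GET", raw, 20*time.Millisecond); err != nil {
			panic(err)
		}
	}
	// Two different object URLs on the same apiserver yield a single label set.
	fmt.Println(rec.seriesKeys())
}
```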

Proposal

  1. controller-runtime starts exposing all of the listed metrics by default (by copying their definitions from k8s.io/component-base).

  2. Existing rest_client_requests_total metric should remain unmodified.

  3. The ExecPluginCalls hook (i.e. the rest_client_exec_plugin_call_total metric) should be left out, as it is rarely, if ever, useful for a controller process.

Considerations

  1. Stability: ALL of the metrics listed above are marked as ALPHA in component-base and in the k8s.io Metrics Documentation, presumably for components like kube-scheduler, kube-controller-manager etc. Do we offer them as stable anyway? Or do we accept breaking users later?

  2. Cardinality: The histogram metrics have 11-12 buckets each. In a large cluster setup with 10 apiservers x 4 verbs, that is 10 x 4 x 12 = 480 bucket series, so a single metric can easily reach 400+ time series (still bounded though).

  3. Future improvements: client-go passes a url value to one of the hook functions. This url is free of the resource {namespace,name} (i.e. it's bounded cardinality for us!) but is available in only one metric hook 😢. component-base basically uses that url.URL value to derive the host label.

    However, if client-go some day starts providing url label for every metric, it would be even more useful, but we'd likely need to break the metrics.
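As a rough check on the cardinality estimate in consideration 2, the arithmetic can be sketched as below. The helper names are hypothetical, and the `+3` assumes the usual Prometheus histogram exposition of one `+Inf` bucket plus `_sum` and `_count` series on top of the configured buckets.

```go
package main

import "fmt"

// bucketSeries counts only the explicitly configured bucket series for one
// histogram with verb and host labels.
func bucketSeries(hosts, verbs, buckets int) int {
	return hosts * verbs * buckets
}

// histogramSeries additionally counts the +Inf bucket, _sum and _count
// series (an assumption about how the histogram is exposed).
func histogramSeries(hosts, verbs, buckets int) int {
	return hosts * verbs * (buckets + 3)
}

func main() {
	// 10 apiservers x 4 verbs x 12 buckets, as in the example above.
	fmt.Println("bucket series:", bucketSeries(10, 4, 12))
	fmt.Println("all series:   ", histogramSeries(10, 4, 12))
}
```

Under these assumptions a 12-bucket histogram lands at 480 bucket series and 600 series in total per metric, which is larger than a counter but still fixed by the number of apiserver hosts and verbs.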

/kind design
/cc @alvaroaleman
