Commit 2f554a2

Update NGF documentation on prometheus metrics (#249)
Update documentation on Prometheus metrics.

Problem: Because NGF now uses NGINX Agent to export NGINX metrics, we need to update our documentation on the available metrics and the example Grafana dashboard.

Solution: Update the metrics.

* Add feedback
1 parent a396bc8 commit 2f554a2

2 files changed

+244
-321
lines changed

content/ngf/how-to/monitoring/prometheus.md

Lines changed: 39 additions & 8 deletions
@@ -83,23 +83,54 @@ NGINX Gateway Fabric provides a variety of metrics for monitoring and analyzing
8383

8484
### NGINX/NGINX Plus metrics
8585

86-
NGINX metrics cover specific NGINX operations such as the total number of accepted client connections. For a complete list of available NGINX/NGINX Plus metrics, refer to the [NGINX Prometheus Exporter developer docs](https://github.com/nginx/nginx-prometheus-exporter#exported-metrics).
87-
88-
These metrics use the `nginx_gateway_fabric` namespace and include the `class` label, indicating the NGINX Gateway class. For example, `nginx_gateway_fabric_connections_accepted{class="nginx"}`.
86+
NGINX metrics include NGINX-specific data such as the total number of accepted client connections. These metrics are
87+
collected through NGINX Agent and are reported by each NGINX Pod.
88+
89+
NGINX Gateway Fabric currently supports a subset of all metrics available through NGINX OSS and Plus. Listed below are
90+
the supported metrics along with a small accompanying description.
91+
92+
Metrics provided by NGINX Open Source include:
93+
- `nginx_http_connections`: NGINX-wide statistics describing HTTP connections.
94+
- `nginx_http_requests`: The total number of requests received from clients.
95+
96+
In addition to the previous metrics provided by NGINX Open Source, NGINX Plus includes:
97+
- `nginx_config_reloads`: The total number of NGINX config reloads.
98+
- `nginx_http_response_status_responses_total`: The number of responses, grouped by status code range.
99+
- `nginx_http_request_discarded_requests_total`: The total number of requests completed without sending a response.
100+
- `nginx_http_request_processing_count_requests`: The number of client requests that are currently being processed.
101+
- `nginx_http_request_byte_io_bytes_total`: The total number of HTTP byte IO.
102+
- `nginx_http_upstream_keepalive_count_connections`: The current number of idle keepalive connections per HTTP upstream.
103+
- `nginx_http_upstream_peer_byte_io_bytes_total`: The total number of byte IO per HTTP upstream peer.
104+
- `nginx_http_upstream_peer_count_peers`: The current count of peers on the HTTP upstream grouped by state.
105+
- `nginx_http_upstream_peer_fails_attempts`: The total number of unsuccessful attempts to communicate with the HTTP upstream peer.
106+
- `nginx_http_upstream_peer_header_time_milliseconds`: The average time to get the response header from the HTTP upstream peer.
107+
- `nginx_http_upstream_peer_health_checks_requests_total`: The total number of health check requests made to an HTTP upstream peer.
108+
- `nginx_http_upstream_peer_requests_total`: The total number of client requests forwarded to the HTTP upstream peer.
109+
- `nginx_http_upstream_peer_response_time_milliseconds`: The average time to get the full response from the HTTP upstream peer.
110+
- `nginx_http_upstream_peer_responses_total`: The total number of responses obtained from the HTTP upstream peer grouped by status range.
111+
- `nginx_http_upstream_peer_state_is_deployed`: Current state of an upstream peer in deployment.
112+
- `nginx_http_upstream_peer_unavailables_requests_total`: Number of times the server became unavailable for client requests (“unavail”).
113+
- `nginx_http_upstream_queue_limit_requests`: The maximum number of requests that can be in the queue at the same time.
114+
- `nginx_http_upstream_queue_overflows_responses_total`: The total number of requests rejected due to the queue overflow.
115+
- `nginx_http_upstream_queue_usage_requests`: The current number of requests in the queue.
116+
- `nginx_http_upstream_zombie_count_is_deployed`: The current number of upstream peers removed from the group but still processing active client requests.
117+
- `nginx_slab_page_free_pages`: The current number of free memory pages.
118+
- `nginx_slab_page_usage_pages`: The current number of used memory pages.
119+
- `nginx_slab_slot_allocations_total`: The number of attempts to allocate memory of specified size.
120+
- `nginx_slab_slot_free_slots`: The current number of free memory slots.
121+
- `nginx_slab_slot_usage_slots`: The current number of used memory slots.
122+
- `nginx_ssl_certificate_verify_failures_certificates_total`: The total number of SSL certificate verification failures.
123+
- `nginx_ssl_handshakes_total`: The total number of SSL handshakes.
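
As a rough illustration of how the metrics listed above might be consumed, the following Python sketch parses Prometheus text-format output and totals `nginx_http_response_status_responses_total` by status code range. The sample scrape text, its values, and the label name `nginx_status_range` are assumptions made for this example; check the actual output of an NGINX Pod for the real label names.

```python
import re
from collections import defaultdict

# Illustrative scrape output. The label name and all values are made up for
# this sketch and are not taken from a real NGINX Pod.
SAMPLE = """\
# TYPE nginx_http_response_status_responses_total counter
nginx_http_response_status_responses_total{nginx_status_range="2xx"} 120
nginx_http_response_status_responses_total{nginx_status_range="4xx"} 7
nginx_http_response_status_responses_total{nginx_status_range="5xx"} 3
nginx_http_requests 130
"""

# metric_name{label="value",...} value [timestamp]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def totals_by_label(samples, metric, label):
    """Sum the values of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


samples = parse_samples(SAMPLE)
by_range = totals_by_label(
    samples, "nginx_http_response_status_responses_total", "nginx_status_range"
)
# Share of responses in the 5xx range, based on the sample data only.
error_ratio = by_range.get("5xx", 0.0) / sum(by_range.values())
```

The parser is deliberately simplified and does not cover the full exposition format; for production use, a Prometheus client library or a PromQL query is likely a better fit.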
89124

90125
---
91126

92127
### NGINX Gateway Fabric metrics
93128

94129
Metrics specific to NGINX Gateway Fabric include:
95130

96-
- `nginx_reloads_total`: Counts successful NGINX reloads.
97-
- `nginx_reload_errors_total`: Counts NGINX reload failures.
98-
- `nginx_stale_config`: Indicates if NGINX Gateway Fabric couldn't update NGINX with the latest configuration, resulting in a stale version.
99-
- `nginx_reloads_milliseconds`: Time in milliseconds for NGINX reloads.
100131
- `event_batch_processing_milliseconds`: Time in milliseconds to process batches of Kubernetes events.
101132

102-
All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the Gateway class of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_nginx_reloads_total{class="nginx"}`.
133+
All these metrics are under the `nginx_gateway_fabric` namespace and include a `class` label set to the GatewayClass of NGINX Gateway Fabric. For example, `nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}`.
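
The `_sum` suffix in the example above suggests that `event_batch_processing_milliseconds` is exposed as a Prometheus histogram or summary, which would also provide a matching `_count` series. Under that assumption, the following Python sketch shows how the average batch processing time over an interval could be derived from two scrapes. The function name and the sample numbers are hypothetical.

```python
from typing import Optional, Tuple

# One scrape of the two series, assumed to be read from:
#   nginx_gateway_fabric_event_batch_processing_milliseconds_sum{class="nginx"}
#   nginx_gateway_fabric_event_batch_processing_milliseconds_count{class="nginx"}
# The `_count` series is an assumption based on standard Prometheus conventions.
Scrape = Tuple[float, float]  # (cumulative milliseconds, cumulative batch count)


def mean_batch_ms(earlier: Scrape, later: Scrape) -> Optional[float]:
    """Average processing time per batch between two scrapes.

    Returns None when no new batches were recorded or when the counters
    went backwards (for example after a restart).
    """
    sum_delta = later[0] - earlier[0]
    count_delta = later[1] - earlier[1]
    if count_delta <= 0:
        return None
    return sum_delta / count_delta


# Hypothetical values: 5 new batches took 150 ms in total.
average = mean_batch_ms((100.0, 10), (250.0, 15))
```

This mirrors the common PromQL pattern of dividing the rate of the `_sum` series by the rate of the `_count` series.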
103134

104135
---
105136
