Commit f97792f

Proposal (Implementable): data plane config
Update enhancement proposal with implementable details for NGF data plane dynamic configuration.
1 parent f547884 commit f97792f

1 file changed: +112 -5 lines changed

docs/proposals/data-plane-config.md

Lines changed: 112 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,21 +1,128 @@
11
# Enhancement Proposal-929: Data Plane Dynamic Configuration
22

33
- Issue: https://github.com/nginxinc/nginx-kubernetes-gateway/issues/929
4-
- Status: Provisional
4+
- Status: Implementable
55

66
## Summary
77

8-
This proposal is intended to contain the design for how to dynamically configure the data plane for the
9-
NGINX Gateway Fabric (NGF) project. Similar to control plane configuration, we should be able to leverage
8+
This proposal is intended to contain the design for how to dynamically configure global settings for the data plane
9+
of the NGINX Gateway Fabric (NGF) product. Similar to control plane configuration, we should be able to leverage
1010
a custom resource definition to define data plane configuration, considering fields such as telemetry and
1111
upstream zone size.
1212

1313
## Goals
1414

15-
Define a CRD to dynamically configure various settings for the NGF data plane. The initial configurable options
16-
will be for telemetry (tracing) and upstream zone size.
15+
Define a CRD to dynamically configure various global settings for the NGF data plane. The initial configurable
16+
options will be for telemetry (tracing) and upstream zone size.
1717

1818
## Non-Goals
1919

2020
1. This proposal is not defining every setting that needs to be present in the configuration.
2121
2. This proposal is not for any configuration related to control plane.
22+
23+
## Introduction
24+
25+
The NGF data plane will evolve to have various user-configurable options. These could include, but are not
26+
limited to, tracing, logging, or metrics. For the best user experience, users should be able to change these
27+
options at runtime, to avoid having to restart NGF. The first set of options that we will allow users to
28+
configure are tracing and upstream zone size. The easiest and most intuitive way to implement a Kubernetes-native
29+
API is through a CRD.
30+
31+
The purpose of this CRD is to contain "global" configuration options for the data plane, not to define policy
32+
per route or backend.
33+
34+
In this doc, the term "user" will refer to the cluster operator (the person who installs and manages NGF). The
35+
cluster operator owns this CRD resource.
36+
37+
## API, Customer Driven Interfaces, and User Experience
38+
39+
The API would be provided in a CRD. An authorized user would interact with this CRD using `kubectl` to `get`
40+
or `edit` the configuration.
41+
42+
Proposed configuration CRD example:
43+
44+
```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxProxy
metadata:
  name: nginx-proxy-config
  namespace: nginx-gateway
spec:
  telemetry:
    exporters:
      otlp:
        endpoint: my-otel-collector.svc:4317
        interval: 5s
        batchSize: 512
        batchCount: 4
  upstreamZoneSize: 1024k
status:
  conditions:
  ...
```
63+
64+
- The CRD would be Namespace-scoped.
65+
- CRD is initialized and created when NGF is deployed, in the `nginx-gateway` Namespace.
66+
- CRD would be referenced in the [ParametersReference][ref]
67+
of the NGF GatewayClass.
68+
69+
[ref]:https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.ParametersReference
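
As a rough sketch of the reference described above, a GatewayClass could point at the proposed resource through `parametersRef`. The GatewayClass name and the `controllerName` value here are assumptions for illustration; the `group`, `kind`, `name`, and `namespace` values are taken from the example resource in this proposal.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: nginx  # assumed GatewayClass name
spec:
  # Assumed controller name; use whatever the NGF deployment registers.
  controllerName: gateway.nginx.org/nginx-gateway-controller
  parametersRef:
    group: gateway.nginx.org
    kind: NginxProxy
    name: nginx-proxy-config
    # Required here because the referenced resource is Namespace-scoped.
    namespace: nginx-gateway
```
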
70+
71+
## Use Cases
72+
73+
The high-level use case for dynamically changing settings in the NGF data plane is to allow users to alter
74+
behavior without the need for restarting NGF and experiencing downtime.
75+
76+
### Tracing
77+
78+
Users may want to observe how traffic is flowing through their applications. Tracing is a great way to achieve
79+
this. By taking advantage of the OpenTelemetry standards, a user can deploy any OTLP-compliant tracing collector
80+
to receive and visualize tracing data. Once a user configures a tracing backend, NGF will forward
81+
nginx tracing data to that backend for visualization.
82+
83+
For future considerations, a user may want to disable tracing for certain routes (or only enable it for certain
84+
routes), in order to reduce the amount of data being collected. We would likely be able to implement a [per-route
85+
Policy](https://gateway-api.sigs.k8s.io/geps/gep-713/#direct-policy-attachment)
86+
that would include this switch. The proposed "global" CRD in this document would remain unchanged, though
87+
could include an additional field to enable or disable tracing globally.
88+
89+
### Upstream Zone Size
90+
91+
As the number of servers within an upstream increases (in other words, Pod replicas for a Service), the
92+
shared memory zone size needs to increase to accommodate this. A user can fine-tune this number to fit their
93+
environment.
94+
95+
## Testing
96+
97+
Unit tests can be leveraged for verifying that NGF properly watches and acts on CRD changes. These tests would
98+
be similar in behavior to the current unit tests that verify the control plane CRD resource processing.
99+
100+
We would need system-level tests to ensure that tracing works as expected.
101+
102+
## Security Considerations
103+
104+
We need to ensure that any configurable fields that are exposed to a user are:
105+
106+
- Properly validated. This means that the fields should be the correct type (integer, string, etc.), have appropriate
107+
length, and use regex patterns or enums to prevent any unwanted input. This will initially be done through
108+
OpenAPI schema validation. If necessary as the CRD evolves, CEL or webhooks could be used.
109+
- Have a valid use case. The more fields we expose, the more attack vectors we create. We should only be exposing
110+
fields that are genuinely useful for a user to change dynamically.
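
To illustrate the OpenAPI schema validation mentioned above, here is a minimal sketch of what the CRD definition might contain. The plural name, the regex patterns, and the numeric and length bounds are all assumptions chosen for illustration, not values defined by this proposal.

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: nginxproxies.gateway.nginx.org  # assumed plural name
spec:
  group: gateway.nginx.org
  scope: Namespaced
  names:
    kind: NginxProxy
    plural: nginxproxies
    singular: nginxproxy
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                upstreamZoneSize:
                  type: string
                  maxLength: 16
                  # Assumed pattern: digits with an optional k or m suffix.
                  pattern: '^[0-9]+[kKmM]?$'
                telemetry:
                  type: object
                  properties:
                    exporters:
                      type: object
                      properties:
                        otlp:
                          type: object
                          properties:
                            endpoint:
                              type: string
                              maxLength: 253  # assumed bound
                            interval:
                              type: string
                              # Assumed pattern: digits with an optional time unit.
                              pattern: '^[0-9]+(ms|s|m|h)?$'
                            batchSize:
                              type: integer
                              minimum: 1
                            batchCount:
                              type: integer
                              minimum: 1
```

With a schema along these lines, the API server would reject a value such as `upstreamZoneSize: "lots"` before NGF ever sees it.
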
111+
112+
RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD containing NGF data
113+
plane configuration.
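
A minimal sketch of the RBAC restriction described above, assuming the resource plural is `nginxproxies`. The Role name, RoleBinding name, and subject are hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nginxproxy-editor  # hypothetical name
  namespace: nginx-gateway
rules:
  - apiGroups: ["gateway.nginx.org"]
    resources: ["nginxproxies"]  # assumed plural of NginxProxy
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nginxproxy-editor-binding  # hypothetical name
  namespace: nginx-gateway
subjects:
  - kind: User
    name: cluster-operator@example.com  # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: nginxproxy-editor
  apiGroup: rbac.authorization.k8s.io
```
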
114+
115+
## Alternatives
116+
117+
- ConfigMap
118+
A ConfigMap is another type of resource in which a user can provide configuration options. However, it lacks the
119+
benefits of a CRD, specifically built-in schema validation, versioning, and conversion webhooks.
120+
121+
- Custom API server
122+
The NGF control plane could implement its own custom API server. However, the overhead of implementing this, which
123+
would include auth, validation, endpoints, and so on, would not be worth it because the Kubernetes
124+
API server already does all of these things for us.
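
For comparison with the ConfigMap alternative above, the same settings might look like this as a ConfigMap. The key names are hypothetical. Every value is an untyped string, so the API server would accept a malformed entry without complaint.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-proxy-config
  namespace: nginx-gateway
data:
  # Hypothetical flat keys; ConfigMap data has no nested schema.
  telemetry.otlp.endpoint: "my-otel-collector.svc:4317"
  telemetry.otlp.interval: "5s"
  # Nothing stops a non-numeric value here; NGF would have to validate it itself.
  telemetry.otlp.batchSize: "512"
  telemetry.otlp.batchCount: "4"
  upstreamZoneSize: "1024k"
```
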
125+
126+
## References
127+
128+
- [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
