|
1 | 1 | # Enhancement Proposal-929: Data Plane Dynamic Configuration
|
2 | 2 |
|
3 | 3 | - Issue: https://github.com/nginxinc/nginx-kubernetes-gateway/issues/929
|
4 |
| -- Status: Provisional |
| 4 | +- Status: Implementable |
5 | 5 |
|
6 | 6 | ## Summary
|
7 | 7 |
|
8 |
| -This proposal is intended to contain the design for how to dynamically configure the data plane for the |
9 |
| -NGINX Gateway Fabric (NGF) project. Similar to control plane configuration, we should be able to leverage |
| 8 | +This proposal is intended to contain the design for how to dynamically configure global settings for the data plane |
| 9 | +of the NGINX Gateway Fabric (NGF) product. Similar to control plane configuration, we should be able to leverage |
10 | 10 | a custom resource definition to define data plane configuration, considering fields such as telemetry and
|
11 | 11 | upstream zone size.
|
12 | 12 |
|
13 | 13 | ## Goals
|
14 | 14 |
|
15 |
| -Define a CRD to dynamically configure various settings for the NGF data plane. The initial configurable options |
16 |
| -will be for telemetry (tracing) and upstream zone size. |
| 15 | +Define a CRD to dynamically configure various global settings for the NGF data plane. The initial configurable |
| 16 | +options will be for telemetry (tracing) and upstream zone size. |
17 | 17 |
|
18 | 18 | ## Non-Goals
|
19 | 19 |
|
20 | 20 | 1. This proposal is not defining every setting that needs to be present in the configuration.
|
21 | 21 | 2. This proposal is not for any configuration related to control plane.
|
| 22 | + |
| 23 | +## Introduction |
| 24 | + |
| 25 | +The NGF data plane will evolve to have various user-configurable options. These could include, but are not |
| 26 | +limited to, tracing, logging, or metrics. For the best user experience, these options should be able to be |
| 27 | +changed at runtime, to avoid having to restart NGF. The first set of options that we will allow users to |
| 28 | +configure are tracing and upstream zone size. The easiest and most intuitive way to implement a Kubernetes-native |
| 29 | +API is through a CRD. |
| 30 | + |
| 31 | +The purpose of this CRD is to contain "global" configuration options for the data plane; it is not focused on policy |
| 32 | +per route or backend. |
| 33 | + |
| 34 | +In this doc, the term "user" will refer to the cluster operator (the person who installs and manages NGF). The |
| 35 | +cluster operator owns this CRD resource. |
| 36 | + |
| 37 | +## API, Customer Driven Interfaces, and User Experience |
| 38 | + |
| 39 | +The API would be provided in a CRD. An authorized user would interact with this CRD using `kubectl` to `get` |
| 40 | +or `edit` the configuration. |
| 41 | + |
| 42 | +Proposed configuration CRD example: |
| 43 | + |
| 44 | +```yaml |
| 45 | +apiVersion: gateway.nginx.org/v1alpha1 |
| 46 | +kind: NginxProxy |
| 47 | +metadata: |
| 48 | +  name: nginx-proxy-config |
| 49 | +  namespace: nginx-gateway |
| 50 | +spec: |
| 51 | +  telemetry: |
| 52 | +    exporters: |
| 53 | +      otlp: |
| 54 | +        endpoint: my-otel-collector.svc:4317 |
| 55 | +        interval: 5s |
| 56 | +        batchSize: 512 |
| 57 | +        batchCount: 4 |
| 58 | +  upstreamZoneSize: 1024k |
| 59 | +status: |
| 60 | +  conditions: |
| 61 | +  ... |
| 62 | +``` |
| 63 | + |
| 64 | +- The CRD would be Namespace-scoped. |
| 65 | +- The CRD is initialized and created when NGF is deployed, in the `nginx-gateway` Namespace. |
| 66 | +- The CRD would be referenced in the [ParametersReference][ref] |
| 67 | +of the NGF GatewayClass. |
| 68 | + |
| 69 | +[ref]:https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.ParametersReference |
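
As an illustration of the last bullet, a GatewayClass could point at the resource through `parametersRef`. This is a minimal sketch, not the final manifest: the GatewayClass name `nginx` and the `controllerName` value are assumptions, while the `group`, `kind`, `name`, and `namespace` values are taken from the example resource above.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: nginx  # assumed GatewayClass name
spec:
  # Assumed controller name; use the value your NGF deployment is configured with.
  controllerName: gateway.nginx.org/nginx-gateway-controller
  parametersRef:
    group: gateway.nginx.org
    kind: NginxProxy
    name: nginx-proxy-config
    # namespace is required here because the referenced resource is Namespace-scoped.
    namespace: nginx-gateway
```

Because GatewayClass is cluster-scoped, `parametersRef` has to carry the Namespace explicitly when it points at a Namespace-scoped resource.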
| 70 | + |
| 71 | +## Use Cases |
| 72 | + |
| 73 | +The high-level use case for dynamically changing settings in the NGF data plane is to allow users to alter |
| 74 | +behavior without the need for restarting NGF and experiencing downtime. |
| 75 | + |
| 76 | +### Tracing |
| 77 | + |
| 78 | +Users may want to observe how traffic is flowing through their applications. Tracing is a great way to achieve |
| 79 | +this. By taking advantage of the OpenTelemetry standards, a user can deploy any OTLP-compliant tracing collector |
| 80 | +to receive and visualize tracing data. Allowing a user to configure a tracing backend lets NGF forward |
| 81 | +nginx tracing data to that backend for visualization. |
| 82 | + |
| 83 | +For future considerations, a user may want to disable tracing for certain routes (or only enable it for certain |
| 84 | +routes), in order to reduce the amount of data being collected. We would likely be able to implement a [per-route |
| 85 | +Policy](https://gateway-api.sigs.k8s.io/geps/gep-713/#direct-policy-attachment) |
| 86 | +that would include this switch. The proposed "global" CRD in this document would remain unchanged, though |
| 87 | +could include an additional field to enable or disable tracing globally. |
| 88 | + |
| 89 | +### Upstream Zone Size |
| 90 | + |
| 91 | +As the number of servers within an upstream increases (in other words, Pod replicas for a Service), the |
| 92 | +shared memory zone size needs to increase to accommodate this. A user can fine-tune this number to fit their |
| 93 | +environment. |
| 94 | + |
| 95 | +## Testing |
| 96 | + |
| 97 | +Unit tests can be leveraged for verifying that NGF properly watches and acts on CRD changes. These tests would |
| 98 | +be similar in behavior to the current unit tests that verify the control plane CRD resource processing. |
| 99 | + |
| 100 | +We would need system-level tests to ensure that tracing works as expected. |
| 101 | + |
| 102 | +## Security Considerations |
| 103 | + |
| 104 | +We need to ensure that any configurable fields that are exposed to a user are: |
| 105 | + |
| 106 | +- Properly validated. This means that the fields should be the correct type (integer, string, etc.), have appropriate |
| 107 | +length, and use regex patterns or enums to prevent any unwanted input. This will initially be done through |
| 108 | +OpenAPI schema validation. If necessary as the CRD evolves, CEL or webhooks could be used. |
| 109 | +- Have a valid use case. The more fields we expose, the more attack vectors we create. We should only be exposing |
| 110 | +fields that are genuinely useful for a user to change dynamically. |
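
The validation described in the first bullet could be expressed in the CRD's OpenAPI schema roughly as follows. This is a hypothetical excerpt: the field names come from the example resource in this proposal, but every pattern, bound, and `required` entry is an illustrative assumption rather than a decided constraint.

```yaml
# Hypothetical excerpt of the CRD's openAPIV3Schema; patterns and limits are illustrative.
spec:
  type: object
  properties:
    upstreamZoneSize:
      type: string
      # Digits with an optional k or m size suffix, for example 1024k.
      pattern: '^\d{1,5}[kKmM]?$'
    telemetry:
      type: object
      properties:
        exporters:
          type: object
          properties:
            otlp:
              type: object
              required:
                - endpoint
              properties:
                endpoint:
                  type: string
                  minLength: 1
                  maxLength: 255
                interval:
                  type: string
                  # A number followed by a time unit, for example 5s.
                  pattern: '^\d{1,4}(ms|s|m|h)$'
                batchSize:
                  type: integer
                  minimum: 1
                batchCount:
                  type: integer
                  minimum: 1
```

Anchored patterns and explicit bounds let the API server reject malformed input before the control plane ever sees it.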
| 111 | + |
| 112 | +RBAC via the Kubernetes API server will ensure that only authorized users can update the CRD containing NGF data |
| 113 | +plane configuration. |
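
A sketch of such an RBAC rule is shown below. The Role name is hypothetical, and the plural resource name `nginxproxies` is an assumption based on the `NginxProxy` kind.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nginx-proxy-editor  # hypothetical Role name
  namespace: nginx-gateway
rules:
  - apiGroups: ["gateway.nginx.org"]
    resources: ["nginxproxies"]  # assumed plural of the NginxProxy kind
    verbs: ["get", "list", "watch", "update", "patch"]
```

Binding this Role only to cluster operators keeps other users from changing the data plane configuration.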
| 114 | + |
| 115 | +## Alternatives |
| 116 | + |
| 117 | +- ConfigMap |
| 118 | +A ConfigMap is another type of resource that a user can provide configuration options within; however, it lacks the |
| 119 | +benefits of a CRD, specifically built-in schema validation, versioning, and conversion webhooks. |
| 120 | + |
| 121 | +- Custom API server |
| 122 | +The NGF control plane could implement its own custom API server. However, the overhead of implementing this, which |
| 123 | +would include auth, validation, endpoints, and so on, would not be worth it because the Kubernetes |
| 124 | +API server already does all of these things for us. |
| 125 | + |
| 126 | +## References |
| 127 | + |
| 128 | +- [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) |