**Describe the bug**
When NGF fails to collect product telemetry, it sends empty telemetry data.
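The empty report looks like the zero value of the telemetry data struct being exported even though collection returned an error. A minimal sketch of that failure pattern (the `Data` struct and `collect` function here are hypothetical stand-ins, not the actual NGF types):

```go
package main

import (
	"errors"
	"fmt"
)

// Data loosely mirrors the telemetry payload; field names are illustrative.
type Data struct {
	ProjectName      string
	ClusterNodeCount int
}

// collect simulates a collection failure, such as the forbidden NodeList call.
func collect() (Data, error) {
	return Data{}, errors.New("failed to get NodeList: nodes is forbidden")
}

func main() {
	data, err := collect()
	if err != nil {
		fmt.Println("collection failed:", err)
	}
	// Bug pattern: exporting unconditionally sends the zero-valued struct.
	fmt.Printf("exported: %+v\n", data) // all fields empty/zero
}
```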
**To Reproduce**

Create a kind cluster, build the NGF images, and load them into the cluster:

```shell
cd tests
make create-kind-cluster
make build-images load-images TAG=$(whoami) TELEMETRY_ENDPOINT=otel-collector-opentelemetry-collector.collector.svc.cluster.local:4317 TELEMETRY_ENDPOINT_INSECURE=true
```

Deploy the OTel collector:

```shell
helm install otel-collector open-telemetry/opentelemetry-collector -f suite/manifests/telemetry/collector-values.yaml -n collector --create-namespace
```

Deploy NGF:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
cd ..
helm install my-release ./deploy/helm-chart --create-namespace --wait \
  --set service.type=NodePort \
  --set nginxGateway.image.repository=nginx-gateway-fabric \
  --set nginxGateway.image.tag=$(whoami) \
  --set nginxGateway.image.pullPolicy=Never \
  --set nginx.image.repository=nginx-gateway-fabric/nginx \
  --set nginx.image.tag=$(whoami) \
  --set nginx.image.pullPolicy=Never \
  -n nginx-gateway
```

Edit the NGF ClusterRole to remove the RBAC rule that permits listing nodes:

```shell
kubectl edit clusterrole my-release-nginx-gateway-fabric
```

Remove:

```yaml
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
```

Delete the NGF pod so that a new one is created:

```shell
kubectl -n nginx-gateway delete pod <pod-name>
```

Check the NGF pod logs; telemetry collection should fail because of the RBAC change:
```text
{"level":"error","ts":"2024-03-19T21:21:12Z","logger":"telemetryJob","msg":"Failed to collect telemetry data","error":"failed to collect cluster information: failed to get NodeList: nodes is forbidden: User \"system:serviceaccount:nginx-gateway:my-release-nginx-gateway-fabric\" cannot list resource \"nodes\" in API group \"\" at the cluster scope","stacktrace":"github.com/nginxinc/nginx-gateway-fabric/internal/mode/static.createTelemetryJob.CreateTelemetryJobWorker.func4\n\tgithub.com/nginxinc/nginx-gateway-fabric/internal/mode/static/telemetry/job_worker.go:29\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:259\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\tk8s.io/[email protected]/pkg/util/wait/backoff.go:259\ngithub.com/nginxinc/nginx-gateway-fabric/internal/framework/runnables.(*CronJob).Start\n\tgithub.com/nginxinc/nginx-gateway-fabric/internal/framework/runnables/cronjob.go:53\nsigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1\n\tsigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:223"}
```
Check the collector logs:

```shell
kubectl -n collector logs <otel-collector-pod-name> | grep "dataType: Str(ngf-product-telemetry)" -A 19
```
```text
-> dataType: Str(ngf-product-telemetry)
-> ImageSource: Str(local)
-> ProjectName: Str(NGF)
-> ProjectVersion: Str(edge)
-> ProjectArchitecture: Str(amd64)
-> ClusterID: Str(ced72774-ef05-403c-9a91-2acffc9c386f)
-> ClusterVersion: Str(1.29.2)
-> ClusterPlatform: Str(kind)
-> InstallationID: Str(43a0a1be-919c-417b-b85e-782adb1e3f39)
-> ClusterNodeCount: Int(1)
-> FlagNames: Slice(["config","gateway","gateway-api-experimental-features","gateway-ctlr-name","gatewayclass","health-disable","health-port","help","leader-election-disable","leader-election-lock-name","metrics-disable","metrics-port","metrics-secure-serving","nginx-plus","product-telemetry-disable","service","update-gatewayclass-status","usage-report-cluster-name","usage-report-secret","usage-report-server-url","usage-report-skip-verify"])
-> FlagValues: Slice(["user-defined","default","false","user-defined","user-defined","false","default","false","false","user-defined","false","default","false","false","false","user-defined","true","default","default","default","false"])
-> GatewayCount: Int(0)
-> GatewayClassCount: Int(1)
-> HTTPRouteCount: Int(0)
-> SecretCount: Int(0)
-> ServiceCount: Int(0)
-> EndpointCount: Int(0)
-> NGFReplicaCount: Int(1)
{"kind": "exporter", "data_type": "traces", "name": "debug"}
--
-> dataType: Str(ngf-product-telemetry)
-> ImageSource: Str()
-> ProjectName: Str()
-> ProjectVersion: Str()
-> ProjectArchitecture: Str()
-> ClusterID: Str()
-> ClusterVersion: Str()
-> ClusterPlatform: Str()
-> InstallationID: Str()
-> ClusterNodeCount: Int(0)
-> FlagNames: Slice([])
-> FlagValues: Slice([])
-> GatewayCount: Int(0)
-> GatewayClassCount: Int(0)
-> HTTPRouteCount: Int(0)
-> SecretCount: Int(0)
-> ServiceCount: Int(0)
-> EndpointCount: Int(0)
-> NGFReplicaCount: Int(0)
{"kind": "exporter", "data_type": "traces", "name": "debug"}
```
Note how the second report, sent by the new pod, contains only empty data.
**Expected behavior**

If telemetry collection fails, NGF should not send any telemetry data.
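A possible shape of the fix is for the job worker to return early on a collection error instead of passing the zero-valued data on to the exporter. A minimal sketch, assuming hypothetical `collectTelemetry`/`exportTelemetry` helpers rather than the actual NGF functions:

```go
package main

import (
	"errors"
	"fmt"
)

// Data is an illustrative stand-in for the telemetry payload.
type Data struct{ ProjectName string }

// collectTelemetry simulates a failing collection (e.g. forbidden NodeList).
func collectTelemetry() (Data, error) {
	return Data{}, errors.New("nodes is forbidden")
}

// exported records what was sent; a stand-in for the OTel exporter.
var exported []Data

func exportTelemetry(d Data) { exported = append(exported, d) }

// runJob sketches the fixed worker: on a collection error it logs and
// returns without exporting, rather than exporting the zero-valued Data.
func runJob() {
	d, err := collectTelemetry()
	if err != nil {
		fmt.Println("failed to collect telemetry data:", err)
		return // skip the export; do not send empty data
	}
	exportTelemetry(d)
}

func main() {
	runJob()
	fmt.Println("reports sent:", len(exported)) // 0
}
```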
**Your environment**

- NGF edge version