Add targeted audit policy rules that suppress high-frequency, low-value
requests. These requests were generating ~570k audit events per 10 hours
and pushing kube-apiserver CPU usage to 260-316m per node.

Suppressed categories (no security impact):
- coordination.k8s.io/leases: controller/node heartbeats (86k GET + 46k PUT/10h)
- /healthz*, /readyz*, /livez*, /openapi*, /version: probe and discovery endpoints
- system:nodes user group: kubelet node status updates
- endpoints and endpointslices (GET/LIST/WATCH): Cilium/CoreDNS polling

All other requests continue to be logged at Metadata level.
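
For reference, the rules described above could look roughly like the
following Talos patch. This is a sketch, not the committed file: it assumes
the policy is set through the cluster.apiServer.auditPolicy field of the
Talos machine config, that the system:nodes rule matches every request from
that group (the actual rule may be narrower), and that endpoints and
endpointslices live in the core and discovery.k8s.io API groups.

```yaml
# Sketch of a Talos patch; rule contents mirror the list above.
cluster:
  apiServer:
    auditPolicy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        # Rules are evaluated in order; the first match decides the level.
        - level: None   # lease heartbeats from controllers and nodes
          resources:
            - group: coordination.k8s.io
              resources: ["leases"]
        - level: None   # probe and discovery endpoints
          nonResourceURLs:
            - "/healthz*"
            - "/readyz*"
            - "/livez*"
            - "/openapi*"
            - "/version"
        - level: None   # kubelet traffic (assumed: whole group, not only status updates)
          userGroups: ["system:nodes"]
        - level: None   # read-only polling by Cilium/CoreDNS
          verbs: ["get", "list", "watch"]
          resources:
            - group: ""
              resources: ["endpoints"]
            - group: discovery.k8s.io
              resources: ["endpointslices"]
        # Catch-all: everything else is still logged at Metadata level.
        - level: Metadata
```

The catch-all rule has to stay last, because a request that matches an
earlier None rule never reaches it.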

Result: 76% of audit events are suppressed, and CPU usage on non-leader
apiservers dropped ~50-60% (316m -> 125m on standby nodes). The policy
lives in the patch file, so it survives cluster resets when the config is
regenerated with talhelper genconfig.

Default Rook requests (mon=1100m, mgr=700m, CSI sidecars=250-650m)
totaled 17,860m of CPU on an 11,850m cluster. The resulting ESXi CPU
overcommit caused stalls that broke kube-apiserver connectivity and made
kube-controller-manager, cilium-operator, and openebs lose leader
elections.

New values target ~2,500m total Rook CPU requests:
- mon: 200m (was 1100m)
- mgr: 100m (was 700m)
- mds: 100m (was ~500m)
- osd: 200m (was no request, 8Gi memory limit)
- CSI sidecars: 10-50m each (was 100-250m each)
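
As an illustration, the mon, mgr, and osd values above could be expressed
as a CephCluster spec.resources fragment like the one below. This is a
sketch under assumptions: where these keys sit in this repository (for
example, under a Helm values key) is not stated in this message, and memory
settings are omitted because the new memory values are not given. The mds
request is normally set on the CephFilesystem resource and the CSI sidecar
requests in the operator's CSI settings, so neither is shown here.

```yaml
# Fragment of a CephCluster spec; CPU requests only.
spec:
  resources:
    mon:
      requests:
        cpu: 200m   # was 1100m
    mgr:
      requests:
        cpu: 100m   # was 700m
    osd:
      requests:
        cpu: 200m   # previously no CPU request
```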