talos-cluster/talos/patches/controller/cluster.yaml
Laur IVAN f7e635e3f1 talos: tune kube-apiserver audit policy to reduce CPU overhead
Add targeted audit policy rules that suppress high-frequency, low-value
requests which were generating ~570k audit events per 10 hours and
causing kube-apiserver to consume 260-316m CPU per node.

Suppressed categories (no security impact):
- coordination.k8s.io/leases: controller/node heartbeats (86k GET + 46k PUT/10h)
- /healthz*, /readyz*, /livez*, /openapi*, /version: probe & discovery endpoints
- system:nodes user group: kubelet node status updates
- endpoints + endpointslices GET/LIST/WATCH: Cilium/CoreDNS polling

All other requests continue to be logged at Metadata level.

Result: 76% of audit events suppressed, non-leader apiserver CPU dropped
~50-60% (316m -> 125m on standby nodes). Policy lives in the patch file
so it survives cluster resets via talhelper genconfig.
2026-02-25 11:56:36 +01:00
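
The figures in the commit message can be cross-checked with a few lines of arithmetic. This is a minimal Python sketch that uses only the numbers quoted in the message, so nothing is re-measured. The variable names are illustrative. The lease-share line assumes the 86k GET + 46k PUT counts cover all lease traffic.

```python
# Cross-check of the figures quoted in the commit message.
# All inputs are copied from the message; nothing here is measured.

def pct_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100

# kube-apiserver CPU per node, in millicores: 260-316m before, 125m after.
drop_high = pct_drop(316, 125)  # worst-case node
drop_low = pct_drop(260, 125)   # low end of the quoted range

# ~570k audit events per 10 hours, expressed as a per-second rate.
TOTAL_EVENTS = 570_000
WINDOW_S = 10 * 3600
rate_before = TOTAL_EVENTS / WINDOW_S

# Lease traffic alone (86k GET + 46k PUT) as a share of all events
# (assumption: these counts are all of the lease events).
lease_events = 86_000 + 46_000
lease_share = lease_events / TOTAL_EVENTS * 100

# Events still logged per 10 hours after suppressing 76% (integer math).
remaining = TOTAL_EVENTS * (100 - 76) // 100

print(f"CPU drop: {drop_low:.0f}%-{drop_high:.0f}%")
print(f"audit rate before: {rate_before:.1f} events/s")
print(f"lease share of events: {lease_share:.0f}%")
print(f"events kept per 10h: {remaining:,}")
```

With these inputs the script reports a 52%-60% CPU reduction, which is consistent with the "~50-60%" in the message. It also reports leases at roughly 23% of events, so under the stated assumption most of the 76% reduction comes from the other suppressed categories.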


---
cluster:
  allowSchedulingOnControlPlanes: true
  apiServer:
    admissionControl:
      $$patch: delete
    extraArgs:
      # https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/
      enable-aggregator-routing: true
      # Enable MutatingAdmissionPolicy feature gate (beta in K8s 1.35)
      feature-gates: MutatingAdmissionPolicy=true
      runtime-config: admissionregistration.k8s.io/v1beta1=true
    auditPolicy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        # Don't log lease heartbeats: these are high-frequency controller/node
        # keepalives that make up a large share of audit volume with no security value.
        - level: None
          resources:
            - group: "coordination.k8s.io"
              resources: ["leases"]
        # Don't log health/readiness/liveness probes or OpenAPI discovery.
        # These are polled every few seconds by kubelets and Flux controllers.
        - level: None
          nonResourceURLs:
            - "/healthz*"
            - "/readyz*"
            - "/livez*"
            - "/openapi*"
            - "/version"
        # Don't log kubelet (system:nodes) operations such as node heartbeats
        # and status updates. Audit level only controls logging; these requests
        # are still authorized exactly as before.
        - level: None
          userGroups: ["system:nodes"]
        # Don't log get/list/watch on endpoints & endpointslices: these are
        # read constantly by Cilium (kube-proxy replacement) and CoreDNS.
        - level: None
          verbs: ["get", "list", "watch"]
          resources:
            - group: ""
              resources: ["endpoints"]
            - group: "discovery.k8s.io"
              resources: ["endpointslices"]
        # Log everything else at Metadata level (user, verb, resource, timestamp;
        # no request or response bodies). This covers all auth, RBAC, resource
        # mutations, etc.
        - level: Metadata
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  coreDNS:
    disabled: true
  etcd:
    extraArgs:
      listen-metrics-urls: http://0.0.0.0:2381
    advertisedSubnets:
      - 10.0.0.0/24
  proxy:
    disabled: true
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
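
Kubernetes evaluates audit policy rules in order, and the first matching rule sets the event's level. This is why the catch-all Metadata rule has to sit at the end of the list. The Python sketch below is a simplified, hypothetical model of that first-match logic, not the apiserver's code. It handles only the rule fields used in this file and ignores namespaces, resource names, subresources, and omitStages. It encodes the five rules above as data; the `Request` and `Rule` types are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Simplified view of an API request (hypothetical model).

    group is the API group ("" for core); None marks a non-resource
    request such as /readyz, in which case path is used instead.
    """
    verb: str
    group: str | None = None
    resource: str = ""
    path: str = ""
    user_groups: tuple[str, ...] = ()


@dataclass(frozen=True)
class Rule:
    """One audit policy rule; an empty field matches anything."""
    level: str
    user_groups: tuple[str, ...] = ()
    verbs: tuple[str, ...] = ()
    # (API group, resource names) pairs, like the YAML `resources` list.
    resources: tuple[tuple[str, tuple[str, ...]], ...] = ()
    non_resource_urls: tuple[str, ...] = ()


def url_matches(path: str, spec: str) -> bool:
    # A trailing "*" acts as a prefix wildcard; otherwise match exactly.
    if spec in ("*", path):
        return True
    return spec.endswith("*") and path.startswith(spec.rstrip("*"))


def rule_matches(rule: Rule, req: Request) -> bool:
    if rule.user_groups and not set(rule.user_groups) & set(req.user_groups):
        return False
    if rule.verbs and req.verb not in rule.verbs:
        return False
    is_resource = req.group is not None
    if rule.resources:
        # Resource rules never match non-resource URLs.
        return is_resource and any(
            group == req.group and (not names or req.resource in names)
            for group, names in rule.resources
        )
    if rule.non_resource_urls:
        # Non-resource rules never match resource requests.
        return not is_resource and any(
            url_matches(req.path, spec) for spec in rule.non_resource_urls
        )
    return True


def audit_level(policy: list[Rule], req: Request) -> str:
    # First matching rule wins; if nothing matches, the event is not logged.
    for rule in policy:
        if rule_matches(rule, req):
            return rule.level
    return "None"


# The five rules from cluster.yaml, in the same order.
POLICY = [
    Rule("None", resources=(("coordination.k8s.io", ("leases",)),)),
    Rule("None", non_resource_urls=("/healthz*", "/readyz*", "/livez*", "/openapi*", "/version")),
    Rule("None", user_groups=("system:nodes",)),
    Rule("None", verbs=("get", "list", "watch"),
         resources=(("", ("endpoints",)), ("discovery.k8s.io", ("endpointslices",)))),
    Rule("Metadata"),
]

lease_put = Request("update", "coordination.k8s.io", "leases")             # -> None
probe = Request("get", path="/readyz/etcd")                                 # -> None
node_patch = Request("patch", "", "nodes", user_groups=("system:nodes",))  # -> None
slice_watch = Request("watch", "discovery.k8s.io", "endpointslices")        # -> None
slice_delete = Request("delete", "discovery.k8s.io", "endpointslices")      # -> Metadata
secret_get = Request("get", "", "secrets")                                  # -> Metadata

for req in (lease_put, probe, node_patch, slice_watch, slice_delete, secret_get):
    print(req.verb, req.resource or req.path, audit_level(POLICY, req))
```

Prepending `Rule("Metadata")` to `POLICY` makes every request above return `Metadata`, because evaluation stops at the first match. Mutations such as the endpointslice delete still reach the catch-all, because the endpoints rule is restricted to get/list/watch.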