fix: fix tuppr
parent 0d386b9de1
commit 74c7f551a9
@@ -1,17 +1,19 @@
{"id":"homelab-3dv","title":"Fix cluster-apps kustomization - duplicate system-upgrade namespace","description":"cluster-apps failing: duplicate namespace system-upgrade. Fix: Remove duplicate namespace.yaml references in kubernetes/apps/system-upgrade/kustomization.yaml","status":"open","priority":1,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T20:43:58.062314397+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T20:43:58.062314397+01:00","labels":["flux","kustomize","system-upgrade"]}
{"id":"homelab-3p8","title":"Watch cluster rollout","description":"Watch the rollout of the cluster to ensure all pods are starting correctly","acceptance_criteria":"- Command `kubectl get pods --all-namespaces --watch` is running\n- All pods are observed rolling out\n- Pods reach Running/Ready state","status":"open","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:25.122454196+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:32:25.122454196+01:00","labels":["bootstrap","verification"]}
{"id":"homelab-4cn","title":"Configure GitHub webhook for Flux","description":"Configure GitHub webhook to send push events to Flux for automatic reconciliation on git push","acceptance_criteria":"- Command `kubectl -n flux-system get receiver github-webhook --output=jsonpath='{.status.webhookPath}'` returns webhook path\n- Full webhook URL is constructed with format: https://flux-webhook.${cloudflare_domain}/hook/{path}\n- Webhook is added to GitHub repository settings\n- Webhook payload URL is set correctly\n- Content type is set to application/json\n- Secret token from github-push-token.txt is configured\n- Events are set to \"Just the push event\"\n- Webhook is saved and active","status":"open","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:33:23.881275565+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:33:23.881275565+01:00","labels":["configuration","flux","github"]}
{"id":"homelab-5wg","title":"Fix network configuration conflicts (etcd + routes)","description":"Multiple network configuration issues on the cluster nodes:\n\n**Issue 1: etcd Peer URL Conflict**\nNode esxi-2cu-8g-01 (10.0.0.146) has duplicate peer URLs in etcd (10.0.0.128 and 10.0.0.146), causing \"Peer URLs already exists\" error. Node is currently unreachable.\n\n**Issue 2: Network Route Conflict**\nNodes are showing route conflict errors:\n```\nerror adding route: netlink receive: file exists\ngateway: 10.0.0.129\n```\n\nThis is because nodes were previously configured with `/24` subnet and gateway `10.0.0.1`, but now configured with `/27` subnet and gateway `10.0.0.129`. Old routes persist.\n\n**Root Cause:**\nConfiguration changed from:\n- Old: 10.0.0.0/24, gateway 10.0.0.1\n- New: 10.0.0.128/27, gateway 10.0.0.129\n\n**Solution:**\n1. Reset ALL nodes to clear old network config\n2. Re-apply Talos configuration\n3. Bootstrap cluster fresh\n\nCommands:\n```bash\n# Reset each node\ntalosctl -n 10.0.0.145 reset --graceful=false --reboot\ntalosctl -n 10.0.0.146 reset --graceful=false --reboot \ntalosctl -n 10.0.0.147 reset --graceful=false --reboot\n\n# Wait for nodes to boot into maintenance mode, then:\ntask bootstrap:talos\n```","acceptance_criteria":"- Member ceeb52e03fde8032 is removed from etcd cluster\n- Node 10.0.0.146 is reset and reconfigured\n- Node rejoins etcd cluster with correct peer URL\n- `talosctl etcd members` shows only one peer URL per member\n- All three nodes are healthy in etcd cluster","notes":"**Recommended Fix: Full Cluster Reset (Option 1)**\n\nAll nodes are currently offline. Once nodes are back online, execute:\n\n```bash\n# Reset all nodes to maintenance mode\ntalosctl -n 10.0.0.145 reset --graceful=false --reboot --insecure\ntalosctl -n 10.0.0.146 reset --graceful=false --reboot --insecure\ntalosctl -n 10.0.0.147 reset --graceful=false --reboot --insecure\n\n# Wait for nodes to boot into maintenance mode (~2-3 min)\n# Verify with: nmap -Pn -n -p 50000 10.0.0.145-147 -vv\n\n# Re-bootstrap\ntask bootstrap:talos\ntask bootstrap:apps\n```\n\nThis is the cleanest approach to clear all lingering network config and etcd state issues. Estimated time: ~15 minutes total.","status":"closed","priority":1,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T01:10:22.498887798+01:00","created_by":"Laur IVAN","updated_at":"2026-02-10T22:59:48.077254996+01:00","closed_at":"2026-02-10T22:59:48.077254996+01:00","close_reason":"Fixed - etcd cluster healthy with 3 members, each with single peer URL. No route conflicts. All cluster health checks passed.","labels":["etcd","talos","urgent"]}
{"id":"homelab-7k4","title":"Push talhelper encrypted secret to git","description":"After installing Talos, commit and push the talhelper encrypted secret to the repository","acceptance_criteria":"- Changes are staged with `git add -A`\n- Commit is created with message \"chore: add talhelper encrypted secret :lock:\"\n- Changes are pushed to remote repository","status":"closed","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:05.950780413+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:44:58.80046492+01:00","closed_at":"2026-02-07T00:44:58.80046492+01:00","close_reason":"Successfully staged, committed, and pushed talhelper encrypted secret to git repository","labels":["bootstrap","git"]}
{"id":"homelab-82o","title":"Verify Flux status and resources","description":"Check the status of Flux and verify all Flux resources are up-to-date and in a ready state","acceptance_criteria":"- Command `flux check` passes all checks\n- Command `flux get sources git flux-system` shows ready state\n- Command `flux get ks -A` shows all kustomizations ready\n- Command `flux get hr -A` shows all helm releases ready","status":"closed","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:43.666513198+01:00","created_by":"Laur IVAN","updated_at":"2026-02-10T23:03:07.067406014+01:00","closed_at":"2026-02-10T23:03:07.067406014+01:00","close_reason":"Verified - Flux check passed. All controllers ready (helm, kustomize, notification, source). GitRepository synced. All Kustomizations applied successfully.","labels":["flux","verification"]}
{"id":"homelab-c68","title":"Fix volsync MutatingAdmissionPolicy API version","description":"Kustomization storage-system/volsync is failing with error:\n\n```\nMutatingAdmissionPolicy/storage-system/volsync-mover-jitter dry-run failed: no matches for kind \"MutatingAdmissionPolicy\" in version \"admissionregistration.k8s.io/v1beta1\"\n```\n\n**Root Cause:**\nMutatingAdmissionPolicy API does not exist in Kubernetes 1.34. Only `admissionregistration.k8s.io/v1` is available, which only includes MutatingWebhookConfiguration.\n\nThe MutatingAdmissionPolicy feature was experimental and appears to have been removed or never graduated to stable.\n\n**What the policy does:**\nAdds a jitter initContainer to volsync jobs to randomize start times (sleep 0-30 seconds). This is optional functionality.\n\n**Fix:**\nRemove or comment out the mutating-admission-policy.yaml file from kubernetes/apps/storage-system/volsync/app/kustomization.yaml since this feature is not available in K8s 1.34 and is non-critical.","notes":"**Verified in K8s 1.35.0:**\nIssue still exists. MutatingAdmissionPolicy API is not available in Kubernetes 1.35.0.\n\nOnly `admissionregistration.k8s.io/v1` exists, which includes:\n- MutatingWebhookConfiguration\n- ValidatingWebhookConfiguration\n\nMutatingAdmissionPolicy/MutatingAdmissionPolicyBinding are not available.\n\nThe fix remains the same: remove or comment out the mutating-admission-policy.yaml file.","status":"open","priority":2,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T00:51:29.41277186+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T10:24:51.047684333+01:00","labels":["api-version","storage","volsync"]}
{"id":"homelab-c68","title":"Fix volsync MutatingAdmissionPolicy API version","description":"Kustomization storage-system/volsync is failing with error:\n\n```\nMutatingAdmissionPolicy/storage-system/volsync-mover-jitter dry-run failed: no matches for kind \"MutatingAdmissionPolicy\" in version \"admissionregistration.k8s.io/v1beta1\"\n```\n\n**Root Cause:**\nMutatingAdmissionPolicy API does not exist in Kubernetes 1.34. Only `admissionregistration.k8s.io/v1` is available, which only includes MutatingWebhookConfiguration.\n\nThe MutatingAdmissionPolicy feature was experimental and appears to have been removed or never graduated to stable.\n\n**What the policy does:**\nAdds a jitter initContainer to volsync jobs to randomize start times (sleep 0-30 seconds). This is optional functionality.\n\n**Fix:**\nRemove or comment out the mutating-admission-policy.yaml file from kubernetes/apps/storage-system/volsync/app/kustomization.yaml since this feature is not available in K8s 1.34 and is non-critical.","notes":"**Verified in K8s 1.35.0:**\nIssue still exists. MutatingAdmissionPolicy API is not available in Kubernetes 1.35.0.\n\nOnly `admissionregistration.k8s.io/v1` exists, which includes:\n- MutatingWebhookConfiguration\n- ValidatingWebhookConfiguration\n\nMutatingAdmissionPolicy/MutatingAdmissionPolicyBinding are not available.\n\nThe fix remains the same: remove or comment out the mutating-admission-policy.yaml file.","status":"closed","priority":2,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T00:51:29.41277186+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T11:54:34.256741899+01:00","closed_at":"2026-02-11T11:54:34.256741899+01:00","close_reason":"Fixed - MutatingAdmissionPolicy feature gate enabled in K8s 1.35. API now available (admissionregistration.k8s.io/v1beta1). Volsync kustomization reconciled successfully.","labels":["api-version","storage","volsync"]}
{"id":"homelab-f7u","title":"Tidy up repository (remove templates)","description":"Clean up the repository by removing the templates directory and templating-related files to eliminate clutter and resolve Renovate warnings","acceptance_criteria":"- Command `task template:tidy` completes successfully\n- Templates directory is removed\n- Templating-related files are cleaned up\n- Changes are committed with message \"chore: tidy up :broom:\"\n- Changes are pushed to git","status":"open","priority":3,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:33:32.475687645+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:33:32.475687645+01:00","labels":["cleanup","git"]}
{"id":"homelab-gqj","title":"Bootstrap cluster applications (cilium, coredns, spegel, flux)","description":"Install cilium, coredns, spegel, flux and sync the cluster to the repository state","acceptance_criteria":"- Command `task bootstrap:apps` completes successfully\n- Cilium is installed\n- CoreDNS is installed\n- Spegel is installed\n- Flux is installed\n- Cluster is synced to repository state","status":"closed","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:15.371162045+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T15:50:03.091375279+01:00","closed_at":"2026-02-07T15:50:03.091375279+01:00","close_reason":"Successfully installed cilium, coredns, spegel, cert-manager, flux-operator. Flux-instance is reconciling (timeout is normal). All nodes are Ready.","labels":["apps","bootstrap"]}
{"id":"homelab-hmc","title":"Finish monitoring system setup","description":"Uncomment the grafana and kube-prometheus-stack resources in kubernetes/apps/monitoring-system/kustomization.yaml to enable the full monitoring stack with Grafana dashboards and Prometheus metrics collection","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-09T22:53:49.071709362+01:00","updated_at":"2026-02-09T22:53:49.071709362+01:00","labels":["grafana","monitoring","prometheus"]}
{"id":"homelab-icy","title":"Publish Kubernetes schemas locally","description":"Set up CronJob to publish K8s schemas locally. Reference: https://github.com/bjw-s-labs/home-ops/tree/main/kubernetes/apps/jobs/publish-k8s-schemas","status":"open","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-10T22:57:34.155916454+01:00","created_by":"Laur IVAN","updated_at":"2026-02-10T22:57:34.155916454+01:00","labels":["cronjob","schemas","validation"]}
{"id":"homelab-k3j","title":"Verify DNS resolution for echo subdomain","description":"Check that DNS resolution works for the echo subdomain and resolves to the Cloudflare gateway address","acceptance_criteria":"- Command `dig @${cluster_dns_gateway_addr} echo.${cloudflare_domain}` resolves successfully\n- DNS resolves to ${cloudflare_gateway_addr}\n- DNS resolution is working correctly","status":"closed","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:33:02.539037288+01:00","created_by":"Laur IVAN","updated_at":"2026-02-10T23:03:01.06585734+01:00","closed_at":"2026-02-10T23:03:01.06585734+01:00","close_reason":"Verified - DNS resolution working. echo.laurivan.com resolves to 10.0.0.158 (envoy-external gateway) via k8s-gateway","labels":["dns","verification"]}
{"id":"homelab-mbk","title":"Verify TCP connectivity to gateways","description":"Check TCP connectivity to both the internal and external gateways on port 443","acceptance_criteria":"- Command `nmap -Pn -n -p 443 ${cluster_gateway_addr} ${cloudflare_gateway_addr} -vv` succeeds\n- Port 443 is open on both internal and external gateways\n- TCP connectivity is confirmed","status":"open","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:54.223562688+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:32:54.223562688+01:00","labels":["network","verification"]}
{"id":"homelab-mi7","title":"Fix tuppr HelmRelease timeout","description":"tuppr HelmRelease stuck in 'InProgress' status, causing health check timeout after 5m. Related to homelab-oqx (ServiceAccount API version bug in chart v0.0.52)","status":"open","priority":2,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T20:44:11.669905841+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T20:44:11.669905841+01:00","labels":["helm","system-upgrade","tuppr"]}
{"id":"homelab-n0h","title":"Verify Cilium status","description":"Verify that Cilium is installed and running correctly","acceptance_criteria":"- Command `cilium status` runs successfully\n- Cilium reports healthy status\n- All Cilium components are operational","status":"closed","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:32:34.123646456+01:00","created_by":"Laur IVAN","updated_at":"2026-02-10T23:01:46.996445944+01:00","closed_at":"2026-02-10T23:01:46.996445944+01:00","close_reason":"Verified - Cilium OK, Operator OK, 3/3 DaemonSet ready, 1/1 Operator ready, 29/29 cluster pods managed","labels":["cilium","verification"]}
{"id":"homelab-oqx","title":"Fix tuppr HelmRelease - invalid ServiceAccount API version","description":"Tuppr HelmRelease is failing with error:\n\n```\nHelm install failed: resource mapping not found for name: \"tuppr-talosconfig\" namespace: \"system-upgrade\" from \"\": no matches for kind \"ServiceAccount\" in version \"talos.dev/v1alpha1\"\nensure CRDs are installed first\n```\n\nThe tuppr chart is trying to create a ServiceAccount with apiVersion `talos.dev/v1alpha1`, which is invalid. ServiceAccount should use `v1` API version.\n\nThis appears to be a bug in the tuppr chart itself (version 0.0.52). The chart is incorrectly using a Talos-specific API version for a standard Kubernetes ServiceAccount resource.\n\nPossible fixes:\n1. Wait for upstream chart fix\n2. Use a different version of tuppr\n3. Apply a patch to fix the ServiceAccount apiVersion\n4. Disable tuppr if not critical","status":"open","priority":2,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T00:51:35.813199154+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T01:01:47.963406638+01:00","labels":["oci-repository","system-upgrade","tuppr"]}
{"id":"homelab-oqx","title":"Fix tuppr HelmRelease - invalid ServiceAccount API version","description":"Tuppr HelmRelease is failing with error:\n\n```\nHelm install failed: resource mapping not found for name: \"tuppr-talosconfig\" namespace: \"system-upgrade\" from \"\": no matches for kind \"ServiceAccount\" in version \"talos.dev/v1alpha1\"\nensure CRDs are installed first\n```\n\nThe tuppr chart is trying to create a ServiceAccount with apiVersion `talos.dev/v1alpha1`, which is invalid. ServiceAccount should use `v1` API version.\n\nThis appears to be a bug in the tuppr chart itself (version 0.0.52). The chart is incorrectly using a Talos-specific API version for a standard Kubernetes ServiceAccount resource.\n\nPossible fixes:\n1. Wait for upstream chart fix\n2. Use a different version of tuppr\n3. Apply a patch to fix the ServiceAccount apiVersion\n4. Disable tuppr if not critical","status":"closed","priority":2,"issue_type":"bug","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-11T00:51:35.813199154+01:00","created_by":"Laur IVAN","updated_at":"2026-02-11T23:25:32.684511823+01:00","closed_at":"2026-02-11T23:25:32.684511823+01:00","close_reason":"Closed via update","labels":["oci-repository","system-upgrade","tuppr"]}
{"id":"homelab-rzs","title":"Verify wildcard Certificate status","description":"Check the status of the wildcard Certificate in the network namespace","acceptance_criteria":"- Command `kubectl -n network describe certificates` runs successfully\n- Certificate status shows Ready condition\n- Certificate is valid and not expired","status":"open","priority":2,"issue_type":"task","owner":"laur.ivan@ec.europa.eu","created_at":"2026-02-07T00:33:12.166198226+01:00","created_by":"Laur IVAN","updated_at":"2026-02-07T00:33:12.166198226+01:00","labels":["certificates","verification"]}
{"id":"homelab-u3p","title":"Install homepage dashboard","description":"Create the homepage application manifests (helmrelease, ocirepository, kustomization) in kubernetes/apps/default/homepage/app/ directory and configure the ks.yaml to deploy it","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-09T22:53:44.511470131+01:00","updated_at":"2026-02-09T22:53:44.511470131+01:00","labels":["dashboard","deployment","homepage"]}
{"id":"homelab-xpp","title":"Install home assistant for home automation","description":"Create home assistant application manifests (helmrelease, ocirepository, kustomization) in kubernetes/apps/default/home-assistant/app/ directory and configure deployment.\n\nNote: Ensure the application has network access to the IoT VLAN where most smart home devices are located. This may require configuring network policies or multus CNI for VLAN access.","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-09T22:57:31.4810088+01:00","updated_at":"2026-02-09T22:57:31.4810088+01:00","labels":["automation","home-assistant","iot","networking"]}
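Each record in the hunk above is one line of JSON (a JSONL issue store). A quick standalone sanity check that every non-empty line parses and carries the keys the tracker appears to rely on could look like this (hypothetical helper script, not part of the repo):

```python
import json

# Keys every issue record in the JSONL store above appears to carry.
REQUIRED_KEYS = {"id", "title", "status", "priority", "issue_type"}

def validate_jsonl(text: str) -> list[dict]:
    """Parse a JSONL blob and check each record has the required keys."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank separator lines
        record = json.loads(line)  # raises ValueError on malformed JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        records.append(record)
    return records

# Two trimmed-down records in the same shape as the store above:
sample = "\n".join([
    '{"id":"homelab-mi7","title":"Fix tuppr HelmRelease timeout","status":"open","priority":2,"issue_type":"bug"}',
    '{"id":"homelab-oqx","title":"Fix tuppr ServiceAccount API version","status":"closed","priority":2,"issue_type":"bug"}',
])
issues = validate_jsonl(sample)
print([i["id"] for i in issues])  # → ['homelab-mi7', 'homelab-oqx']
```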
@@ -22,25 +22,3 @@ spec:
    namespace: flux-system
  targetNamespace: system-upgrade
  timeout: 5m
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/fluxcd-community/flux2-schemas/refs/heads/main/kustomization-kustomize-v1.json
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tuppr-upgrades
spec:
  commonMetadata:
    labels:
      app.kubernetes.io/name: tuppr
  dependsOn:
    - name: tuppr
  interval: 1h
  path: "./kubernetes/apps/system-upgrade/tuppr/upgrades"
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  targetNamespace: system-upgrade
  timeout: 5m
  wait: false
@@ -8,5 +8,15 @@ spec:
    kind: OCIRepository
    name: tuppr
  interval: 30m
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: ServiceAccount
              name: tuppr-talosconfig
            patch: |
              - op: replace
                path: /apiVersion
                value: v1
  values:
    replicaCount: 2
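The postRenderer above works around the homelab-oqx chart bug by rewriting the rendered ServiceAccount's `apiVersion` with a JSON6902 `replace` op. A minimal sketch of that operation on a plain dict (Flux itself delegates to kustomize; this is only an illustration of the op's effect):

```python
def apply_replace_op(doc: dict, path: str, value):
    """Apply a single JSON6902 'replace' op to a nested dict."""
    keys = path.strip("/").split("/")
    target = doc
    for key in keys[:-1]:
        target = target[key]
    if keys[-1] not in target:
        # 'replace' requires the member to already exist (RFC 6902)
        raise KeyError(f"path {path!r} does not exist")
    target[keys[-1]] = value
    return doc

# Rendered manifest as the broken chart (v0.0.52) emits it:
service_account = {
    "apiVersion": "talos.dev/v1alpha1",  # bug: should be the core "v1"
    "kind": "ServiceAccount",
    "metadata": {"name": "tuppr-talosconfig", "namespace": "system-upgrade"},
}

patched = apply_replace_op(service_account, "/apiVersion", "v1")
print(patched["apiVersion"])  # → v1
```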
7
kubernetes/apps/system-upgrade/tuppr/kustomization.yaml
Normal file
@@ -0,0 +1,7 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ./app.ks.yaml
  - ./upgrades.ks.yaml
22
kubernetes/apps/system-upgrade/tuppr/upgrades.ks.yaml
Normal file
@@ -0,0 +1,22 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/fluxcd-community/flux2-schemas/refs/heads/main/kustomization-kustomize-v1.json
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tuppr-upgrades
spec:
  commonMetadata:
    labels:
      app.kubernetes.io/name: tuppr
  dependsOn:
    - name: tuppr
  interval: 1h
  path: "./kubernetes/apps/system-upgrade/tuppr/upgrades"
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  targetNamespace: system-upgrade
  timeout: 5m
  wait: false