# FAQ: Internal Services Not Accessible via `envoy-internal`
**Symptoms:** A service exposed via `envoy-internal` is unreachable from LAN clients even though the Flux HelmRelease, HTTPRoute, and pods all appear healthy.
## How Internal DNS Works in This Cluster
This cluster uses two separate ingress paths and two separate DNS mechanisms:
```mermaid
graph TD
    subgraph Internet
        CF[Cloudflare DNS]
    end
    subgraph LAN
        Win[Windows / LAN Client]
        PH[Pi-hole\n10.0.0.156 → k8s-gateway]
    end
    subgraph "Kubernetes - network namespace"
        EXT[envoy-external\nLB: 10.0.0.158]
        INT[envoy-internal\nLB: 10.0.0.157]
        EXTDNS[cloudflare-dns\nexternal-dns\n--gateway-name=envoy-external]
        K8SGW[k8s-gateway\nCoreDNS plugin\n10.0.0.156:53\nwatches HTTPRoutes]
    end
    CF -->|CNAME: external.laurivan.com| EXT
    EXTDNS -->|creates DNS records| CF
    EXTDNS -.->|ignores envoy-internal routes| INT
    PH -->|resolves *.laurivan.com| K8SGW
    K8SGW -->|reads HTTPRoute hostnames| INT
    Win -->|DNS via Pi-hole| PH
    Win -->|TCP 443| INT
```
| Gateway | DNS mechanism | Accessible from |
|---|---|---|
| `envoy-external` | `cloudflare-dns` (external-dns) pushes records to Cloudflare | Internet + LAN |
| `envoy-internal` | `k8s-gateway` (CoreDNS) serves DNS at 10.0.0.156 | LAN only (via Pi-hole conditional forwarding) |
**Key insight:** `cloudflare-dns` runs with `--gateway-name=envoy-external`, so it only creates Cloudflare DNS records for routes attached to `envoy-external`. Routes on `envoy-internal` are handled exclusively by `k8s-gateway`, with no Cloudflare involvement.
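In Helm values terms, this split typically comes down to a couple of external-dns arguments. A minimal sketch (the values layout, source name, and comment are assumptions based on the description above, not copied from this repo):

```yaml
# Sketch of the cloudflare-dns (external-dns) values that produce this behavior.
provider: cloudflare
extraArgs:
  - --source=gateway-httproute        # watch Gateway API HTTPRoutes
  - --gateway-name=envoy-external     # only routes attached to this Gateway get Cloudflare records
```

Because of the `--gateway-name` filter, attaching a route to `envoy-internal` is enough to keep it out of public DNS.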
## Issue 1: k8s-gateway Fails to Watch HTTPRoutes After Restart
### Root Cause
At cluster startup, `k8s-gateway` can fail to initialize its HTTPRoute controller due to a transient API server connection reset:

```text
[WARNING] plugin/k8s_gateway: error getting crd httproutes.gateway.networking.k8s.io,
error: unexpected error when reading response body. Please retry.
Original error: read tcp ...: connection reset by peer
```
When this happens, `k8s-gateway` falls back to watching only Service resources, returning NXDOMAIN for all internal hostnames, even ones that were previously working.
### Diagnosis
```sh
# Check if k8s-gateway has HTTPRoute controller initialized
kubectl logs -n network -l app.kubernetes.io/name=k8s-gateway | grep -E "HTTPRoute|error"

# Test DNS resolution directly against k8s-gateway
dig +short myservice.laurivan.com @10.0.0.156
```
If `dig` returns empty and the logs show the HTTPRoute controller error, proceed to the fix.
### Fix
Restart the `k8s-gateway` deployment to force a clean re-initialization:

```sh
kubectl rollout restart deployment/k8s-gateway -n network
kubectl rollout status deployment/k8s-gateway -n network
```
After rollout, confirm all controllers initialized successfully:
```sh
kubectl logs -n network -l app.kubernetes.io/name=k8s-gateway | grep "controller initialized"
# Expected output:
# [INFO] plugin/k8s_gateway: GatewayAPI controller initialized
# [INFO] plugin/k8s_gateway: HTTPRoute controller initialized
# [INFO] plugin/k8s_gateway: Service controller initialized
```
Verify DNS resolves:
```sh
dig +short myservice.laurivan.com @10.0.0.156
# Should return 10.0.0.157 (envoy-internal LB IP)
```
## Issue 2: New Internal Service Has No DNS Record
### Root Cause
A new HTTPRoute attached to `envoy-internal` is not picked up by `k8s-gateway`. This happens when the route is created while `k8s-gateway` has lost its HTTPRoute watch (see Issue 1), including when `k8s-gateway` was never restarted after a CRD re-sync.

Unlike `envoy-external` routes (handled automatically by `cloudflare-dns`), `envoy-internal` routes require `k8s-gateway` to be running and watching HTTPRoutes correctly.
### Diagnosis
```sh
# Does the HTTPRoute exist and is it accepted?
kubectl get httproute -A
kubectl describe httproute <name> -n <namespace>

# Does DNS resolve?
dig +short <hostname>.laurivan.com @10.0.0.156

# Are the pods healthy?
kubectl get pods -n <namespace>
```
### Fix
If the HTTPRoute exists and is `Accepted` but DNS doesn't resolve, restart `k8s-gateway` (see Issue 1 above).
## Issue 3: Pi-hole Conditional Forwarding Not Configured
For LAN clients to resolve internal hostnames, Pi-hole must forward `laurivan.com` queries to `k8s-gateway` at `10.0.0.156`.
### Setup
In Pi-hole → Settings → DNS → Conditional Forwarding:
| Field | Value |
|---|---|
| Local network CIDR | 10.0.0.0/24 |
| Router / DNS IP | 10.0.0.156 |
| Local domain name | laurivan.com |
Without this, LAN clients will resolve `*.laurivan.com` via public Cloudflare DNS, which has no records for `envoy-internal` services.
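Under the hood, Pi-hole implements conditional forwarding as dnsmasq directives. A sketch of what the settings above generate, assuming a Pi-hole v5-style layout (the exact file path may differ on your install):

```
# Illustrative dnsmasq config produced by Pi-hole's conditional forwarding
# (typically written to /etc/dnsmasq.d/01-pihole.conf on Pi-hole v5):
server=/laurivan.com/10.0.0.156      # forward laurivan.com queries to k8s-gateway
rev-server=10.0.0.0/24,10.0.0.156    # reverse (PTR) lookups for the LAN CIDR
```

If you manage DNS with plain dnsmasq instead of Pi-hole, the same two lines achieve the same forwarding.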
## Diagnosing "Destination Host Unreachable" from a LAN Client
When you ping an internal service VIP (e.g. `10.0.0.157`), you may see:

```text
Reply from 10.0.0.147: Destination host unreachable.
```
This is normal and not an error. Here's why:
```mermaid
sequenceDiagram
    participant Win as Windows Client
    participant Node as esxi-2cu-8g-03 (10.0.0.147)<br/>holds L2 ARP lease for 10.0.0.157
    participant Envoy as Envoy Proxy (VIP: 10.0.0.157)
    Win->>Node: ICMP Echo Request → 10.0.0.157
    Note over Node: ARP resolves 10.0.0.157 to Node's MAC<br/>(Cilium L2 announcement)
    Node->>Envoy: forwards packet
    Note over Envoy: Envoy only handles TCP 80/443<br/>ICMP is not forwarded
    Node-->>Win: ICMP "Destination Host Unreachable"
    Win->>Node: TCP SYN → 10.0.0.157:443
    Node->>Envoy: proxies connection
    Envoy-->>Win: TLS handshake + HTTP response ✅
```
- `10.0.0.147` is the cluster node (esxi-2cu-8g-03) holding the Cilium L2 ARP announcement lease for `10.0.0.157`; it is not a router.
- Envoy Gateway only listens on TCP 80/443. ICMP ping packets are not handled, so the node's kernel returns "host unreachable".
- Use `curl` instead of `ping` to verify connectivity:
```sh
# Should return HTTP 200
curl -sk -o /dev/null -w "%{http_code}" https://myservice.laurivan.com

# Or directly by IP with a Host header
curl -sk -o /dev/null -w "%{http_code}" https://10.0.0.157 -H "Host: myservice.laurivan.com"
```
## Adding a New Service on `envoy-internal`
### How to expose a service internally (no public DNS)
- In your HelmRelease (or a standalone `HTTPRoute`), set `parentRefs` to `envoy-internal` in the `network` namespace:

  ```yaml
  route:
    app:
      hostnames: ["myservice.${SECRET_DOMAIN}"]
      parentRefs:
        - name: envoy-internal
          namespace: network
          sectionName: https
      rules:
        - backendRefs:
            - identifier: app
              port: 80
  ```
- Ensure your Flux `Kustomization` has `postBuild.substituteFrom` set so `${SECRET_DOMAIN}` is substituted:

  ```yaml
  postBuild:
    substituteFrom:
      - name: cluster-secrets
        kind: Secret
  ```
- After Flux reconciles, `k8s-gateway` will automatically pick up the new HTTPRoute and start serving DNS for `myservice.laurivan.com` → `10.0.0.157`.
- No `DNSEndpoint` CRD or Cloudflare changes are needed; `k8s-gateway` handles it.
**Do NOT** add a `DNSEndpoint` pointing `myservice.laurivan.com` → `internal.laurivan.com`. That would push the record to public Cloudflare DNS, exposing an internal service to the internet.
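For recognition purposes, the anti-pattern would look roughly like this (an illustrative sketch using external-dns's `DNSEndpoint` CRD; do not apply it for internal services):

```yaml
# ANTI-PATTERN: do not apply for envoy-internal services.
# Shown only so the resource is recognizable during review.
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: myservice        # hypothetical name for illustration
  namespace: network
spec:
  endpoints:
    - dnsName: myservice.laurivan.com
      recordType: CNAME
      targets: ["internal.laurivan.com"]   # this publishes the hostname to public DNS
```

If you see a manifest like this for an internal-only service in a PR, that is the change to reject.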
## Quick Diagnostic Checklist
```sh
# 1. Is the pod running?
kubectl get pods -n <namespace>

# 2. Is the HTTPRoute accepted?
kubectl get httproute -n <namespace> -o yaml | grep -A5 "status:"

# 3. Does k8s-gateway resolve the hostname?
dig +short <hostname>.laurivan.com @10.0.0.156

# 4. Is the k8s-gateway HTTPRoute controller active?
kubectl logs -n network -l app.kubernetes.io/name=k8s-gateway | grep -E "HTTPRoute|error|NXDOMAIN"

# 5. Can you reach the service on TCP?
curl -sk -o /dev/null -w "%{http_code}" https://<hostname>.laurivan.com

# 6. Is the Cilium L2 lease active?
kubectl -n kube-system get lease | grep l2announce
```