Update the 0.29.0 changelog entry to document the minimum
supported Tailscale client version (v1.76.0), which corresponds
to capability version 106 based on the 10-version support window.
Errors should not start capitalised and they should not contain the word error
or state that they "failed" as we already know it is an error
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
Go style recommends that log messages and error strings should not be
capitalized (unless beginning with proper nouns or acronyms) and should
not end with punctuation.
This change normalizes all zerolog .Msg() and .Msgf() calls to start
with lowercase letters, following Go conventions and making logs more
consistent across the codebase.
Replace raw string field names with zf constants in state.go and
db/node.go for consistent, type-safe logging.
state.go changes:
- User creation, hostinfo validation, node registration
- Tag processing during reauth (processReauthTags)
- Auth path and PreAuthKey handling
- Route auto-approval and MapRequest processing
db/node.go changes:
- RegisterNodeForTest logging
- Invalid hostname replacement logging
Add sub-logger patterns to worker(), AddNode(), RemoveNode() and
multiChannelNodeConn to eliminate repeated field calls. Use zf.*
constants for consistent field naming.
Changes in batcher_lockfree.go:
- Add wlog sub-logger in worker() with worker.id context
- Add log field to multiChannelNodeConn struct
- Initialize mc.log with node.id in newMultiChannelNodeConn()
- Add nlog sub-loggers in AddNode() and RemoveNode()
- Update all connection methods to use mc.log
Changes in batcher.go:
- Use zf.NodeID and zf.Reason in handleNodeChange()
Replace manual field extraction with EmbedObject for node logging
in gRPC handlers. Use zf.* constants for consistent field naming.
Changes:
- RegisterNode: use EmbedObject(node), zf.RegistrationKey, etc.
- SetTags: use EmbedObject(node)
- ExpireNode: use EmbedObject(node), zf.ExpiresAt
- RenameNode: use EmbedObject(node), zf.NewName
- SetApprovedRoutes: use zf.NodeID
Add sub-logger pattern to SetRoutes() to eliminate repeated node.id
field calls. Replace raw strings with zf.* constants throughout
the primary routes code for consistent field naming.
Changes:
- Add nlog sub-logger in SetRoutes() with node.id context
- Replace "prefix" with zf.Prefix
- Replace "changed" with zf.Changes
- Replace "newState" with zf.NewState
- Replace "finalState" with zf.FinalState
Replace the helper functions (logf, infof, tracef, errf) with a
zerolog sub-logger initialized in newMapSession(). The sub-logger
is pre-populated with session context (component, node, omitPeers,
stream) eliminating repeated field calls throughout the code.
Changes:
- Add log field to mapSession struct
- Initialize sub-logger with EmbedObject(node) and request context
- Remove logf/infof/tracef/errf helper functions
- Update all callers to use m.log.Level().Caller()... pattern
- Update noise.go to use sess.log instead of sess.tracef
This reduces code by ~20 lines and eliminates ~15 repeated field
calls per log statement.
Add a lint rule to enforce use of zf.* constants for zerolog field
names instead of inline string literals. This catches at lint time
any new code that doesn't follow the convention.
The rule matches common zerolog field methods (Str, Int, Bool, etc.)
and flags any usage with a string literal first argument.
Add hscontrol/util/zlog package with:
- zf subpackage: field name constants for compile-time safety
- SafeHostinfo: wrapper that redacts device fingerprinting data
- SafeMapRequest: wrapper that redacts client endpoints
The zf (zerolog fields) subpackage provides short constant names
(e.g., zf.NodeID instead of inline "node.id" strings) ensuring
consistent field naming across all log statements.
Security considerations:
- SafeHostinfo never logs: OSVersion, DeviceModel, DistroName
- SafeMapRequest only logs endpoint counts, not actual IPs
Update integration test expectations to match current policy behavior:
1. IPProto defaults include all four protocols (TCP, UDP, ICMPv4,
ICMPv6) for port-range ACL rules, not just TCP and UDP.
2. Filter rules with identical SrcIPs and IPProto are now merged
into a single rule with combined DstPorts, so the subnet router
receives one filter rule instead of two.
Updates #3036
According to Tailscale SaaS behavior, autogroup:internet is handled
by exit node routing via AllowedIPs, not by packet filtering. ACL
rules with autogroup:internet as destination should produce no
filter rules for any node.
Previously, Headscale expanded autogroup:internet to public CIDR
ranges and distributed filters to exit nodes (because 0.0.0.0/0
"covers" internet destinations). This was incorrect.
Add detection for AutoGroupInternet in filter compilation to skip
filter generation for this autogroup. Update test expectations
accordingly.
Fix two compatibility issues discovered in Tailscale SaaS testing:
1. Wildcard DstPorts format: Headscale was expanding wildcard
destinations to CGNAT ranges (100.64.0.0/10, fd7a:115c:a1e0::/48)
while Tailscale uses {IP: "*"} directly. Add detection for
wildcard (Asterix) alias type in filter compilation to use the
correct format.
2. proto:icmp handling: The "icmp" protocol name was returning both
ICMPv4 (1) and ICMPv6 (58), but Tailscale only returns ICMPv4.
Users should use "ipv6-icmp" or protocol number 58 explicitly
for IPv6 ICMP.
Update all test expectations accordingly. This significantly reduces
test file line count by replacing duplicated CGNAT range patterns
with single wildcard entries.
Update test expectations across policy tests to expect merged
FilterRule entries instead of separate ones. Tests now expect:
- Single FilterRule with combined DstPorts for same source
- Reduced matcher counts for exit node tests
Updates #3036
Tailscale merges multiple ACL rules into fewer FilterRule entries
when they have identical SrcIPs and IPProto, combining their DstPorts
arrays. This change implements the same behavior in Headscale.
Add mergeFilterRules() which uses O(n) hash map lookup to merge rules
with identical keys. DstPorts are NOT deduplicated to match Tailscale
behavior.
Also fix DestsIsTheInternet() to handle merged filter rules where
TheInternet is combined with other destinations - now uses superset
check instead of equality check.
Updates #3036
Change Asterix.Resolve() to use Tailscale's CGNAT range (100.64.0.0/10)
and ULA range (fd7a:115c:a1e0::/48) instead of all IPs (0.0.0.0/0 and
::/0).
This better matches Tailscale's security model where wildcard (*) means
"any node in the tailnet" rather than literally "any IP address on the
internet".
Updates #3036
Updates #3036
Tailscale validates that autogroup:self destinations in ACL rules can
only be used when ALL sources are users, groups, autogroup:member, or
wildcard (*). Previously, Headscale only performed this validation for
SSH rules.
Add validateACLSrcDstCombination() to enforce that tags, autogroup:tagged,
hosts, and raw IPs cannot be used as sources with autogroup:self
destinations. Invalid policies like `tag:client → autogroup:self:*` are
now rejected at validation time, matching Tailscale behavior.
Wildcard (*) is allowed because autogroup:self evaluation narrows it
per-node to only the node's own IPs.
Updates #3036
When ACL rules don't specify a protocol, Headscale now defaults to
[TCP, UDP, ICMP, ICMPv6] instead of just [TCP, UDP], matching
Tailscale's behavior.
Also export protocol number constants (ProtocolTCP, ProtocolUDP, etc.)
for use in external test packages, renaming the string protocol
constants to ProtoNameTCP, ProtoNameUDP, etc. to avoid conflicts.
This resolves 78 ICMP-related TODOs in the Tailscale compatibility
tests, reducing the total from 165 to 87.
Updates #3036
Add extensive test coverage verifying Headscale's ACL policy behavior
matches Tailscale's coordination server. Tests cover:
- Source/destination resolution for users, groups, tags, hosts, IPs
- autogroup:member, autogroup:tagged, autogroup:self behavior
- Filter rule deduplication and merging semantics
- Multi-rule interaction patterns
- Error case validation
Key behavioral differences documented:
- Headscale creates separate filter entries per ACL rule; Tailscale
merges rules with identical sources
- Headscale deduplicates Dsts within a rule; Tailscale does not
- Headscale does not validate autogroup:self source restrictions for
ACL rules (only SSH rules); Tailscale rejects invalid sources
Tests are based on real Tailscale coordination server responses
captured from a test environment with 5 nodes (1 user-owned, 4 tagged).
Updates #3036
Skip autogroup:self destination processing for tagged nodes since they
can never match autogroup:self (which only applies to user-owned nodes).
Also reorder the IsTagged() check to short-circuit before accessing
User() to avoid potential nil pointer access on tagged nodes.
Updates #3036
Add TestTagsAuthKeyConvertToUserViaCLIRegister that reproduces the
exact panic from #3038: register a node with a tags-only PreAuthKey
(no user), force reauth with empty tags, then register via CLI with
a user. The mapper panics on node.Owner().Model().ID when User is nil.
The critical detail is using a tags-only PreAuthKey (User: nil). When
the key is created under a user, the node inherits the User pointer
from createAndSaveNewNode and the bug is masked.
Also add Owner() validity assertions to the existing unit test
TestTaggedNodeWithoutUserToDifferentUser to catch the nil pointer
at the unit test level.
Updates #3038
processReauthTags sets UserID when converting a tagged node to
user-owned, but does not set the User pointer. When the node was
registered with a tags-only PreAuthKey (User: nil), the in-memory
NodeStore cache holds a node with User=nil. The mapper's
generateUserProfiles then calls node.Owner().Model().ID, which
dereferences the nil pointer and panics.
Set node.User alongside node.UserID in processReauthTags. Also add
defensive nil checks in generateUserProfiles to gracefully handle
nodes with invalid owners rather than panicking.
Fixes#3038
These were thin wrappers around applyAuthNodeUpdate that only added
logging. Move the logging into applyAuthNodeUpdate and call it directly
from HandleNodeFromAuthPath.
This simplifies the code structure without changing behavior.
Updates #3038
Move tag validation before the UpdateNode callback in applyAuthNodeUpdate.
Previously, tag validation happened inside the callback, and the error
check occurred after UpdateNode had already committed changes to the
NodeStore. This left the NodeStore in an inconsistent state when tags
were rejected.
Now validation happens first, and UpdateNode is only called when we know
the operation will succeed. This follows the principle that UpdateNode
should only be called when we have all information and are ready to commit.
Also extract validateRequestTags as a reusable function and use it in
createAndSaveNewNode to deduplicate the tag validation logic.
Updates #3038
Updates #3048
Previously, expiry handling ran BEFORE processReauthTags(), using the
old tagged status to determine whether to set/clear expiry. This caused:
- Personal → Tagged: Expiry remained set (should be cleared to nil)
- Tagged → Personal: Expiry remained nil (should be set from client)
Move expiry handling after tag processing and handle all four transition
cases based on the new tagged status:
- Tagged → Personal: Set expiry from client request
- Personal → Tagged: Clear expiry (tagged nodes don't expire)
- Personal → Personal: Update expiry from client
- Tagged → Tagged: Keep existing nil expiry
Fixes#3048
Reorganize HandleNodeFromAuthPath (~300 lines) into a cleaner structure
with named conditions and extracted helper functions.
Changes:
- Add authNodeUpdateParams struct for shared update logic
- Extract applyAuthNodeUpdate for common reauth/convert operations
- Extract reauthExistingNode and convertTaggedNodeToUser handlers
- Extract createNewNodeFromAuth for new node creation
- Use named boolean conditions (nodeExistsForSameUser, existingNodeIsTagged,
existingNodeOwnedByOtherUser) instead of compound if conditions
- Create logger with common fields (registration_id, user.name, machine.key,
method) to reduce log statement verbosity
Updates #3038
When a node was registered with a tags-only PreAuthKey (no user
associated), the node had User=nil and UserID=nil. When attempting to
re-register this node to a different user via HandleNodeFromAuthPath,
two issues occurred:
1. The code called oldUser.Name() without checking if oldUser was valid,
causing a nil pointer dereference panic.
2. The existing node lookup logic didn't find the tagged node because it
searched by (machineKey, userID), but tagged nodes have no userID.
This caused a new node to be created instead of updating the existing
tagged node.
Fix this by restructuring HandleNodeFromAuthPath to:
1. First check if a node exists for the same user (existing behavior)
2. If not found, check if an existing TAGGED node exists with the same
machine key (regardless of userID)
3. If a tagged node exists, UPDATE it to convert from tagged to
user-owned (preserving the node ID)
4. Only create a new node if the existing node is user-owned by a
different user
This ensures consistent behavior between:
- personal → tagged → personal (same node, same owner)
- tagged (no user) → personal (same node, new owner)
Add a test that reproduces the panic and conversion scenario by:
1. Creating a tags-only PreAuthKey (no user)
2. Registering a node with that key
3. Re-registering the same machine to a different user
4. Verifying the node ID stays the same (conversion, not creation)
Fixes#3038
Both compileFilterRules and compileSSHPolicy include .Caller() on
their resolution error log statements, but compileACLWithAutogroupSelf
does not. Add .Caller() to the three log sites (source resolution
error, destination resolution error, nil destination) for consistent
debuggability across all compilation paths.
Updates #2990
In compileSSHPolicy, when resolving other (non-autogroup:self)
destinations, the code discards the entire result on error via
`continue`. If a destination alias (e.g., a tag owned by a group
with a non-existent user) returns a partial IPSet alongside an
error, valid IPs are lost.
Both ACL compilation paths (compileFilterRules and
compileACLWithAutogroupSelf) already handle this correctly by
logging the error and using the IPSet if non-nil.
Remove the `continue` so the SSH path is consistent with the
ACL paths.
Fixes#2990
Three related issues where User().ID() is called on potentially tagged
nodes without first checking IsTagged():
1. compileACLWithAutogroupSelf: the autogroup:self block at line 166
lacks the !node.IsTagged() guard that compileSSHPolicy already has.
If a tagged node is the compilation target, node.User().ID() may
panic. Tagged nodes should never participate in autogroup:self.
2. compileSSHPolicy: the IsTagged() check is on the right side of &&,
so n.User().ID() evaluates first and may panic before short-circuit
can prevent it. Swap to !n.IsTagged() && n.User().ID() == ... to
match the already-correct order in compileACLWithAutogroupSelf.
3. invalidateAutogroupSelfCache: calls User().ID() at ~10 sites
without IsTagged() guards. Tagged nodes don't participate in
autogroup:self, so they should be skipped when collecting affected
users and during cache lookup. Tag status transitions are handled
by using the non-tagged version's user ID.
Fixes#2990
In compileACLWithAutogroupSelf, when a group contains a non-existent
user, Group.Resolve() returns a partial IPSet (with IPs from valid
users) alongside an error. The code was discarding the entire result
via `continue`, losing valid IPs. The non-autogroup-self path
(compileFilterRules) already handles this correctly by logging the
error and using the IPSet if non-empty.
Remove the `continue` on error for both source and destination
resolution, matching the existing behavior in compileFilterRules.
Also reorder the IsTagged check before User().ID() comparison
in the same-user node filter to prevent nil dereference on tagged
nodes that have no User set.
Fixes#2990