Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.
Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.
This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.
In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:
Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.
Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.
When the poll recieved the map, it is only responsible for taking it and
sending it to the node.
This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This adds new flags for collecting memory and cpu stats from
containers during a run and the ability to fail runs if the
tailscale og headscale container uses too much memory.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
There was a bug in HA subnet router handover where we used stale node data
from the longpoll session that we handed to Connect. This meant that we got
some odd behaviour where routes would not be deactivated correctly.
This commit changes to the nodeview is used through out, and we load the
current node to be updated in the write path and then handle it all there
to be consistent.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit changes most of our (*)types.Node to
types.NodeView, which is a readonly version of the
underlying node ensuring that there is no mutations
happening in the read path.
Based on the migration, there didnt seem to be any, but the
idea here is to prevent it in the future and simplify other
new implementations.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
We are already being punished by github actions, there seem to be
little value in running all the tests for both databases, so only
run a few key tests to check postgres isnt broken.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Restructure and rewrite the OpenID Connect documentation. Start from the
most minimal configuration and describe what needs to be done both in
Headscale and the identity provider. Describe additional features such
as PKCE and authorization filters in a generic manner with examples.
Document how Headscale populates its user profile and how it relates to
OIDC claims. This is a revised version from the table in the changelog.
Document the validation rules for fields and extend known limitations.
Sort the provider specific section alphabetically and add a section for
Authelia, Authentik, Kanidm and Keycloak. Also simplify and rename Azure
to Entra ID.
Update the description for the oidc section in the example
configuration. Give a short explanation of each configuration setting.
All documentend features were tested with Headscale 0.26 (using a fresh
database each time) using the following identity providers:
* Authelia
* Authentik
* Kanidm
* Keycloak
Fixes: #2295
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.
The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.
Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* cmd/hi: add integration test runner CLI tool
Add a new CLI tool 'hi' for running headscale integration tests
with Docker automation. The tool replaces manual Docker command
composition with an automated solution.
Features:
- Run integration tests in golang:1.24 containers
- Docker context detection (supports colima and other contexts)
- Test isolation with unique run IDs and isolated control_logs
- Automatic Docker image pulling and container management
- Comprehensive cleanup operations for containers, networks, images
- Docker volume caching for Go modules
- Verbose logging and detailed test artifact reporting
- Support for PostgreSQL/SQLite selection and various test flags
Usage: go run ./cmd/hi run TestPingAllByIP --verbose
The tool uses creachadair/command and flax for CLI parsing and
provides cleanup subcommands for Docker resource management.
Updates flake.nix vendorHash for new Go dependencies.
* ci: update integration tests to use hi CLI tool
Replace manual Docker command composition in GitHub Actions
workflow with the new hi CLI tool for running integration tests.
Changes:
- Replace complex docker run command with simple 'go run ./cmd/hi run'
- Remove manual environment variable setup (handled by hi tool)
- Update artifact paths for new timestamped log directory structure
- Simplify command from 15+ lines to 3 lines
- Maintain all existing functionality (postgres/sqlite, timeout, test patterns)
The hi tool automatically handles Docker context detection, container
management, volume mounting, and environment variable setup that was
previously done manually in the workflow.
* makefile: remove test integration
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* feat: add verify client config for embedded DERP
* refactor: embedded DERP no longer verify clients via HTTP
- register the `headscale://` protocol in `http.DefaultTransport` to intercept network requests
- update configuration to use a single boolean option `verify_clients`
* refactor: use `http.HandlerFunc` for type definition
* refactor: some renaming and restructuring
* chore: some renaming and fix lint
* test: fix TestDERPVerifyEndpoint
- `tailscale debug derp` use random node private key
* test: add verify clients integration test for embedded DERP server
* fix: apply code review suggestions
* chore: merge upstream changes
* fix: apply code review suggestions
---------
Co-authored-by: Kristoffer Dalby <kristoffer@dalby.cc>
* Improve map auth logic
* Bugfix
* Add comment, improve error message
* noise: make func, get by node
this commit splits the additional validation into a
separate function so it can be reused if we add more
endpoints in the future.
It swaps the check, so we still look up by NodeKey, but before
accepting the connection, we validate the known machinekey from
the db against the noise connection.
The reason for this is that when a node logs in or out, the node key
is replaced and it will no longer be possible to look it up, breaking
reauthentication.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* noise: add comment to remind future use of getAndVal
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* changelog: add entry
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
The systemd target "syslog.target" and not required because syslog is
socket activated.
The directory /var/run is usually a symlink to /run and its created by
systemd via the RuntimeDirectory=headscale option. System creates and
handles permissions, no need to manually mark it as a read-write path.