This PR addresses some consistency issues that was introduced or discovered with the nodestore.
nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.
auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.
integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)
This PR was assisted, particularly tests, by claude code.
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.
It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.
Writes will block until commited, while reads are never
blocked.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This change makes editing the generated command easier.
For example, after pasting into a terminal, the cursor position will be
near the username portion which requires editing.
* types/authkey: include user object, not string
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* make preauthkeys use id
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* changelog
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* integration: wire up user id for auth keys
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* utility iterator for ipset
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* split policy -> policy and v1
This commit split out the common policy logic and policy implementation
into separate packages.
policy contains functions that are independent of the policy implementation,
this typically means logic that works on tailcfg types and generic formats.
In addition, it defines the PolicyManager interface which the v1 implements.
v1 is a subpackage which implements the PolicyManager using the "original"
policy implementation.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use polivyv1 definitions in integration tests
These can be marshalled back into JSON, which the
new format might not be able to.
Also, just dont change it all to JSON strings for now.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* formatter: breaks lines
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove compareprefix, use tsaddr version
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove getacl test, add back autoapprover
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use policy manager tag handling
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* rename display helper for user
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* introduce policy v2 package
policy v2 is built from the ground up to be stricter
and follow the same pattern for all types of resolvers.
TODO introduce
aliass
resolver
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* wire up policyv2 in integration testing
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* split policy v2 tests into seperate workflow to work around github limit
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add policy manager output to /debug
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* update changelog
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* initial capver packet tracking version
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* Log the minimum version as client version, not only capver
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* remove old versions
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* use capver for integration tests
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* changelog
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* patch through m and n key
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* Add -race flag to Makefile and integration tests; fix data race in CreateTailscaleNodesInUser
* Fix data race in ExecuteCommand by using local buffers and mutex
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
* lint
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
---------
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Docker releases a patch release which changed the required permissions to be able to do tun devices in containers, this caused all containers to fail in tests causing us to fail all tests. This fixes it, and adds some tools for debugging in the future.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add shutdown that asserts if headscale had panics
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* add test case producing 2118 panic
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* make stream shutdown if self-node has been removed
Currently we will read the node from database, and since it is
deleted, the id might be set to nil. Keep the node around and
just shutdown, so it is cleanly removed from notifier.
Fixes#2118
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* Fix KeyExpiration when a zero time value has a timezone
When a zero time value is loaded from JSON or a DB in a way that
assigns it the local timezone, it does not roudtrip in JSON as a
value for which IsZero returns true. This causes KeyExpiry to be
treated as a far past value instead of a nilish value.
See https://github.com/golang/go/issues/57040
* Fix whitespace
* Ensure that postgresql is used for all tests when env var is set
* Pass through value of HEADSCALE_INTEGRATION_POSTGRES env var
* Add option to set timezone on headscale container
* Add test for registration with auth key in alternate timezone
this commit changes and streamlines the dns_config into a new
key, dns. It removes a combination of outdates and incompatible
configuration options that made it easy to confuse what headscale
could and could not do, or what to expect from ones configuration.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This PR removes the complicated session management introduced in https://github.com/juanfont/headscale/pull/1791 which kept track of the sessions in a map, in addition to the channel already kept track of in the notifier.
Instead of trying to close the mapsession, it will now be replaced by the new one and closed after so all new updates goes to the right place.
The map session serve function is also split into a streaming and a non-streaming version for better readability.
RemoveNode in the notifier will not remove a node if the channel is not matching the one that has been passed (e.g. it has been replaced with a new one).
A new tuning parameter has been added to added to set timeout before the notifier gives up to send an update to a node.
Add a keep alive resetter so we wait with sending keep alives if a node has just received an update.
In addition it adds a bunch of env debug flags that can be set:
- `HEADSCALE_DEBUG_HIGH_CARDINALITY_METRICS`: make certain metrics include per node.id, not recommended to use in prod.
- `HEADSCALE_DEBUG_PROFILING_ENABLED`: activate tracing
- `HEADSCALE_DEBUG_PROFILING_PATH`: where to store traces
- `HEADSCALE_DEBUG_DUMP_CONFIG`: calls `spew.Dump` on the config object startup
- `HEADSCALE_DEBUG_DEADLOCK`: enable go-deadlock to dump goroutines if it looks like a deadlock has occured, enabled in integration tests.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit restructures the map session in to a struct
holding the state of what is needed during its lifetime.
For streaming sessions, the event loop is structured a
bit differently not hammering the clients with updates
but rather batching them over a short, configurable time
which should significantly improve cpu usage, and potentially
flakyness.
The use of Patch updates has been dialed back a little as
it does not look like its a 100% ready for prime time. Nodes
are now updated with full changes, except for a few things
like online status.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commits removes the locks used to guard data integrity for the
database and replaces them with Transactions, turns out that SQL had
a way to deal with this all along.
This reduces the complexity we had with multiple locks that might stack
or recurse (database, nofitifer, mapper). All notifications and state
updates are now triggered _after_ a database change.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
There was a lot of tests that actually threw a lot of errors and that did
not pass all the way because we didnt check everything. This commit should
fix all of these cases.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This is step one in detaching the Database layer from Headscale (h). The
ultimate goal is to have all function that does database operations in
its own package, and keep the business logic and writing separate.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>