When the node notifier was replaced with batcher, we removed
its closing, but forgot to add the batchers so it was never
stopping node connections and waiting forever.
Fixes#2751
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
the client will send a lot of fields as `nil` if they have
not changed. NetInfo, which is inside Hostinfo, is one of those
fields and we often would override the whole hostinfo meaning that
we would remove netinfo if it hadnt changed.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.
It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.
Writes will block until commited, while reads are never
blocked.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Before this patch, we would send a message to each "node stream"
that there is an update that needs to be turned into a mapresponse
and sent to a node.
Producing the mapresponse is a "costly" afair which means that while
a node was producing one, it might start blocking and creating full
queues from the poller and all the way up to where updates where sent.
This could cause updates to time out and being dropped as a bad node
going away or spending too time processing would cause all the other
nodes to not get any updates.
In addition, it contributed to "uncontrolled parallel processing" by
potentially doing too many expensive operations at the same time:
Each node stream is essentially a channel, meaning that if you have 30
nodes, we will try to process 30 map requests at the same time. If you
have 8 cpu cores, that will saturate all the cores immediately and cause
a lot of wasted switching between the processing.
Now, all the maps are processed by workers in the mapper, and the number
of workers are controlable. These would now be recommended to be a bit
less than number of CPU cores, allowing us to process them as fast as we
can, and then send them to the poll.
When the poll recieved the map, it is only responsible for taking it and
sending it to the node.
This might not directly improve the performance of Headscale, but it will
likely make the performance a lot more consistent. And I would argue the
design is a lot easier to reason about.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Previously, nil regions were not properly handled. This change allows users to disable regions in DERPMaps.
Particularly useful to disable some official regions.
This patch includes some changes to the OIDC integration in particular:
- Make sure that userinfo claims are queried *before* comparing the
user with the configured allowed groups, email and email domain.
- Update user with group claim from the userinfo endpoint which is
required for allowed groups to work correctly. This is essentially a
continuation of #2545.
- Let userinfo claims take precedence over id token claims.
With these changes I have verified that Headscale works as expected
together with Authelia without the documented escape hatch [0], i.e.
everything works even if the id token only contain the iss and sub
claims.
[0]: https://www.authelia.com/integration/openid-connect/headscale/#configuration-escape-hatch
There was a bug in HA subnet router handover where we used stale node data
from the longpoll session that we handed to Connect. This meant that we got
some odd behaviour where routes would not be deactivated correctly.
This commit changes to the nodeview is used through out, and we load the
current node to be updated in the write path and then handle it all there
to be consistent.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit changes most of our (*)types.Node to
types.NodeView, which is a readonly version of the
underlying node ensuring that there is no mutations
happening in the read path.
Based on the migration, there didnt seem to be any, but the
idea here is to prevent it in the future and simplify other
new implementations.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.
The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.
Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* feat: add verify client config for embedded DERP
* refactor: embedded DERP no longer verify clients via HTTP
- register the `headscale://` protocol in `http.DefaultTransport` to intercept network requests
- update configuration to use a single boolean option `verify_clients`
* refactor: use `http.HandlerFunc` for type definition
* refactor: some renaming and restructuring
* chore: some renaming and fix lint
* test: fix TestDERPVerifyEndpoint
- `tailscale debug derp` use random node private key
* test: add verify clients integration test for embedded DERP server
* fix: apply code review suggestions
* chore: merge upstream changes
* fix: apply code review suggestions
---------
Co-authored-by: Kristoffer Dalby <kristoffer@dalby.cc>
* Improve map auth logic
* Bugfix
* Add comment, improve error message
* noise: make func, get by node
this commit splits the additional validation into a
separate function so it can be reused if we add more
endpoints in the future.
It swaps the check, so we still look up by NodeKey, but before
accepting the connection, we validate the known machinekey from
the db against the noise connection.
The reason for this is that when a node logs in or out, the node key
is replaced and it will no longer be possible to look it up, breaking
reauthentication.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* noise: add comment to remind future use of getAndVal
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
* changelog: add entry
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
---------
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Co-authored-by: Kristoffer Dalby <kristoffer@tailscale.com>
This change makes editing the generated command easier.
For example, after pasting into a terminal, the cursor position will be
near the username portion which requires editing.