mirror of
https://github.com/juanfont/headscale.git
synced 2025-09-25 17:51:11 +02:00
CLAUDE.md: add project file, lets call it a test
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit is contained in:
parent
8c04ab7e44
commit
7177aebd37
395
CLAUDE.md
Normal file
395
CLAUDE.md
Normal file
@ -0,0 +1,395 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Overview
|
||||
|
||||
Headscale is an open-source implementation of the Tailscale control server written in Go. It provides self-hosted coordination for Tailscale networks (tailnets), managing node registration, IP allocation, policy enforcement, and DERP routing.
|
||||
|
||||
## Development Commands
|
||||
|
||||
### Quick Setup
|
||||
```bash
|
||||
# Recommended: Use Nix for dependency management
|
||||
nix develop
|
||||
|
||||
# Full development workflow
|
||||
make dev # runs fmt + lint + test + build
|
||||
```
|
||||
|
||||
### Essential Commands
|
||||
```bash
|
||||
# Build headscale binary
|
||||
make build
|
||||
|
||||
# Run tests
|
||||
make test
|
||||
go test ./... # All unit tests
|
||||
go test -race ./... # With race detection
|
||||
|
||||
# Run specific integration test
|
||||
go run ./cmd/hi run "TestName" --postgres
|
||||
|
||||
# Code formatting and linting
|
||||
make fmt # Format all code (Go, docs, proto)
|
||||
make lint # Lint all code (Go, proto)
|
||||
make fmt-go # Format Go code only
|
||||
make lint-go # Lint Go code only
|
||||
|
||||
# Protocol buffer generation (after modifying proto/)
|
||||
make generate
|
||||
|
||||
# Clean build artifacts
|
||||
make clean
|
||||
```
|
||||
|
||||
### Integration Testing
|
||||
```bash
|
||||
# Use the hi (Headscale Integration) test runner
|
||||
go run ./cmd/hi doctor # Check system requirements
|
||||
go run ./cmd/hi run "TestPattern" # Run specific test
|
||||
go run ./cmd/hi run "TestPattern" --postgres # With PostgreSQL backend
|
||||
|
||||
# Test artifacts are saved to control_logs/ with logs and debug data
|
||||
```
|
||||
|
||||
## Project Structure & Architecture
|
||||
|
||||
### Top-Level Organization
|
||||
|
||||
```
|
||||
headscale/
|
||||
├── cmd/ # Command-line applications
|
||||
│ ├── headscale/ # Main headscale server binary
|
||||
│ └── hi/ # Headscale Integration test runner
|
||||
├── hscontrol/ # Core control plane logic
|
||||
├── integration/ # End-to-end Docker-based tests
|
||||
├── proto/ # Protocol buffer definitions
|
||||
├── gen/ # Generated code (protobuf)
|
||||
├── docs/ # Documentation
|
||||
└── packaging/ # Distribution packaging
|
||||
```
|
||||
|
||||
### Core Packages (`hscontrol/`)
|
||||
|
||||
**Main Server (`hscontrol/`)**
|
||||
- `app.go`: Application setup, dependency injection, server lifecycle
|
||||
- `handlers.go`: HTTP/gRPC API endpoints for management operations
|
||||
- `grpcv1.go`: gRPC service implementation for headscale API
|
||||
- `poll.go`: **Critical** - Handles Tailscale MapRequest/MapResponse protocol
|
||||
- `noise.go`: Noise protocol implementation for secure client communication
|
||||
- `auth.go`: Authentication flows (web, OIDC, command-line)
|
||||
- `oidc.go`: OpenID Connect integration for user authentication
|
||||
|
||||
**State Management (`hscontrol/state/`)**
|
||||
- `state.go`: Central coordinator for all subsystems (database, policy, IP allocation, DERP)
|
||||
- `node_store.go`: **Performance-critical** - In-memory cache with copy-on-write semantics
|
||||
- Thread-safe operations with deadlock detection
|
||||
- Coordinates between database persistence and real-time operations
|
||||
|
||||
**Database Layer (`hscontrol/db/`)**
|
||||
- `db.go`: Database abstraction, GORM setup, migration management
|
||||
- `node.go`: Node lifecycle, registration, expiration, IP assignment
|
||||
- `users.go`: User management, namespace isolation
|
||||
- `api_key.go`: API authentication tokens
|
||||
- `preauth_keys.go`: Pre-authentication keys for automated node registration
|
||||
- `ip.go`: IP address allocation and management
|
||||
- `policy.go`: Policy storage and retrieval
|
||||
- Schema migrations in `schema.sql` with extensive test data coverage
|
||||
|
||||
**Policy Engine (`hscontrol/policy/`)**
|
||||
- `policy.go`: Core ACL evaluation logic, HuJSON parsing
|
||||
- `v2/`: Next-generation policy system with improved filtering
|
||||
- `matcher/`: ACL rule matching and evaluation engine
|
||||
- Determines peer visibility, route approval, and network access rules
|
||||
- Supports both file-based and database-stored policies
|
||||
|
||||
**Network Management (`hscontrol/`)**
|
||||
- `derp/`: DERP (Designated Encrypted Relay for Packets) server implementation
|
||||
- NAT traversal when direct connections fail
|
||||
- Fallback relay for firewall-restricted environments
|
||||
- `mapper/`: Converts internal Headscale state to Tailscale's wire protocol format
|
||||
- `tail.go`: Tailscale-specific data structure generation
|
||||
- `routes/`: Subnet route management and primary route selection
|
||||
- `dns/`: DNS record management and MagicDNS implementation
|
||||
|
||||
**Utilities & Support (`hscontrol/`)**
|
||||
- `types/`: Core data structures, configuration, validation
|
||||
- `util/`: Helper functions for networking, DNS, key management
|
||||
- `templates/`: Client configuration templates (Apple, Windows, etc.)
|
||||
- `notifier/`: Event notification system for real-time updates
|
||||
- `metrics.go`: Prometheus metrics collection
|
||||
- `capver/`: Tailscale capability version management
|
||||
|
||||
### Key Subsystem Interactions
|
||||
|
||||
**Node Registration Flow**
|
||||
1. **Client Connection**: `noise.go` handles secure protocol handshake
|
||||
2. **Authentication**: `auth.go` validates credentials (web/OIDC/preauth)
|
||||
3. **State Creation**: `state.go` coordinates IP allocation via `db/ip.go`
|
||||
4. **Storage**: `db/node.go` persists node, `NodeStore` caches in memory
|
||||
5. **Network Setup**: `mapper/` generates initial Tailscale network map
|
||||
|
||||
**Ongoing Operations**
|
||||
1. **Poll Requests**: `poll.go` receives periodic client updates
|
||||
2. **State Updates**: `NodeStore` maintains real-time node information
|
||||
3. **Policy Application**: `policy/` evaluates ACL rules for peer relationships
|
||||
4. **Map Distribution**: `mapper/` sends network topology to all affected clients
|
||||
|
||||
**Route Management**
|
||||
1. **Advertisement**: Clients announce routes via `poll.go` Hostinfo updates
|
||||
2. **Storage**: `db/` persists routes, `NodeStore` caches for performance
|
||||
3. **Approval**: `policy/` auto-approves routes based on ACL rules
|
||||
4. **Distribution**: `routes/` selects primary routes, `mapper/` distributes to peers
|
||||
|
||||
### Command-Line Tools (`cmd/`)
|
||||
|
||||
**Main Server (`cmd/headscale/`)**
|
||||
- `headscale.go`: CLI parsing, configuration loading, server startup
|
||||
- Supports daemon mode, CLI operations (user/node management), database operations
|
||||
|
||||
**Integration Test Runner (`cmd/hi/`)**
|
||||
- `main.go`: Test execution framework with Docker orchestration
|
||||
- `run.go`: Individual test execution with artifact collection
|
||||
- `doctor.go`: System requirements validation
|
||||
- `docker.go`: Container lifecycle management
|
||||
- Essential for validating changes against real Tailscale clients
|
||||
|
||||
### Generated & External Code
|
||||
|
||||
**Protocol Buffers (`proto/` → `gen/`)**
|
||||
- Defines gRPC API for headscale management operations
|
||||
- Client libraries can generate from these definitions
|
||||
- Run `make generate` after modifying `.proto` files
|
||||
|
||||
**Integration Testing (`integration/`)**
|
||||
- `scenario.go`: Docker test environment setup
|
||||
- `tailscale.go`: Tailscale client container management
|
||||
- Individual test files for specific functionality areas
|
||||
- Real end-to-end validation with network isolation
|
||||
|
||||
### Critical Performance Paths
|
||||
|
||||
**High-Frequency Operations**
|
||||
1. **MapRequest Processing** (`poll.go`): Every 15-60 seconds per client
|
||||
2. **NodeStore Reads** (`node_store.go`): Every operation requiring node data
|
||||
3. **Policy Evaluation** (`policy/`): On every peer relationship calculation
|
||||
4. **Route Lookups** (`routes/`): During network map generation
|
||||
|
||||
**Database Write Patterns**
|
||||
- **Frequent**: Node heartbeats, endpoint updates, route changes
|
||||
- **Moderate**: User operations, policy updates, API key management
|
||||
- **Rare**: Schema migrations, bulk operations
|
||||
|
||||
### Configuration & Deployment
|
||||
|
||||
**Configuration** (`hscontrol/types/config.go`)**
|
||||
- Database connection settings (SQLite/PostgreSQL)
|
||||
- Network configuration (IP ranges, DNS settings)
|
||||
- Policy mode (file vs database)
|
||||
- DERP relay configuration
|
||||
- OIDC provider settings
|
||||
|
||||
**Key Dependencies**
|
||||
- **GORM**: Database ORM with migration support
|
||||
- **Tailscale Libraries**: Core networking and protocol code
|
||||
- **Zerolog**: Structured logging throughout the application
|
||||
- **Buf**: Protocol buffer toolchain for code generation
|
||||
|
||||
### Development Workflow Integration
|
||||
|
||||
The architecture supports incremental development:
|
||||
- **Unit Tests**: Focus on individual packages (`*_test.go` files)
|
||||
- **Integration Tests**: Validate cross-component interactions
|
||||
- **Database Tests**: Extensive migration and data integrity validation
|
||||
- **Policy Tests**: ACL rule evaluation and edge cases
|
||||
- **Performance Tests**: NodeStore and high-frequency operation validation
|
||||
|
||||
## Integration Test System
|
||||
|
||||
### Overview
|
||||
Integration tests use Docker containers running real Tailscale clients against a Headscale server. Tests validate end-to-end functionality including routing, ACLs, node lifecycle, and network coordination.
|
||||
|
||||
### Running Integration Tests
|
||||
|
||||
**System Requirements**
|
||||
```bash
|
||||
# Check if your system is ready
|
||||
go run ./cmd/hi doctor
|
||||
```
|
||||
This verifies Docker, Go, required images, and disk space.
|
||||
|
||||
**Test Execution Patterns**
|
||||
```bash
|
||||
# Run a single test (recommended for development)
|
||||
go run ./cmd/hi run "TestSubnetRouterMultiNetwork"
|
||||
|
||||
# Run with PostgreSQL backend (for database-heavy tests)
|
||||
go run ./cmd/hi run "TestExpireNode" --postgres
|
||||
|
||||
# Run multiple tests with pattern matching
|
||||
go run ./cmd/hi run "TestSubnet*"
|
||||
|
||||
# Run all integration tests (CI/full validation)
|
||||
go test ./integration -timeout 30m
|
||||
```
|
||||
|
||||
**Test Categories & Timing**
|
||||
- **Fast tests** (< 2 min): Basic functionality, CLI operations
|
||||
- **Medium tests** (2-5 min): Route management, ACL validation
|
||||
- **Slow tests** (5+ min): Node expiration, HA failover
|
||||
- **Long-running tests** (10+ min): `TestNodeOnlineStatus` (12 min duration)
|
||||
|
||||
### Test Infrastructure
|
||||
|
||||
**Docker Setup**
|
||||
- Headscale server container with configurable database backend
|
||||
- Multiple Tailscale client containers with different versions
|
||||
- Isolated networks per test scenario
|
||||
- Automatic cleanup after test completion
|
||||
|
||||
**Test Artifacts**
|
||||
All test runs save artifacts to `control_logs/TIMESTAMP-ID/`:
|
||||
```
|
||||
control_logs/20250713-213106-iajsux/
|
||||
├── hs-testname-abc123.stderr.log # Headscale server logs
|
||||
├── hs-testname-abc123.stdout.log
|
||||
├── hs-testname-abc123.db # Database snapshot
|
||||
├── hs-testname-abc123_metrics.txt # Prometheus metrics
|
||||
├── hs-testname-abc123-mapresponses/ # Protocol debug data
|
||||
├── ts-client-xyz789.stderr.log # Tailscale client logs
|
||||
├── ts-client-xyz789.stdout.log
|
||||
└── ts-client-xyz789_status.json # Client status dump
|
||||
```
|
||||
|
||||
### Test Development Guidelines
|
||||
|
||||
**Timing Considerations**
|
||||
Integration tests involve real network operations and Docker container lifecycle:
|
||||
|
||||
```go
|
||||
// ❌ Wrong: Immediate assertions after async operations
|
||||
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
|
||||
nodes, _ := headscale.ListNodes()
|
||||
require.Len(t, nodes[0].GetAvailableRoutes(), 1) // May fail due to timing
|
||||
|
||||
// ✅ Correct: Wait for async operations to complete
|
||||
client.Execute([]string{"tailscale", "set", "--advertise-routes=10.0.0.0/24"})
|
||||
require.EventuallyWithT(t, func(c *assert.CollectT) {
|
||||
nodes, err := headscale.ListNodes()
|
||||
assert.NoError(c, err)
|
||||
assert.Len(c, nodes[0].GetAvailableRoutes(), 1)
|
||||
}, 10*time.Second, 100*time.Millisecond, "route should be advertised")
|
||||
```
|
||||
|
||||
**Common Test Patterns**
|
||||
- **Route Advertisement**: Use `EventuallyWithT` for route propagation
|
||||
- **Node State Changes**: Wait for NodeStore synchronization
|
||||
- **ACL Policy Changes**: Allow time for policy recalculation
|
||||
- **Network Connectivity**: Use ping tests with retries
|
||||
|
||||
**Test Data Management**
|
||||
```go
|
||||
// Node identification: Don't assume array ordering
|
||||
expectedRoutes := map[string]string{"1": "10.33.0.0/16"}
|
||||
for _, node := range nodes {
|
||||
nodeIDStr := fmt.Sprintf("%d", node.GetId())
|
||||
if route, shouldHaveRoute := expectedRoutes[nodeIDStr]; shouldHaveRoute {
|
||||
// Test the node that should have the route
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Troubleshooting Integration Tests
|
||||
|
||||
**Common Failure Patterns**
|
||||
1. **Timing Issues**: Test assertions run before async operations complete
|
||||
- **Solution**: Use `EventuallyWithT` with appropriate timeouts
|
||||
- **Timeout Guidelines**: 3-5s for route operations, 10s for complex scenarios
|
||||
|
||||
2. **Infrastructure Problems**: Disk space, Docker issues, network conflicts
|
||||
- **Check**: `go run ./cmd/hi doctor` for system health
|
||||
- **Clean**: Remove old test containers and networks
|
||||
|
||||
3. **NodeStore Synchronization**: Tests expecting immediate data availability
|
||||
- **Key Points**: Route advertisements must propagate through poll requests
|
||||
- **Fix**: Wait for NodeStore updates after Hostinfo changes
|
||||
|
||||
4. **Database Backend Differences**: SQLite vs PostgreSQL behavior differences
|
||||
- **Use**: `--postgres` flag for database-intensive tests
|
||||
- **Note**: Some timing characteristics differ between backends
|
||||
|
||||
**Debugging Failed Tests**
|
||||
1. **Check test artifacts** in `control_logs/` for detailed logs
|
||||
2. **Examine MapResponse JSON** files for protocol-level debugging
|
||||
3. **Review Headscale stderr logs** for server-side error messages
|
||||
4. **Check Tailscale client status** for network-level issues
|
||||
|
||||
**Resource Management**
|
||||
- Tests require significant disk space (each run ~100MB of logs)
|
||||
- Docker containers are cleaned up automatically on success
|
||||
- Failed tests may leave containers running - clean manually if needed
|
||||
- Use `docker system prune` periodically to reclaim space
|
||||
|
||||
### Best Practices for Test Modifications
|
||||
|
||||
1. **Always test locally** before committing integration test changes
|
||||
2. **Use appropriate timeouts** - too short causes flaky tests, too long slows CI
|
||||
3. **Clean up properly** - ensure tests don't leave persistent state
|
||||
4. **Handle both success and failure paths** in test scenarios
|
||||
5. **Document timing requirements** for complex test scenarios
|
||||
|
||||
## NodeStore Implementation Details
|
||||
|
||||
**Key Insight from Recent Work**: The NodeStore is a critical performance optimization that caches node data in memory while ensuring consistency with the database. When working with route advertisements or node state changes:
|
||||
|
||||
1. **Timing Considerations**: Route advertisements need time to propagate from clients to server. Use `require.EventuallyWithT()` patterns in tests instead of immediate assertions.
|
||||
|
||||
2. **Synchronization Points**: NodeStore updates happen at specific points like `poll.go:420` after Hostinfo changes. Ensure these are maintained when modifying the polling logic.
|
||||
|
||||
3. **Peer Visibility**: The NodeStore's `peersFunc` determines which nodes are visible to each other. Policy-based filtering is separate from monitoring visibility - expired nodes should remain visible for debugging but marked as expired.
|
||||
|
||||
## Testing Guidelines
|
||||
|
||||
### Integration Test Patterns
|
||||
```go
|
||||
// Use EventuallyWithT for async operations
|
||||
require.EventuallyWithT(t, func(c *assert.CollectT) {
|
||||
nodes, err := headscale.ListNodes()
|
||||
assert.NoError(c, err)
|
||||
// Check expected state
|
||||
}, 10*time.Second, 100*time.Millisecond, "description")
|
||||
|
||||
// Node route checking by actual node properties, not array position
|
||||
var routeNode *v1.Node
|
||||
for _, node := range nodes {
|
||||
if nodeIDStr := fmt.Sprintf("%d", node.GetId()); expectedRoutes[nodeIDStr] != "" {
|
||||
routeNode = node
|
||||
break
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Running Problematic Tests
|
||||
- Some tests require significant time (e.g., `TestNodeOnlineStatus` runs for 12 minutes)
|
||||
- Infrastructure issues like disk space can cause test failures unrelated to code changes
|
||||
- Use `--postgres` flag when testing database-heavy scenarios
|
||||
|
||||
## Important Notes
|
||||
|
||||
- **Dependencies**: Use `nix develop` for consistent toolchain (Go, buf, protobuf tools, linting)
|
||||
- **Protocol Buffers**: Changes to `proto/` require `make generate` and should be committed separately
|
||||
- **Code Style**: Enforced via golangci-lint with golines (width 88) and gofumpt formatting
|
||||
- **Database**: Supports both SQLite (development) and PostgreSQL (production/testing)
|
||||
- **Integration Tests**: Require Docker and can consume significant disk space
|
||||
- **Performance**: NodeStore optimizations are critical for scale - be careful with changes to state management
|
||||
|
||||
## Debugging Integration Tests
|
||||
|
||||
Test artifacts are preserved in `control_logs/TIMESTAMP-ID/` including:
|
||||
- Headscale server logs (stderr/stdout)
|
||||
- Tailscale client logs and status
|
||||
- Database dumps and network captures
|
||||
- MapResponse JSON files for protocol debugging
|
||||
|
||||
When tests fail, check these artifacts first before assuming code issues.
|
Loading…
Reference in New Issue
Block a user