Update claude.md

Reece 2025-07-14 18:14:36 +01:00
parent 3d0d479ad7
commit 5d7c572929

@@ -25,23 +25,54 @@ Set `DOCKER_ENABLE_SECURITY=true` environment variable to enable security featur
- **Proxy Configuration**: Vite proxies `/api/*` calls to backend (localhost:8080)
- **Build Process**: DO NOT run build scripts manually - builds are handled by CI/CD pipelines
- **Package Installation**: DO NOT run npm install commands - package management handled separately
- **Deployment Options**:
  - **Desktop App**: `npm run tauri-build` (native desktop application)
  - **Web Server**: `npm run build` then serve dist/ folder
  - **Development**: `npm run tauri-dev` for desktop dev mode
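
The proxy bullet above maps to a `server.proxy` entry in the Vite config. A minimal sketch, assuming the config lives in `frontend/vite.config.ts` and that `changeOrigin` is wanted (neither is confirmed by these notes); `apiProxy` is a hypothetical name:

```typescript
// Hypothetical excerpt for frontend/vite.config.ts (file location is an assumption).
// Vite treats plain proxy keys as path prefixes, so "/api" covers every /api/* request.
const apiProxy = {
  "/api": {
    target: "http://localhost:8080", // backend address from the notes above
    changeOrigin: true,              // assumption: rewrite the Host header to the target
  },
};

// In the real config this object would be passed as:
//   export default defineConfig({ server: { proxy: apiProxy } });
```
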
#### Tailwind CSS Setup (if not already installed)
```bash
cd frontend
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p
```
#### Multi-Tool Workflow Architecture
Frontend designed for **stateful document processing**:
- Users upload PDFs once, then chain tools (split → merge → compress → view)
- File state and processing results persist across tool switches
- No file reloading between tools - performance critical for large PDFs (up to 100GB+)
#### FileContext - Central State Management
**Location**: `src/contexts/FileContext.tsx`
- **Active files**: Currently loaded PDFs and their variants
- **Tool navigation**: Current mode (viewer/pageEditor/fileEditor/toolName)
- **Memory management**: PDF document cleanup, blob URL lifecycle, Web Worker management
- **IndexedDB persistence**: File storage with thumbnail caching
- **Preview system**: Tools can preview results (e.g., Split → Viewer → back to Split) without context pollution
**Critical**: All file operations go through FileContext. Don't bypass it with direct file handling.
#### Processing Services
- **enhancedPDFProcessingService**: Background PDF parsing and manipulation
- **thumbnailGenerationService**: Web Worker-based with main-thread fallback
- **fileStorage**: IndexedDB with LRU cache management
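
The LRU idea behind `fileStorage` can be illustrated in memory. This is a sketch only, not the project's IndexedDB code; `LruCache` and its methods are hypothetical names:

```typescript
// Illustrative in-memory LRU cache; not the project's actual fileStorage code.
class LruCache<K, V> {
  private entries = new Map<K, V>(); // Map iterates in insertion order
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key); // refresh position if the key already exists
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}
```

Reading an entry counts as a use, so a thumbnail that keeps being viewed stays cached while untouched entries are evicted first.
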
#### Memory Management Strategy
**Why manual cleanup exists**: Large PDFs (up to 100GB+) passed through multiple tools accumulate resources that need explicit release:
- PDF.js documents that need explicit .destroy() calls
- Blob URLs from tool outputs that need revocation
- Web Workers that need termination
Without cleanup, these resources leak until the browser runs out of memory and crashes.
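
One way to keep the three kinds of cleanup in one place is a small tracker. This is a hedged sketch, not the FileContext implementation: `ResourceTracker`, `Destroyable`, and `Terminable` are hypothetical names, and the revoke function is injected so the example does not depend on browser globals.

```typescript
// Hypothetical helper; the real cleanup lives in FileContext and may differ.
interface Destroyable { destroy(): unknown } // e.g. a PDF.js document
interface Terminable { terminate(): void }   // e.g. a Web Worker

class ResourceTracker {
  private documents = new Set<Destroyable>();
  private blobUrls = new Set<string>();
  private workers = new Set<Terminable>();
  private revokeUrl: (url: string) => void;

  // In a browser, pass (url) => URL.revokeObjectURL(url).
  constructor(revokeUrl: (url: string) => void) {
    this.revokeUrl = revokeUrl;
  }

  trackDocument(doc: Destroyable): void { this.documents.add(doc); }
  trackBlobUrl(url: string): void { this.blobUrls.add(url); }
  trackWorker(worker: Terminable): void { this.workers.add(worker); }

  // Release everything that was tracked, then forget it so a second
  // call does not destroy or revoke the same resource twice.
  async cleanup(): Promise<void> {
    const docs = Array.from(this.documents);
    this.documents.clear();
    await Promise.all(docs.map((doc) => doc.destroy()));

    this.blobUrls.forEach((url) => this.revokeUrl(url));
    this.blobUrls.clear();

    this.workers.forEach((worker) => worker.terminate());
    this.workers.clear();
  }
}
```

A tool would register each resource when it creates it and call `cleanup()` when its files are removed or replaced.
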
#### Tool Development
- **Pattern**: Follow `src/tools/Split.tsx` as reference implementation
- **File Access**: Tools receive `selectedFiles` prop (computed from activeFiles based on user selection)
- **File Selection**: Users select files in FileEditor (tool mode) → stored as IDs → computed to File objects for tools
- **Integration**: All files are part of FileContext ecosystem - automatic memory management and operation tracking
- **Parameters**: Tool parameter handling patterns still being standardized
- **Preview Integration**: Tools can implement preview functionality (see Split tool's thumbnail preview)
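
The ID-to-file step described under File Selection can be sketched as a pure function. The `ActiveFile` shape and `computeSelectedFiles` name are assumptions for illustration; the real FileContext types may differ:

```typescript
// Hypothetical shapes; the real FileContext types may differ.
interface ActiveFile {
  id: string;
  name: string;
}

// Resolve stored IDs to file objects, keeping the selection order and
// skipping IDs whose file is no longer among the active files.
function computeSelectedFiles(
  activeFiles: ActiveFile[],
  selectedIds: string[],
): ActiveFile[] {
  const byId = new Map<string, ActiveFile>();
  activeFiles.forEach((file) => byId.set(file.id, file));

  const selected: ActiveFile[] = [];
  selectedIds.forEach((id) => {
    const file = byId.get(id);
    if (file) selected.push(file);
  });
  return selected;
}
```

Storing IDs rather than file objects keeps the selection small and lets stale entries drop out when a file is removed.
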
## Architecture Overview
### Project Structure
- **Backend**: Spring Boot application with Thymeleaf templating
- **Frontend**: React-based SPA in `/frontend` directory (replacing legacy Thymeleaf templates)
- **Current Status**: Active development to replace Thymeleaf UI with modern React SPA
- **Frontend**: React-based SPA in `/frontend` directory (Thymeleaf templates fully replaced)
- **File Storage**: IndexedDB for client-side file persistence and thumbnails
- **Internationalization**: JSON-based translations (converted from backend .properties)
- **URL Parameters**: Deep linking support for tool states and configurations
- **PDF Processing**: PDFBox for core PDF operations, LibreOffice for conversions, PDF.js for client-side rendering
- **Security**: Spring Security with optional authentication (controlled by `DOCKER_ENABLE_SECURITY`)
- **Configuration**: YAML-based configuration with environment variable overrides
@@ -59,9 +90,8 @@ npx tailwindcss init -p
- **Pipeline System**: Automated PDF processing workflows via `PipelineController`
- **Security Layer**: Authentication, authorization, and user management (when enabled)
### Template System (Legacy + Modern)
- **Legacy Thymeleaf Templates**: Located in `src/main/resources/templates/` (being phased out)
- **Modern React Components**: Located in `frontend/src/components/` and `frontend/src/tools/`
### Component Architecture
- **React Components**: Located in `frontend/src/components/` and `frontend/src/tools/`
- **Static Assets**: CSS, JS, and resources in `src/main/resources/static/` (legacy) + `frontend/public/` (modern)
- **Internationalization**:
  - Backend: `messages_*.properties` files
@@ -91,13 +121,14 @@ npx tailwindcss init -p
- Frontend: Update JSON files in `frontend/public/locales/` or use conversion script
5. **Documentation**: API docs auto-generated and available at `/swagger-ui/index.html`
## Frontend Migration Notes
## Frontend Architecture Status
- **Current Branch**: `feature/react-overhaul` - Active React SPA development
- **Migration Status**: Core tools (Split, Merge, Compress) converted to React with URL parameter support
- **File Management**: Implemented IndexedDB storage with thumbnail generation using PDF.js
- **Tools Architecture**: Each tool receives `params` and `updateParams` for URL state synchronization
- **Remaining Work**: Convert remaining Thymeleaf templates to React components
- **Core Status**: React SPA architecture complete with multi-tool workflow support
- **State Management**: FileContext handles all file operations and tool navigation
- **File Processing**: Production-ready with memory management for large PDF workflows (up to 100GB+)
- **Tool Integration**: Standardized tool interface - see `src/tools/Split.tsx` as reference
- **Preview System**: Tool results can be previewed without polluting file context (Split tool example)
- **Performance**: Web Worker thumbnails, IndexedDB persistence, background processing
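
URL state synchronization for tools can be illustrated with `URLSearchParams`. This is a sketch under assumptions: `ToolParams`, `readParams`, and `applyParamChanges` are hypothetical names, and the project's actual `params` / `updateParams` props may be shaped differently.

```typescript
// Hypothetical helpers; the project's params/updateParams props may differ.
type ToolParams = Record<string, string>;

// Parse a query string (with or without a leading "?") into a flat object.
function readParams(search: string): ToolParams {
  const params: ToolParams = {};
  new URLSearchParams(search).forEach((value, key) => {
    params[key] = value;
  });
  return params;
}

// Return a new query string with the given keys set, leaving others untouched.
function applyParamChanges(search: string, changes: ToolParams): string {
  const next = new URLSearchParams(search);
  Object.keys(changes).forEach((key) => {
    const value = changes[key];
    if (value !== undefined) next.set(key, value);
  });
  return next.toString();
}
```

Keeping tool settings in the query string is what makes a tool state deep-linkable: reloading or sharing the URL restores the same parameters.
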
## Important Notes
@@ -108,6 +139,11 @@ npx tailwindcss init -p
- **Backend**: Designed to be stateless - files are processed in memory/temp locations only
- **Frontend**: Uses IndexedDB for client-side file storage and caching (with thumbnails)
- **Security**: When `DOCKER_ENABLE_SECURITY=false`, security-related classes are excluded from compilation
- **FileContext**: All file operations MUST go through FileContext - never bypass with direct File handling
- **Memory Management**: Manual cleanup required for PDF.js documents and blob URLs - don't remove cleanup code
- **Tool Development**: New tools should follow Split tool pattern (`src/tools/Split.tsx`)
- **Performance Target**: Must handle PDFs up to 100GB+ without browser crashes
- **Preview System**: Tools can preview results without polluting main file context (see Split tool implementation)
## Communication Style
- Be direct and to the point