mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2026-05-01 23:16:31 +02:00
# Description of Changes
Flesh out the RAG system and connect it to the PDF Question Agent so it
can answer questions about extremely large PDFs.
I expect plenty more work will be needed before the RAG system is
everything we need, but this should be a reasonable start: it lets us
connect RAG to tools and handles ingestion mostly automatically. I'm
leaving file deletion and proper file ID management for a future PR. We
also need to decide whether all tools should retrieve content
exclusively via RAG, or whether it is sometimes better for tools to
fetch the direct content and other times fetch it from RAG.
A diagram of the expected interaction is as follows:
```mermaid
sequenceDiagram
autonumber
actor U as User
participant FE as Frontend<br/>(ChatPanel)
participant J as Java<br/>(AiWorkflowService)
participant O as Engine:<br/>OrchestratorAgent
participant QA as Engine:<br/>PdfQuestionAgent
participant RAG as Engine:<br/>RagService + SqliteVecStore
participant V as VoyageAI<br/>(embeddings)
participant L as LLM<br/>(Claude / etc.)
U->>FE: types "Summarise this PDF"<br/>(PDF already uploaded)
FE->>J: POST /api/v1/ai/orchestrate/stream<br/>multipart: fileInputs[], userMessage
Note over J: ByteHashFileIdStrategy<br/>id = sha256(bytes)[:16]
J->>O: POST /api/v1/orchestrator<br/>{ files:[{id,name}], userMessage }
O->>L: route via fast model
L-->>O: delegate_pdf_question
O->>QA: PdfQuestionRequest
loop for each file
QA->>RAG: has_collection(file.id)
RAG-->>QA: false
end
QA-->>O: NeedIngestResponse(files_to_ingest)
O-->>J: { outcome:"need_ingest", filesToIngest:[...] }
Note over J: onNeedIngest
loop per file
J->>J: PDFBox: extract page text
J->>O: POST /api/v1/rag/documents<br/>(long-running timeout)
O->>RAG: chunk + stage documents
O->>V: embed_documents (batches of 256)
V-->>O: embeddings
O->>RAG: add_documents
O-->>J: { chunks_indexed: N }
end
Note over J: retry with resumeWith=pdf_question
J->>O: POST /api/v1/orchestrator
Note over O: fast-path to PdfQuestionAgent
O->>QA: PdfQuestionRequest
Note over QA: build RagCapability<br/>pinned to file IDs
QA->>L: run(prompt) with search_knowledge tool
loop up to max_searches
L->>QA: search_knowledge(query)
QA->>V: embed_query
V-->>QA: query vector
QA->>RAG: search(vector, collections=[file.id])
RAG-->>QA: top-k chunks
QA-->>L: formatted chunks
end
Note over QA: once budget spent,<br/>prepare() hides the tool
L-->>QA: PdfQuestionAnswerResponse
QA-->>O: answer
O-->>J: { outcome:"answer", answer, evidence }
J-->>FE: SSE "result"
FE->>U: assistant bubble
```
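The `ByteHashFileIdStrategy` note in the diagram derives a stable, content-addressed file ID from the raw PDF bytes. A minimal sketch of that scheme (the function name here is illustrative, not the engine's API):

```python
import hashlib

def file_id(pdf_bytes: bytes) -> str:
    # Content-addressed ID per the diagram: id = sha256(bytes)[:16].
    # Identical bytes always map to the same ID, so a re-uploaded PDF
    # finds its existing RAG collection instead of being re-ingested.
    return hashlib.sha256(pdf_bytes).hexdigest()[:16]
```

Because the ID depends only on content, the `has_collection(file.id)` check in the diagram doubles as a dedupe check across uploads.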
The AI Engine's committed `.env` defaults (Bash, 54 lines, 2.3 KiB):
```bash
###############################################################################
# Environment variables used within the AI Engine.
# Values can be overridden in the uncommitted sibling `.env.local` file.
# Note: This file is committed to Git, so should not contain any private keys.
###############################################################################

# Configure the model strings passed to pydantic-ai. Provider credentials are handled by
# pydantic-ai and should be set using the provider's native environment variables, for example
# ANTHROPIC_API_KEY or OPENAI_API_KEY.
STIRLING_SMART_MODEL=anthropic:claude-haiku-4-5
STIRLING_FAST_MODEL=anthropic:claude-haiku-4-5

# Default output token limits applied by the engine for each model tier.
STIRLING_SMART_MODEL_MAX_TOKENS=8192
STIRLING_FAST_MODEL_MAX_TOKENS=2048

# RAG Configuration — retrieval-augmented generation is always on.
# Embedding provider credentials are handled natively (e.g. VOYAGE_API_KEY for VoyageAI).
STIRLING_RAG_EMBEDDING_MODEL=voyageai:voyage-4
```
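Per the diagram, ingestion sends documents to the embedding provider in batches of 256. A hedged sketch of that batching (the engine's actual client code may differ):

```python
from typing import Iterator, Sequence

def batched(docs: Sequence[str], batch_size: int = 256) -> Iterator[Sequence[str]]:
    # Yield successive slices of at most batch_size documents, matching the
    # diagram's "embed_documents (batches of 256)" step.
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]
```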
```bash
# Vector store backend: "sqlite" (embedded) or "pgvector" (external Postgres).
STIRLING_RAG_BACKEND=sqlite

# Path to the sqlite-vec database file (used when backend=sqlite).
STIRLING_RAG_STORE_PATH=data/rag.db

# Postgres DSN for pgvector (used when backend=pgvector). Leave empty when backend=sqlite.
# Example: postgresql://user:password@host:5432/dbname
STIRLING_RAG_PGVECTOR_DSN=

STIRLING_RAG_CHUNK_SIZE=512
STIRLING_RAG_CHUNK_OVERLAP=64
STIRLING_RAG_TOP_K=20
```
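The chunk settings above describe a sliding window: 512-character chunks, each sharing 64 characters with its predecessor. A minimal illustrative chunker (a sketch, not the engine's actual splitter):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    # Sliding-window split: each chunk starts (chunk_size - overlap) characters
    # after the previous one, so adjacent chunks share `overlap` characters of
    # context and no text falls on a hard boundary.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```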
```bash
# Per-run cap on ``search_knowledge`` calls. After this many calls the tool is
# removed from the agent's toolset so it must answer from what it already retrieved
# rather than chain more searches.
STIRLING_RAG_MAX_SEARCHES=5

# Upper bounds on PDF page text the engine will request per extraction round.
STIRLING_MAX_PAGES=200
STIRLING_MAX_CHARACTERS=200000

# PostHog analytics. Set STIRLING_POSTHOG_ENABLED=true and provide an API key to enable.
STIRLING_POSTHOG_ENABLED=false
STIRLING_POSTHOG_API_KEY=phc_VOdeYnlevc2T63m3myFGjeBlRcIusRgmhfx6XL5a1iz
STIRLING_POSTHOG_HOST=https://eu.i.posthog.com

# Log level for the stirling logger hierarchy (DEBUG, INFO, WARNING, ERROR)
STIRLING_LOG_LEVEL=INFO

# Path to log file. Rolls daily, keeps 1 backup. Leave empty for console only.
STIRLING_LOG_FILE=
```
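`STIRLING_RAG_MAX_SEARCHES` drives the behaviour noted in the diagram: once the budget is spent, `prepare()` hides the `search_knowledge` tool so the agent must answer from what it already retrieved. A toy model of that budget (class and method names are illustrative, not the engine's actual pydantic-ai hook):

```python
class SearchBudget:
    """Per-run counter for search_knowledge calls; illustrative sketch only."""

    def __init__(self, max_searches: int = 5):
        self.max_searches = max_searches
        self.used = 0

    def record_search(self) -> None:
        # Called once per search_knowledge invocation.
        self.used += 1

    def tool_available(self) -> bool:
        # Mirrors the prepare() idea: the tool stays in the toolset only
        # while budget remains, forcing a final answer once it is spent.
        return self.used < self.max_searches
```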