# Description of Changes
Have the Java side send the list of enabled endpoints to the AI engine so it
can tell the user that a tool exists but is disabled on the server and
therefore can't actually run the operation. This replaces the current
behaviour, where the engine sends the API call back anyway and the request
then fails with a 503 because execution is rejected when the URL is disabled.
<img width="380" height="208" alt="image"
src="https://github.com/user-attachments/assets/5842fb2e-2e55-45a5-8205-25515636daae"
/>
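The gating idea can be sketched as follows; the class, method, and endpoint names here are illustrative, not the actual implementation in either codebase:

```python
# Sketch only: hypothetical names, not the real classes in either codebase.
class EnabledToolGate:
    """Decides whether a requested tool may run on this server."""

    def __init__(self, enabled_endpoints: set):
        self.enabled_endpoints = enabled_endpoints

    def check(self, tool_endpoint: str):
        """Return a user-facing refusal message, or None if the tool may run."""
        if tool_endpoint in self.enabled_endpoints:
            return None
        return (
            f"The tool behind '{tool_endpoint}' exists but is disabled on "
            "this server, so the operation can't actually be run."
        )


gate = EnabledToolGate({"/api/v1/general/merge-pdfs"})
assert gate.check("/api/v1/general/merge-pdfs") is None
assert "disabled" in gate.check("/api/v1/general/split-pages")
```

The key point is that the refusal happens before any API call is issued, so the user gets an explanation instead of a 503.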
---------
Co-authored-by: EthanHealy01 <80844253+EthanHealy01@users.noreply.github.com>
# Description of Changes
Flesh out the RAG system and connect it to the PDF Question Agent so it
can answer questions about extremely large PDFs.
I'd expect lots more work will be needed to finish off the RAG system and
make it everything we need, but this should be a reasonable start which
lets us connect it to tools and have the ingestion mostly handled
automatically. I'm leaving file deletion and proper file ID management
for a future PR. We also need to consider whether all tools should
retrieve content exclusively via RAG, or whether it's sometimes
beneficial for tools to fetch the direct content instead.
A diagram of the expected interaction is as follows:
```mermaid
sequenceDiagram
autonumber
actor U as User
participant FE as Frontend<br/>(ChatPanel)
participant J as Java<br/>(AiWorkflowService)
participant O as Engine:<br/>OrchestratorAgent
participant QA as Engine:<br/>PdfQuestionAgent
participant RAG as Engine:<br/>RagService + SqliteVecStore
participant V as VoyageAI<br/>(embeddings)
participant L as LLM<br/>(Claude / etc.)
U->>FE: types "Summarise this PDF"<br/>(PDF already uploaded)
FE->>J: POST /api/v1/ai/orchestrate/stream<br/>multipart: fileInputs[], userMessage
Note over J: ByteHashFileIdStrategy<br/>id = sha256(bytes)[:16]
J->>O: POST /api/v1/orchestrator<br/>{ files:[{id,name}], userMessage }
O->>L: route via fast model
L-->>O: delegate_pdf_question
O->>QA: PdfQuestionRequest
loop for each file
QA->>RAG: has_collection(file.id)
RAG-->>QA: false
end
QA-->>O: NeedIngestResponse(files_to_ingest)
O-->>J: { outcome:"need_ingest", filesToIngest:[...] }
Note over J: onNeedIngest
loop per file
J->>J: PDFBox: extract page text
J->>O: POST /api/v1/rag/documents<br/>(long-running timeout)
O->>RAG: chunk + stage documents
O->>V: embed_documents (batches of 256)
V-->>O: embeddings
O->>RAG: add_documents
O-->>J: { chunks_indexed: N }
end
Note over J: retry with resumeWith=pdf_question
J->>O: POST /api/v1/orchestrator
Note over O: fast-path to PdfQuestionAgent
O->>QA: PdfQuestionRequest
Note over QA: build RagCapability<br/>pinned to file IDs
QA->>L: run(prompt) with search_knowledge tool
loop up to max_searches
L->>QA: search_knowledge(query)
QA->>V: embed_query
V-->>QA: query vector
QA->>RAG: search(vector, collections=[file.id])
RAG-->>QA: top-k chunks
QA-->>L: formatted chunks
end
Note over QA: once budget spent,<br/>prepare() hides the tool
L-->>QA: PdfQuestionAnswerResponse
QA-->>O: answer
O-->>J: { outcome:"answer", answer, evidence }
J-->>FE: SSE "result"
FE->>U: assistant bubble
```
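The `ByteHashFileIdStrategy` note in the diagram (`id = sha256(bytes)[:16]`) can be sketched as below, under the assumption that the slice is taken over the hex digest:

```python
import hashlib

def byte_hash_file_id(data: bytes) -> str:
    # One plausible reading of the diagram's note: the first 16 hex
    # characters of the SHA-256 digest of the raw file bytes.
    return hashlib.sha256(data).hexdigest()[:16]

fid = byte_hash_file_id(b"%PDF-1.7 example bytes")
assert len(fid) == 16
# Identical bytes always yield the same ID, so re-uploading the same PDF
# maps to the same RAG collection instead of triggering a fresh ingest.
assert fid == byte_hash_file_id(b"%PDF-1.7 example bytes")
```

Content-addressed IDs are what make the `has_collection(file.id)` check in step 3 of the diagram meaningful across sessions.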
# Description of Changes
Adds the ability for the Edit agent to request the content of the
document before deciding which parameters it needs. This enables it to
handle requests like `Split the document after the page containing
the "My Section" section`, allowing document-context-based requests
for all[^1] tools.
I had to make a few changes elsewhere to make this work, including:
- Moved the requesting of content out of the Question Agent and into a
common location
- Added specific API docs for the Split param because the generic ones
were not specific enough for the AI to reliably perform the correct
operation
- Fixed an issue in the tool models generator which caused the Redact
params to only be half-generated (causing Pydantic to crash when the AI
tried to run Redact)
- Added missing logging to a bunch of tools and hooked it up properly so
it prints to stderr
- Made the limits for the max pages/chars to extract from PDFs
configurable via env var
[^1]: Many of the tools can't do anything useful with the context at
this stage; once the tool API is extended with features like
page-specific operations, they will automatically be able to perform
smart, context-aware operations without any changes to the Edit agent itself.
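The last bullet can be illustrated with a rough sketch; the env var names and default values below are invented for the example (the real ones live in the configuration):

```python
# Illustrative only: invented env var names and defaults.
DEFAULT_MAX_PAGES = 50
DEFAULT_MAX_CHARS = 200_000

def extraction_limits(env: dict) -> tuple:
    """Read the max pages/chars to extract from a PDF, with fallbacks."""
    max_pages = int(env.get("AI_EXTRACT_MAX_PAGES", DEFAULT_MAX_PAGES))
    max_chars = int(env.get("AI_EXTRACT_MAX_CHARS", DEFAULT_MAX_CHARS))
    return max_pages, max_chars

assert extraction_limits({}) == (50, 200_000)
assert extraction_limits({"AI_EXTRACT_MAX_PAGES": "10"}) == (10, 200_000)
```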
# Description of Changes
Add an extra parameter to every agent to receive the conversation
history in addition to the current message. This makes it possible for
the AI to answer follow-up questions without the user needing to restate
the full context in each message.
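A minimal sketch of what the extra parameter might look like; the model shapes here are assumptions for illustration, not the real request classes:

```python
from dataclasses import dataclass, field

# Assumed shapes, not the real request models.
@dataclass
class ChatTurn:
    role: str      # "user" or "assistant"
    content: str

@dataclass
class AgentRequest:
    user_message: str
    history: list = field(default_factory=list)  # the new parameter

req = AgentRequest(
    user_message="And what does page 3 say?",
    history=[
        ChatTurn("user", "Summarise this PDF"),
        ChatTurn("assistant", "It covers quarterly revenue."),
    ],
)
# The follow-up can be answered because the earlier turns travel with it.
assert len(req.history) == 2
```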
# Description of Changes
Redesign the AI engine so that it autogenerates the `tool_models.py` file
from the OpenAPI spec, giving the Python access to the Java API
parameters and the full list of Java tools that it can run. CI ensures
that whenever someone modifies a tool endpoint, the AI engine tool
models get updated as well (the dev is told to run `task
engine:tool-models`).
There are several advantages to having the Java side be the one that
actually executes the tools, rather than the frontend (as the previous
setup theoretically intended):
- The AI gets much better descriptions of the params from the API docs
- It'll be usable headless in the future so a Java daemon could run to
execute ops on files in a folder without the need for the UI to run
- The Java already has all the logic it needs to execute the tools
- We don't need to parse the TypeScript to find the API (which is hard
because the TS wasn't written to be machine-readable for API extraction)
I've also hooked up the prototype frontend to confirm it works end to
end, and built it so that all the tool names can be translated properly,
which was always an issue with previous prototypes of this.
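The generation step can be illustrated with a toy version of the spec walk. The real `task engine:tool-models` output targets Pydantic models in `tool_models.py`; the schema path below (a multipart form-data request body) and the example parameter names are assumptions:

```python
# Toy sketch of the spec walk behind tool model generation.
OPENAPI_TYPE_MAP = {"string": "str", "integer": "int",
                    "boolean": "bool", "number": "float"}

def tool_model_fields(operation: dict) -> dict:
    """Map an operation's requestBody schema properties to Python annotations."""
    schema = (operation.get("requestBody", {})
              .get("content", {})
              .get("multipart/form-data", {})
              .get("schema", {}))
    return {
        name: OPENAPI_TYPE_MAP.get(prop.get("type", "string"), "str")
        for name, prop in schema.get("properties", {}).items()
    }

op = {
    "requestBody": {"content": {"multipart/form-data": {"schema": {
        "properties": {
            "pageNumbers": {"type": "string"},
            "duplexMode": {"type": "boolean"},
        }}}}}
}
assert tool_model_fields(op) == {"pageNumbers": "str", "duplexMode": "bool"}
```

Because the models are derived mechanically from the spec, the CI check reduces to "regenerate and diff": any endpoint change that isn't reflected in `tool_models.py` fails the build.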
---------
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Co-authored-by: EthanHealy01 <80844253+EthanHealy01@users.noreply.github.com>
# Description of Changes
Add a Java orchestration layer which connects to the AI engine and goes
back and forth with it to get results for the user. The AI engine is not
expected to be publicly available; this Java layer will always sit in
front of it, managing sessions, auth, etc.
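The back-and-forth can be sketched as a loop that drives the engine to a terminal outcome, using the outcome names from the RAG diagram earlier in this file; all function names here are illustrative:

```python
# Hedged sketch: drive the engine until it returns a terminal outcome,
# handling "need_ingest" along the way. Names are illustrative.
def orchestrate(call_engine, ingest_file, request: dict, max_rounds: int = 3) -> dict:
    for _ in range(max_rounds):
        result = call_engine(request)
        if result["outcome"] != "need_ingest":
            return result
        for file_id in result["filesToIngest"]:
            ingest_file(file_id)
        # Retry, hinting the engine to resume with the PDF question agent.
        request = {**request, "resumeWith": "pdf_question"}
    raise RuntimeError("engine did not converge")

# Fake engine for the sketch: asks for ingestion once, then answers.
state = {"ingested": set()}
def fake_engine(req):
    if not state["ingested"]:
        return {"outcome": "need_ingest", "filesToIngest": ["abc123"]}
    return {"outcome": "answer", "answer": "done"}

result = orchestrate(fake_engine, state["ingested"].add, {"userMessage": "hi"})
assert result["answer"] == "done"
```

Keeping this loop on the Java side is what lets the engine stay private: clients only ever see the orchestration layer's API.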
# Description of Changes
Redesign the Python AI engine to be properly agentic and make use of
`pydantic-ai` instead of `langchain` for correctness and ergonomics.
This should be a good foundation for us to build our AI engine on going
forwards.