# Description of Changes
Have the Java side send the list of enabled endpoints to the AI engine so it
can tell the user that a tool exists but is disabled on the server and
therefore can't actually run the operation. This replaces the current
behaviour, where the engine sends the API call back anyway and the request
then fails with a 503 because execution is rejected when the URL is disabled.
<img width="380" height="208" alt="image"
src="https://github.com/user-attachments/assets/5842fb2e-2e55-45a5-8205-25515636daae"
/>
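The gating idea can be sketched as follows; the class, method, and endpoint names here are illustrative, not the actual implementation in either codebase:

```python
# Sketch only: hypothetical names, not the real classes in either codebase.
class EnabledToolGate:
    """Decides whether a requested tool may run on this server."""

    def __init__(self, enabled_endpoints: set):
        self.enabled_endpoints = enabled_endpoints

    def check(self, tool_endpoint: str):
        """Return a user-facing refusal message, or None if the tool may run."""
        if tool_endpoint in self.enabled_endpoints:
            return None
        return (
            f"The tool behind '{tool_endpoint}' exists but is disabled on "
            "this server, so the operation can't actually be run."
        )


gate = EnabledToolGate({"/api/v1/general/merge-pdfs"})
assert gate.check("/api/v1/general/merge-pdfs") is None
assert "disabled" in gate.check("/api/v1/general/split-pages")
```

The key point is that the refusal happens before any API call is issued, so the user gets an explanation instead of a 503.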
---------
Co-authored-by: EthanHealy01 <80844253+EthanHealy01@users.noreply.github.com>
# Description of Changes
Flesh out the RAG system and connect it to the PDF Question Agent so it
can answer questions about extremely large PDFs.
I'd expect lots more work will be needed to finish off the RAG system and
make it everything we need, but this should be a reasonable start which
lets us connect it to tools and have the ingestion mostly handled
automatically. I'm leaving file deletion and proper file ID management
for a future PR. We also need to consider whether all tools should
retrieve content exclusively via RAG, or whether it's sometimes
beneficial for tools to fetch the direct content instead.
A diagram of the expected interaction is as follows:
```mermaid
sequenceDiagram
autonumber
actor U as User
participant FE as Frontend<br/>(ChatPanel)
participant J as Java<br/>(AiWorkflowService)
participant O as Engine:<br/>OrchestratorAgent
participant QA as Engine:<br/>PdfQuestionAgent
participant RAG as Engine:<br/>RagService + SqliteVecStore
participant V as VoyageAI<br/>(embeddings)
participant L as LLM<br/>(Claude / etc.)
U->>FE: types "Summarise this PDF"<br/>(PDF already uploaded)
FE->>J: POST /api/v1/ai/orchestrate/stream<br/>multipart: fileInputs[], userMessage
Note over J: ByteHashFileIdStrategy<br/>id = sha256(bytes)[:16]
J->>O: POST /api/v1/orchestrator<br/>{ files:[{id,name}], userMessage }
O->>L: route via fast model
L-->>O: delegate_pdf_question
O->>QA: PdfQuestionRequest
loop for each file
QA->>RAG: has_collection(file.id)
RAG-->>QA: false
end
QA-->>O: NeedIngestResponse(files_to_ingest)
O-->>J: { outcome:"need_ingest", filesToIngest:[...] }
Note over J: onNeedIngest
loop per file
J->>J: PDFBox: extract page text
J->>O: POST /api/v1/rag/documents<br/>(long-running timeout)
O->>RAG: chunk + stage documents
O->>V: embed_documents (batches of 256)
V-->>O: embeddings
O->>RAG: add_documents
O-->>J: { chunks_indexed: N }
end
Note over J: retry with resumeWith=pdf_question
J->>O: POST /api/v1/orchestrator
Note over O: fast-path to PdfQuestionAgent
O->>QA: PdfQuestionRequest
Note over QA: build RagCapability<br/>pinned to file IDs
QA->>L: run(prompt) with search_knowledge tool
loop up to max_searches
L->>QA: search_knowledge(query)
QA->>V: embed_query
V-->>QA: query vector
QA->>RAG: search(vector, collections=[file.id])
RAG-->>QA: top-k chunks
QA-->>L: formatted chunks
end
Note over QA: once budget spent,<br/>prepare() hides the tool
L-->>QA: PdfQuestionAnswerResponse
QA-->>O: answer
O-->>J: { outcome:"answer", answer, evidence }
J-->>FE: SSE "result"
FE->>U: assistant bubble
```
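The `ByteHashFileIdStrategy` note in the diagram (`id = sha256(bytes)[:16]`) can be sketched as below, under the assumption that the slice is taken over the hex digest:

```python
import hashlib

def byte_hash_file_id(data: bytes) -> str:
    # One plausible reading of the diagram's note: the first 16 hex
    # characters of the SHA-256 digest of the raw file bytes.
    return hashlib.sha256(data).hexdigest()[:16]

fid = byte_hash_file_id(b"%PDF-1.7 example bytes")
assert len(fid) == 16
# Identical bytes always yield the same ID, so re-uploading the same PDF
# maps to the same RAG collection instead of triggering a fresh ingest.
assert fid == byte_hash_file_id(b"%PDF-1.7 example bytes")
```

Content-addressed IDs are what make the `has_collection(file.id)` check in step 3 of the diagram meaningful across sessions.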
# Description of Changes
Adds the ability for the Edit agent to request the content of the
document before deciding which parameters it needs. This enables it to
handle requests like `Split the document after the page containing
the "My Section" section`, allowing document-context-based requests
for all[^1] tools.
I had to make a few changes elsewhere to make this work, including:
- Moved the requesting of content out of the Question Agent and into a
common location
- Added specific API docs for the Split param because the generic ones
were not specific enough for the AI to reliably perform the correct
operation
- Fixed an issue in the tool models generator which caused the Redact
params to only be half-generated (causing Pydantic to crash when the AI
tried to run Redact)
- Added missing logging to a bunch of tools and hooked it up properly so
it prints to stderr
- Made the limits for the max pages/chars to extract from PDFs
configurable via env var
[^1]: Many of the tools can't do anything useful with the context at
this stage; once the tool API is extended with features like
page-specific operations, they will automatically be able to perform
smart, context-aware operations without any changes to the Edit agent itself.
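The last bullet can be illustrated with a rough sketch; the env var names and default values below are invented for the example (the real ones live in the configuration):

```python
# Illustrative only: invented env var names and defaults.
DEFAULT_MAX_PAGES = 50
DEFAULT_MAX_CHARS = 200_000

def extraction_limits(env: dict) -> tuple:
    """Read the max pages/chars to extract from a PDF, with fallbacks."""
    max_pages = int(env.get("AI_EXTRACT_MAX_PAGES", DEFAULT_MAX_PAGES))
    max_chars = int(env.get("AI_EXTRACT_MAX_CHARS", DEFAULT_MAX_CHARS))
    return max_pages, max_chars

assert extraction_limits({}) == (50, 200_000)
assert extraction_limits({"AI_EXTRACT_MAX_PAGES": "10"}) == (10, 200_000)
```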
# Description of Changes
Add an extra parameter to every agent to receive the conversation
history in addition to the current message. This makes it possible for
the AI to answer follow-up questions without the user needing to restate
the full context in each message.
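A minimal sketch of what the extra parameter might look like; the model shapes here are assumptions for illustration, not the real request classes:

```python
from dataclasses import dataclass, field

# Assumed shapes, not the real request models.
@dataclass
class ChatTurn:
    role: str      # "user" or "assistant"
    content: str

@dataclass
class AgentRequest:
    user_message: str
    history: list = field(default_factory=list)  # the new parameter

req = AgentRequest(
    user_message="And what does page 3 say?",
    history=[
        ChatTurn("user", "Summarise this PDF"),
        ChatTurn("assistant", "It covers quarterly revenue."),
    ],
)
# The follow-up can be answered because the earlier turns travel with it.
assert len(req.history) == 2
```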
# Description of Changes
Redesign the AI engine so that it autogenerates the `tool_models.py` file
from the OpenAPI spec, giving the Python access to the Java API
parameters and the full list of Java tools that it can run. CI ensures
that whenever someone modifies a tool endpoint, the AI engine tool
models get updated as well (the dev is told to run `task
engine:tool-models`).
There are several advantages to having the Java side be the one that
actually executes the tools, rather than the frontend (as the previous
setup theoretically intended):
- The AI gets much better descriptions of the params from the API docs
- It'll be usable headless in the future so a Java daemon could run to
execute ops on files in a folder without the need for the UI to run
- The Java already has all the logic it needs to execute the tools
- We don't need to parse the TypeScript to find the API (which is hard
because the TS wasn't written to be machine-readable for API extraction)
I've also hooked up the prototype frontend to confirm it works end to
end, and built it so that all the tool names can be translated properly,
which was always an issue with previous prototypes of this.
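The generation step can be illustrated with a toy version of the spec walk. The real `task engine:tool-models` output targets Pydantic models in `tool_models.py`; the schema path below (a multipart form-data request body) and the example parameter names are assumptions:

```python
# Toy sketch of the spec walk behind tool model generation.
OPENAPI_TYPE_MAP = {"string": "str", "integer": "int",
                    "boolean": "bool", "number": "float"}

def tool_model_fields(operation: dict) -> dict:
    """Map an operation's requestBody schema properties to Python annotations."""
    schema = (operation.get("requestBody", {})
              .get("content", {})
              .get("multipart/form-data", {})
              .get("schema", {}))
    return {
        name: OPENAPI_TYPE_MAP.get(prop.get("type", "string"), "str")
        for name, prop in schema.get("properties", {}).items()
    }

op = {
    "requestBody": {"content": {"multipart/form-data": {"schema": {
        "properties": {
            "pageNumbers": {"type": "string"},
            "duplexMode": {"type": "boolean"},
        }}}}}
}
assert tool_model_fields(op) == {"pageNumbers": "str", "duplexMode": "bool"}
```

Because the models are derived mechanically from the spec, the CI check reduces to "regenerate and diff": any endpoint change that isn't reflected in `tool_models.py` fails the build.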
---------
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Co-authored-by: EthanHealy01 <80844253+EthanHealy01@users.noreply.github.com>
# Description of Changes
Add a Java orchestration layer which connects to the AI engine and goes
back and forth with it to get results for the user. The AI engine is not
expected to be publicly available; this Java layer will always sit in
front of it, managing sessions, auth, etc.
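The back-and-forth can be sketched as a loop that drives the engine to a terminal outcome, using the outcome names from the RAG diagram earlier in this file; all function names here are illustrative:

```python
# Hedged sketch: drive the engine until it returns a terminal outcome,
# handling "need_ingest" along the way. Names are illustrative.
def orchestrate(call_engine, ingest_file, request: dict, max_rounds: int = 3) -> dict:
    for _ in range(max_rounds):
        result = call_engine(request)
        if result["outcome"] != "need_ingest":
            return result
        for file_id in result["filesToIngest"]:
            ingest_file(file_id)
        # Retry, hinting the engine to resume with the PDF question agent.
        request = {**request, "resumeWith": "pdf_question"}
    raise RuntimeError("engine did not converge")

# Fake engine for the sketch: asks for ingestion once, then answers.
state = {"ingested": set()}
def fake_engine(req):
    if not state["ingested"]:
        return {"outcome": "need_ingest", "filesToIngest": ["abc123"]}
    return {"outcome": "answer", "answer": "done"}

result = orchestrate(fake_engine, state["ingested"].add, {"userMessage": "hi"})
assert result["answer"] == "done"
```

Keeping this loop on the Java side is what lets the engine stay private: clients only ever see the orchestration layer's API.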
# Description of Changes
Redesign the Python AI engine to be properly agentic and make use of
`pydantic-ai` instead of `langchain` for correctness and ergonomics.
This should be a good foundation for us to build our AI engine on going
forwards.