blakeblackshear.frigate/frigate/genai/__init__.py
Nicolas Mowen d24b96d3bb Early 0.18 work (#22138)
* Update version

* Create scaffolding for case management (#21293)

* implement case management for export apis (#21295)

* refactor vainfo to search for first GPU (#21296)

use existing LibvaGpuSelector to pick appropriate libva device

* Case management UI (#21299)

* Refactor export cards to match existing cards in other UI pages

* Show cases separately from exports

* Add proper filtering and display of cases

* Add ability to edit and select cases for exports

* Cleanup typing

* Hide if no unassigned

* Cleanup hiding logic

* fix scrolling

* Improve layout

* Camera connection quality indicator (#21297)

* add camera connection quality metrics and indicator

* formatting

* move stall calcs to watchdog

* clean up

* change watchdog to 1s and separately track time for ffmpeg retry_interval

* implement status caching to reduce message volume

* Export filter UI (#21322)

* Get started on export filters

* implement basic filter

* Implement filtering and adjust api

* Improve filter handling

* Improve navigation

* Cleanup

* handle scrolling

* Refactor temperature reporting for detectors and implement Hailo temp reading (#21395)

* Add Hailo temperature retrieval

* Refactor `get_hailo_temps()` to use ctxmanager

* Show Hailo temps in system UI

* Move hailo_platform import to get_hailo_temps

* Refactor temperatures calculations to use within detector block

* Adjust webUI to handle new location

---------

Co-authored-by: tigattack <10629864+tigattack@users.noreply.github.com>

* Camera-specific hwaccel settings for timelapse exports (correct base) (#21386)

* added hwaccel_args to camera.record.export config struct

* populate camera.record.export.hwaccel_args with a cascade up to camera then global if 'auto'

* use new hwaccel args in export

* added documentation for camera-specific hwaccel export

* fix c/p error

* missed an import

* fleshed out the docs and comments a bit

* ruff lint

* separated out the tips in the doc

* fix documentation

* fix and simplify reference config doc

* Add support for GPU and NPU temperatures (#21495)

* Add rockchip temps

* Add support for GPU and NPU temperatures in the frontend

* Add support for Nvidia temperature

* Improve separation

* Adjust graph scaling

* Exports Improvements (#21521)

* Add images to case folder view

* Add ability to select case in export dialog

* Add to mobile review too

* Add API to handle deleting recordings (#21520)

* Add recording delete API

* Re-organize recordings apis

* Fix import

* Consolidate query types

* Add media sync API endpoint (#21526)

* add media cleanup functions

* add endpoint

* remove scheduled sync recordings from cleanup

* move to utils dir

* tweak import

* remove sync_recordings and add config migrator

* remove sync_recordings

* docs

* remove key

* clean up docs

* docs fix

* docs tweak

* Media sync API refactor and UI (#21542)

* generic job infrastructure

* types and dispatcher changes for jobs

* save data in memory only for completed jobs

* implement media sync job and endpoints

* change logs to debug

* websocket hook and types

* frontend

* i18n

* docs tweaks

* endpoint descriptions

* tweak docs

* use same logging pattern in sync_recordings as the other sync functions (#21625)

* Fix incorrect counting in sync_recordings (#21626)

* Update go2rtc to v1.9.13 (#21648)

Co-authored-by: Eugeny Tulupov <eugeny.tulupov@spirent.com>

* Refactor Time-Lapse Export (#21668)

* refactor time lapse creation to be a separate API call with ability to pass arbitrary ffmpeg args

* Add CPU fallback

* Optimize empty directory cleanup for recordings (#21695)

The previous empty directory cleanup did a full recursive directory
walk, which can be extremely slow. This new implementation only removes
directories which have a chance of being empty due to a recent file
deletion.

* Implement llama.cpp GenAI Provider (#21690)

* Implement llama.cpp GenAI Provider

* Add docs

* Update links

* Fix broken mqtt links

* Fix more broken anchors

* Remove parents in remove_empty_directories (#21726)

The original implementation did a full directory tree walk to find and remove
empty directories, so this implementation should remove the parents as well,
like the original did.

* Implement LLM Chat API with tool calling support (#21731)

* Implement initial tools definition APIs

* Add initial chat completion API with tool support

* Implement other providers

* Cleanup

* Offline preview image (#21752)

* use latest preview frame for latest image when camera is offline

* remove frame extraction logic

* tests

* frontend

* add description to api endpoint

* Update to ROCm 7.2.0 (#21753)

* Update to ROCm 7.2.0

* ROCm now works properly with JinaV1

* Arcface has compilation error

* Add live context tool to LLM (#21754)

* Add live context tool

* Improve handling of images in request

* Improve prompt caching

* Add networking options for configuring listening ports (#21779)

* feat: add X-Frame-Time when returning snapshot (#21932)

Co-authored-by: Florent MORICONI <170678386+fmcloudconsulting@users.noreply.github.com>

* Improve jsmpeg player websocket handling (#21943)

* improve jsmpeg player websocket handling

prevent websocket console messages from appearing when player is destroyed

* reformat files after ruff upgrade

* Allow API Events to be Detections or Alerts, depending on the Event Label (#21923)

* - API created events will be alerts OR detections, depending on the event label, defaulting to alerts
- Indefinite API events will extend the recording segment until those events are ended
- API event start time is the actual start time, instead of having a pre-buffer of record.event_pre_capture

* Instead of checking for indefinite events on a camera before deciding if we should end the segment, only update last_detection_time and last_alert_time if frame_time is greater, which should have the same effect

* Add the ability to set a pre_capture number of seconds when creating a manual event via the API. Default behavior unchanged

* Remove unnecessary _publish_segment_start() call

* Formatting

* handle last_alert_time or last_detection_time being None when checking them against the frame_time

* comment manual_info["label"].split(": ")[0] for clarity

* ffmpeg Preview Segment Optimization for "high" and "very_high" (#21996)

* Introduce qmax parameter for ffmpeg preview encoding

Added PREVIEW_QMAX_PARAM to control ffmpeg encoding quality.

* formatting

* Fix spacing in qmax parameters for preview quality

* Adapt to new Gemini format

* Fix frame time access

* Remove exceptions

* Cleanup

---------

Co-authored-by: Josh Hawkins <32435876+hawkeye217@users.noreply.github.com>
Co-authored-by: tigattack <10629864+tigattack@users.noreply.github.com>
Co-authored-by: Andrew Roberts <adroberts@gmail.com>
Co-authored-by: Eugeny Tulupov <zhekka3@gmail.com>
Co-authored-by: Eugeny Tulupov <eugeny.tulupov@spirent.com>
Co-authored-by: John Shaw <1753078+johnshaw@users.noreply.github.com>
Co-authored-by: Eric Work <work.eric@gmail.com>
Co-authored-by: FL42 <46161216+fl42@users.noreply.github.com>
Co-authored-by: Florent MORICONI <170678386+fmcloudconsulting@users.noreply.github.com>
Co-authored-by: nulledy <254504350+nulledy@users.noreply.github.com>
2026-02-26 21:16:10 -07:00


"""Generative AI module for Frigate."""

import datetime
import importlib
import logging
import os
import re
from typing import Any, Optional

from playhouse.shortcuts import model_to_dict

from frigate.config import CameraConfig, FrigateConfig, GenAIConfig, GenAIProviderEnum
from frigate.const import CLIPS_DIR
from frigate.data_processing.post.types import ReviewMetadata
from frigate.models import Event

logger = logging.getLogger(__name__)

PROVIDERS = {}


def register_genai_provider(key: GenAIProviderEnum):
    """Register a GenAI provider."""

    def decorator(cls):
        PROVIDERS[key] = cls
        return cls

    return decorator


class GenAIClient:
    """Generative AI client for Frigate."""

    def __init__(self, genai_config: GenAIConfig, timeout: int = 120) -> None:
        self.genai_config: GenAIConfig = genai_config
        self.timeout = timeout
        self.provider = self._init_provider()

    def generate_review_description(
        self,
        review_data: dict[str, Any],
        thumbnails: list[bytes],
        concerns: list[str],
        preferred_language: str | None,
        debug_save: bool,
        activity_context_prompt: str,
    ) -> ReviewMetadata | None:
        """Generate a description for the review item activity."""

        def get_concern_prompt() -> str:
            if concerns:
                concern_list = "\n - ".join(concerns)
                return f"""- `other_concerns` (list of strings): Include a list of any of the following concerns that are occurring:
- {concern_list}"""
            else:
                return ""

        def get_language_prompt() -> str:
            if preferred_language:
                return f"Provide your answer in {preferred_language}"
            else:
                return ""

        def get_objects_list() -> str:
            if review_data["unified_objects"]:
                return "\n- " + "\n- ".join(review_data["unified_objects"])
            else:
                return "\n- (No objects detected)"

        context_prompt = f"""
Your task is to analyze a sequence of images taken in chronological order from a security camera.
## Normal Activity Patterns for This Property
{activity_context_prompt}
## Task Instructions
Your task is to provide a clear, accurate description of the scene that:
1. States exactly what is happening based on observable actions and movements.
2. Evaluates the activity against the Normal and Suspicious Activity Indicators above.
3. Assigns a potential_threat_level (0, 1, or 2) based on the threat level indicators defined above, applying them consistently.
**Use the activity patterns above as guidance to calibrate your assessment. Match the activity against both normal and suspicious indicators, then use your judgment based on the complete context.**
## Analysis Guidelines
When forming your description:
- **CRITICAL: Only describe objects explicitly listed in "Objects in Scene" below.** Do not infer or mention additional people, vehicles, or objects not present in this list, even if visual patterns suggest them. If only a car is listed, do not describe a person interacting with it unless "person" is also in the objects list.
- **Only describe actions actually visible in the frames.** Do not assume or infer actions that you don't observe happening. If someone walks toward furniture but you never see them sit, do not say they sat. Stick to what you can see across the sequence.
- Describe what you observe: actions, movements, interactions with objects and the environment. Include any observable environmental changes (e.g., lighting changes triggered by activity).
- Note visible details such as clothing, items being carried or placed, tools or equipment present, and how they interact with the property or objects.
- Consider the full sequence chronologically: what happens from start to finish, how duration and actions relate to the location and objects involved.
- **Use the actual timestamp provided in "Activity started at"** below for time of day context—do not infer time from image brightness or darkness. Unusual hours (late night/early morning) should increase suspicion when the observable behavior itself appears questionable. However, recognize that some legitimate activities can occur at any hour.
- **Consider duration as a primary factor**: Apply the duration thresholds defined in the activity patterns above. Brief sequences during normal hours with apparent purpose typically indicate normal activity unless explicit suspicious actions are visible.
- **Weigh all evidence holistically**: Match the activity against the normal and suspicious patterns defined above, then evaluate based on the complete context (zone, objects, time, actions, duration). Apply the threat level indicators consistently. Use your judgment for edge cases.
## Response Format
Your response MUST be a flat JSON object with:
- `scene` (string): A narrative description of what happens across the sequence from start to finish, in chronological order. Start by describing how the sequence begins, then describe the progression of events. **Describe all significant movements and actions in the order they occur.** For example, if a vehicle arrives and then a person exits, describe both actions sequentially. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
- `title` (string): A concise, grammatically complete title in the format "[Subject] [action verb] [context]" that matches your scene description. Use names from "Objects in Scene" when you visually observe them.
- `shortSummary` (string): A brief 2-sentence summary of the scene, suitable for notifications. Should capture the key activity and context without full detail. This should be a condensed version of the scene description above.
- `confidence` (float): 0-1 confidence in your analysis. Higher confidence when objects/actions are clearly visible and context is unambiguous. Lower confidence when the sequence is unclear, objects are partially obscured, or context is ambiguous.
- `potential_threat_level` (integer): 0, 1, or 2 as defined in "Normal Activity Patterns for This Property" above. Your threat level must be consistent with your scene description and the guidance above.
{get_concern_prompt()}
## Sequence Details
- Camera: {review_data["camera"]}
- Total frames: {len(thumbnails)} (Frame 1 = earliest, Frame {len(thumbnails)} = latest)
- Activity started at {review_data["start"]} and lasted {review_data["duration"]} seconds
- Zones involved: {", ".join(review_data["zones"]) if review_data["zones"] else "None"}
## Objects in Scene
Each line represents a detection state, not necessarily unique individuals. Parentheses indicate object type or category, use only the name/label in your response, not the parentheses.
**CRITICAL: When you see both recognized and unrecognized entries of the same type (e.g., "Joe (person)" and "Person"), visually count how many distinct people/objects you actually see based on appearance and clothing. If you observe only ONE person throughout the sequence, use ONLY the recognized name (e.g., "Joe"). The same person may be recognized in some frames but not others. Only describe both if you visually see MULTIPLE distinct people with clearly different appearances.**
**Note: Unidentified objects (without names) are NOT indicators of suspicious activity—they simply mean the system hasn't identified that object.**
{get_objects_list()}
## Important Notes
- Values must be plain strings, floats, or integers — no nested objects, no extra commentary.
- Only describe objects from the "Objects in Scene" list above. Do not hallucinate additional objects.
- When describing people or vehicles, use the exact names provided.
{get_language_prompt()}
"""
        logger.debug(
            f"Sending {len(thumbnails)} images to create review description on {review_data['camera']}"
        )

        if debug_save:
            with open(
                os.path.join(
                    CLIPS_DIR, "genai-requests", review_data["id"], "prompt.txt"
                ),
                "w",
            ) as f:
                f.write(context_prompt)

        response = self._send(context_prompt, thumbnails)

        if debug_save and response:
            with open(
                os.path.join(
                    CLIPS_DIR, "genai-requests", review_data["id"], "response.txt"
                ),
                "w",
            ) as f:
                f.write(response)

        if response:
            clean_json = re.sub(
                r"\n?```$", "", re.sub(r"^```[a-zA-Z0-9]*\n?", "", response)
            )

            try:
                metadata = ReviewMetadata.model_validate_json(clean_json)

                # If any verified objects (contain parentheses with name), set to 0
                if any("(" in obj for obj in review_data["unified_objects"]):
                    metadata.potential_threat_level = 0

                metadata.time = review_data["start"]
                return metadata
            except Exception as e:
                # rarely LLMs can fail to follow directions on output format
                logger.warning(
                    f"Failed to parse review description as the response did not match expected format. {e}"
                )
                return None
        else:
            return None

    def generate_review_summary(
        self,
        start_ts: float,
        end_ts: float,
        events: list[dict[str, Any]],
        preferred_language: str | None,
        debug_save: bool,
    ) -> str | None:
        """Generate a summary of review item descriptions over a period of time."""
        time_range = f"{datetime.datetime.fromtimestamp(start_ts).strftime('%B %d, %Y at %I:%M %p')} to {datetime.datetime.fromtimestamp(end_ts).strftime('%B %d, %Y at %I:%M %p')}"
        timeline_summary_prompt = f"""
You are a security officer writing a concise security report.
Time range: {time_range}
Input format: Each event is a JSON object with:
- "title", "scene", "confidence", "potential_threat_level" (0-2), "other_concerns", "camera", "time", "start_time", "end_time"
- "context": array of related events from other cameras that occurred during overlapping time periods
**Note: Use the "scene" field for event descriptions in the report. Ignore any "shortSummary" field if present.**
Report Structure - Use this EXACT format:
# Security Summary - {time_range}
## Overview
[Write 1-2 sentences summarizing the overall activity pattern during this period.]
---
## Timeline
[Group events by time periods (e.g., "Morning (6:00 AM - 12:00 PM)", "Afternoon (12:00 PM - 5:00 PM)", "Evening (5:00 PM - 9:00 PM)", "Night (9:00 PM - 6:00 AM)"). Use appropriate time blocks based on when events occurred.]
### [Time Block Name]
**HH:MM AM/PM** | [Camera Name] | [Threat Level Indicator]
- [Event title]: [Clear description incorporating contextual information from the "context" array]
- Context: [If context array has items, mention them here, e.g., "Delivery truck present on Front Driveway Cam (HH:MM AM/PM)"]
- Assessment: [Brief assessment incorporating context - if context explains the event, note it here]
[Repeat for each event in chronological order within the time block]
---
## Summary
[One sentence summarizing the period. If all events are normal/explained: "Routine activity observed." If review needed: "Some activity requires review but no security concerns." If security concerns: "Security concerns requiring immediate attention."]
Guidelines:
- List ALL events in chronological order, grouped by time blocks
- Threat level indicators: ✓ Normal, ⚠️ Needs review, 🔴 Security concern
- Integrate contextual information naturally - use the "context" array to enrich each event's description
- If context explains the event (e.g., delivery truck explains person at door), describe it accordingly (e.g., "delivery person" not "unidentified person")
- Be concise but informative - focus on what happened and what it means
- If contextual information makes an event clearly normal, reflect that in your assessment
- Only create time blocks that have events - don't create empty sections
"""
        timeline_summary_prompt += "\n\nEvents:\n"
        for event in events:
            timeline_summary_prompt += f"\n{event}\n"

        if preferred_language:
            timeline_summary_prompt += f"\nProvide your answer in {preferred_language}"

        if debug_save:
            with open(
                os.path.join(
                    CLIPS_DIR, "genai-requests", f"{start_ts}-{end_ts}", "prompt.txt"
                ),
                "w",
            ) as f:
                f.write(timeline_summary_prompt)

        response = self._send(timeline_summary_prompt, [])

        if debug_save and response:
            with open(
                os.path.join(
                    CLIPS_DIR, "genai-requests", f"{start_ts}-{end_ts}", "response.txt"
                ),
                "w",
            ) as f:
                f.write(response)

        return response

    def generate_object_description(
        self,
        camera_config: CameraConfig,
        thumbnails: list[bytes],
        event: Event,
    ) -> Optional[str]:
        """Generate a description for the frame."""
        try:
            prompt = camera_config.objects.genai.object_prompts.get(
                event.label,
                camera_config.objects.genai.prompt,
            ).format(**model_to_dict(event))
        except KeyError as e:
            logger.error(f"Invalid key in GenAI prompt: {e}")
            return None

        logger.debug(f"Sending images to genai provider with prompt: {prompt}")
        return self._send(prompt, thumbnails)

    def _init_provider(self):
        """Initialize the client."""
        return None

    def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
        """Submit a request to the provider."""
        return None

    def get_context_size(self) -> int:
        """Get the context window size for this provider in tokens."""
        return 4096

    def chat_with_tools(
        self,
        messages: list[dict[str, Any]],
        tools: Optional[list[dict[str, Any]]] = None,
        tool_choice: Optional[str] = "auto",
    ) -> dict[str, Any]:
        """
        Send chat messages to LLM with optional tool definitions.

        This method handles conversation-style interactions with the LLM,
        including function calling/tool usage capabilities.

        Args:
            messages: List of message dictionaries. Each message should have:
                - 'role': str - One of 'user', 'assistant', 'system', or 'tool'
                - 'content': str - The message content
                - 'tool_call_id': Optional[str] - For tool responses, the ID of the tool call
                - 'name': Optional[str] - For tool messages, the tool name
            tools: Optional list of tool definitions in OpenAI-compatible format.
                Each tool should have 'type': 'function' and 'function' with:
                - 'name': str - Tool name
                - 'description': str - Tool description
                - 'parameters': dict - JSON schema for parameters
            tool_choice: How the model should handle tools:
                - 'auto': Model decides whether to call tools
                - 'none': Model must not call tools
                - 'required': Model must call at least one tool
                - Or a dict specifying a specific tool to call

        Returns:
            Dictionary with:
            - 'content': Optional[str] - The text response from the LLM, None if tool calls
            - 'tool_calls': Optional[List[Dict]] - List of tool calls if LLM wants to call tools.
              Each tool call dict has:
                - 'id': str - Unique identifier for this tool call
                - 'name': str - Tool name to call
                - 'arguments': dict - Arguments for the tool call (parsed JSON)
            - 'finish_reason': str - Reason generation stopped:
                - 'stop': Normal completion
                - 'tool_calls': LLM wants to call tools
                - 'length': Hit token limit
                - 'error': An error occurred

        Note:
            This base implementation does not raise. It logs a warning and
            returns a result whose 'finish_reason' is 'error'; providers are
            expected to override it.
        """
        # Base implementation - each provider should override this
        logger.warning(
            f"{self.__class__.__name__} does not support chat_with_tools. "
            "This method should be overridden by the provider implementation."
        )
        return {
            "content": None,
            "tool_calls": None,
            "finish_reason": "error",
        }


def get_genai_client(config: FrigateConfig) -> Optional[GenAIClient]:
    """Get the GenAI client."""
    if not config.genai.provider:
        return None

    load_providers()
    provider = PROVIDERS.get(config.genai.provider)

    if provider:
        return provider(config.genai)

    return None


def load_providers():
    """Import every sibling module in this package so that any providers they register are available in PROVIDERS."""
    package_dir = os.path.dirname(__file__)
    for filename in os.listdir(package_dir):
        if filename.endswith(".py") and filename != "__init__.py":
            module_name = f"frigate.genai.{filename[:-3]}"
            importlib.import_module(module_name)
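
The registration flow in this module (a class decorator fills `PROVIDERS`, and `get_genai_client` later looks the class up by its configured key) can be sketched in isolation. This is only a minimal illustration under stated assumptions, not Frigate's actual provider code: `ProviderKey`, `BaseClient`, `EchoClient`, `register_provider`, and `get_client` are hypothetical stand-ins for `GenAIProviderEnum`, `GenAIClient`, a real provider class, `register_genai_provider`, and `get_genai_client`, and the config object and dynamic module import are left out.

```python
from enum import Enum
from typing import Optional


class ProviderKey(str, Enum):
    """Hypothetical stand-in for GenAIProviderEnum."""

    echo = "echo"
    missing = "missing"


# Registry mapping a provider key to the class that implements it.
PROVIDERS: dict[ProviderKey, type] = {}


def register_provider(key: ProviderKey):
    """Return a class decorator that records the decorated class under `key`."""

    def decorator(cls):
        PROVIDERS[key] = cls
        return cls

    return decorator


class BaseClient:
    """Hypothetical, simplified stand-in for GenAIClient."""

    def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
        # The base class has no backend, so it returns nothing.
        return None


@register_provider(ProviderKey.echo)
class EchoClient(BaseClient):
    """Toy provider that echoes the prompt instead of calling a model."""

    def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
        return f"{prompt} ({len(images)} images)"


def get_client(key: ProviderKey) -> Optional[BaseClient]:
    """Instantiate the registered class for `key`, or return None if absent."""
    cls = PROVIDERS.get(key)
    return cls() if cls else None
```

In this sketch, `get_client(ProviderKey.echo)` yields an `EchoClient`, while an unregistered key such as `ProviderKey.missing` yields `None`, mirroring how `get_genai_client` falls back to `None` when no provider matches. One appeal of this pattern is that adding a provider only requires a new decorated class; the lookup code does not change.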