diff --git a/docs/docs/configuration/genai.md b/docs/docs/configuration/genai.md
index e2f6ac318..aace224f3 100644
--- a/docs/docs/configuration/genai.md
+++ b/docs/docs/configuration/genai.md
@@ -3,7 +3,7 @@ id: genai
 title: Generative AI
 ---
 
-Generative AI can be used to automatically generate descriptions based on the thumbnails of your tracked objects. This helps with [Semantic Search](/configuration/semantic_search) in Frigate by providing detailed text descriptions as a basis of the search query.
+Generative AI can be used to automatically generate descriptive text based on the thumbnails of your tracked objects. This helps with [Semantic Search](/configuration/semantic_search) in Frigate to provide more context about your tracked objects. Semantic Search must be enabled to use Generative AI.
 
 Descriptions are accessed via the _Explore_ view in the Frigate UI by clicking on a tracked object's thumbnail.
 
@@ -122,12 +122,18 @@ genai:
   api_key: "{FRIGATE_OPENAI_API_KEY}"
 ```
 
+## Usage and Best Practices
+
+Frigate's thumbnail search excels at identifying specific details about tracked objects – for example, using an "image caption" approach to find a "person wearing a yellow vest," "a white dog running across the lawn," or "a red car on a residential street." To enhance this further, Frigate’s default prompts are designed to ask your AI provider about the intent behind the object's actions, rather than just describing its appearance.
+
+While generating simple descriptions of detected objects is useful, understanding intent provides a deeper layer of insight. Instead of just recognizing "what" is in a scene, Frigate’s default prompts aim to infer "why" it might be there or "what" it could do next. Descriptions tell you what’s happening, but intent gives context. For instance, a person walking toward a door might seem like a visitor, but if they’re moving quickly after hours, you can infer a potential break-in attempt. Detecting a person loitering near a door at night can trigger an alert sooner than simply noting "a person standing by the door," helping you respond based on the situation’s context.
+
 ## Custom Prompts
 
 Frigate sends multiple frames from the tracked object along with a prompt to your Generative AI provider asking it to generate a description. The default prompt is as follows:
 
 ```
-Describe the {label} in the sequence of images with as much detail as possible. Do not describe the background.
+Analyze the sequence of images containing the {label}. Focus on the likely intent or behavior of the {label} based on its actions and movement, rather than describing its appearance or the surroundings. Consider what the {label} is doing, why, and what it might do next.
 ```
 
 :::tip
@@ -144,10 +150,10 @@ genai:
   provider: ollama
   base_url: http://localhost:11434
   model: llava
-  prompt: "Describe the {label} in these images from the {camera} security camera."
+  prompt: "Analyze the {label} in these images from the {camera} security camera. Focus on the actions, behavior, and potential intent of the {label}, rather than just describing its appearance."
   object_prompts:
-    person: "Describe the main person in these images (gender, age, clothing, activity, etc). Do not include where the activity is occurring (sidewalk, concrete, driveway, etc)."
-    car: "Label the primary vehicle in these images with just the name of the company if it is a delivery vehicle, or the color make and model."
+    person: "Examine the main person in these images. What are they doing and what might their actions suggest about their intent (e.g., approaching a door, leaving an area, standing still)? Do not describe the surroundings or static details."
+    car: "Observe the primary vehicle in these images. Focus on its movement, direction, or purpose (e.g., parking, approaching, circling). If it's a delivery vehicle, mention the company."
 ```
 
 Prompts can also be overriden at the camera level to provide a more detailed prompt to the model about your specific camera, if you desire. By default, descriptions will be generated for all tracked objects and all zones. But you can also optionally specify `objects` and `required_zones` to only generate descriptions for certain tracked objects or zones.
@@ -159,10 +165,10 @@ cameras:
   front_door:
     genai:
       use_snapshot: True
-      prompt: "Describe the {label} in these images from the {camera} security camera at the front door of a house, aimed outward toward the street."
+      prompt: "Analyze the {label} in these images from the {camera} security camera at the front door. Focus on the actions and potential intent of the {label}."
       object_prompts:
-        person: "Describe the main person in these images (gender, age, clothing, activity, etc). Do not include where the activity is occurring (sidewalk, concrete, driveway, etc). If delivering a package, include the company the package is from."
-        cat: "Describe the cat in these images (color, size, tail). Indicate whether or not the cat is by the flower pots. If the cat is chasing a mouse, make up a name for the mouse."
+        person: "Examine the person in these images. What are they doing, and how might their actions suggest their purpose (e.g., delivering something, approaching, leaving)? If they are carrying or interacting with a package, include details about its source or destination."
+        cat: "Observe the cat in these images. Focus on its movement and intent (e.g., wandering, hunting, interacting with objects). If the cat is near the flower pots or engaging in any specific actions, mention it."
       objects:
         - person
         - cat
diff --git a/docs/docs/configuration/semantic_search.md b/docs/docs/configuration/semantic_search.md
index 18093a479..a7b35ed77 100644
--- a/docs/docs/configuration/semantic_search.md
+++ b/docs/docs/configuration/semantic_search.md
@@ -50,10 +50,11 @@ semantic_search:
 - Configuring the `large` model employs the full Jina model and will automatically run on the GPU if applicable.
 - Configuring the `small` model employs a quantized version of the model that uses much less RAM and runs faster on CPU with a very negligible difference in embedding quality.
 
-## Usage
+## Usage and Best Practices
 
 1. Semantic search is used in conjunction with the other filters available on the Search page. Use a combination of traditional filtering and semantic search for the best results.
-2. Because of how the AI models Frigate uses have been trained, the comparison between text and image embedding distances generally means that results matching `description` will appear first, even if a `thumbnail` embedding may be a better match. Play with the "Search Type" setting to help find what you are looking for. Note that if you are generating descriptions for specific objects or zones only, this may cause search results to prioritize the objects with descriptions even if the the ones without them are more relevant.
-3. Make your search language and tone closely match your descriptions. If you are using thumbnail search, **phrase your query as an image caption**. For example "red car" will not work as well as "red sedan driving down a residential street on a sunny day".
-4. Semantic search on thumbnails tends to return better results when matching large subjects that take up most of the frame. Small things like "cat" tend to not work well.
-5. Experiment! Find a tracked object you want to test and start typing keywords and phrases to see what works for you.
+2. Use the thumbnail search type when searching for particular objects in the scene. Use the description search type when attempting to discern the intent of your object.
+3. Because of how the AI models Frigate uses have been trained, the comparison between text and image embedding distances generally means that with multi-modal (`thumbnail` and `description`) searches, results matching `description` will appear first, even if a `thumbnail` embedding may be a better match. Play with the "Search Type" setting to help find what you are looking for. Note that if you are generating descriptions for specific objects or zones only, this may cause search results to prioritize the objects with descriptions even if the ones without them are more relevant.
+4. Make your search language and tone closely match what you're looking for. If you are using thumbnail search, **phrase your query as an image caption**. Searching for "red car" may not work as well as "red sedan driving down a residential street on a sunny day".
+5. Semantic search on thumbnails tends to return better results when matching large subjects that take up most of the frame. Small things like "cat" tend to not work well.
+6. Experiment! Find a tracked object you want to test and start typing keywords and phrases to see what works for you.
diff --git a/frigate/config/camera/genai.py b/frigate/config/camera/genai.py
index 21c3d4525..35c26eaf8 100644
--- a/frigate/config/camera/genai.py
+++ b/frigate/config/camera/genai.py
@@ -23,7 +23,7 @@ class GenAICameraConfig(BaseModel):
         default=False, title="Use snapshots for generating descriptions."
     )
     prompt: str = Field(
-        default="Describe the {label} in the sequence of images with as much detail as possible. Do not describe the background.",
+        default="Analyze the sequence of images containing the {label}. Focus on the likely intent or behavior of the {label} based on its actions and movement, rather than describing its appearance or the surroundings. Consider what the {label} is doing, why, and what it might do next.",
         title="Default caption prompt.",
     )
     object_prompts: dict[str, str] = Field(
@@ -51,7 +51,7 @@ class GenAIConfig(FrigateBaseModel):
     enabled: bool = Field(default=False, title="Enable GenAI.")
     prompt: str = Field(
-        default="Describe the {label} in the sequence of images with as much detail as possible. Do not describe the background.",
+        default="Analyze the sequence of images containing the {label}. Focus on the likely intent or behavior of the {label} based on its actions and movement, rather than describing its appearance or the surroundings. Consider what the {label} is doing, why, and what it might do next.",
         title="Default caption prompt.",
     )
     object_prompts: dict[str, str] = Field(