Miscellaneous Fixes (#20841)

* show id field when editing zone

* improve zone capitalization

* Update NPU models and docs

* fix mobile page in tracked object details

* Use thread lock for OpenVINO to avoid concurrent requests with JinaV2

* fix hashing function to avoid collisions

* remove extra flex div causing overflow

* ensure header stays on top of video controls

* don't smart-capitalize friendly names

* Fix incorrect object classification crop

* don't display Submit to Frigate+ if the object doesn't have a snapshot

* check for snapshot and clip in actions menu

* Frigate+ submission fix

still show the Frigate+ section if a snapshot has already been submitted, and run an optimistic update, since local state was being overridden

* Don't fail to show 0% when displaying a classification

* Don't fail on file system error

* Improve title and description for review genai

* fix overflowing truncated review item description in detail stream

* catch events whose review items start after the first timeline entry

review items may start later than the events within them, so subtract a padding from the start time in the filter so that the starts of events are not incorrectly filtered out of the list in the detail stream (see the sketch after this list)

* also pad the review end_time

* fix

* change order of timeline zoom buttons on mobile

* use grid to ensure genai title does not cause overflow

* small tweaks

* Cleanup
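As a reference for the two padding commits above, a minimal sketch of the padded event filter; the padding constant and event field names are illustrative assumptions, not Frigate's exact values:

# Hypothetical sketch of the padded event filter; names and values are assumed.
PADDING_S = 5.0  # assumed padding in seconds

def filter_events_for_review(events, review_start, review_end):
    # Events may begin slightly before the review item's start_time (and end
    # after its end_time), so pad both bounds before filtering.
    start = review_start - PADDING_S
    end = (review_end if review_end is not None else float("inf")) + PADDING_S
    return [e for e in events if start <= e["start_time"] <= end]

# e.g. an event starting 2s before the review item is kept:
events = [{"id": "a", "start_time": 98.0}, {"id": "b", "start_time": 500.0}]
print(filter_events_for_review(events, 100.0, 200.0))  # [{'id': 'a', ...}]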

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
Author: Josh Hawkins
Date: 2025-11-08 06:44:30 -06:00 (committed by GitHub)
Parent: ef19332fe5
Commit: 01452e4c51

15 changed files with 232 additions and 132 deletions


@@ -418,8 +418,8 @@ class CustomObjectClassificationProcessor(RealTimeProcessorApi):
                 obj_data["box"][2],
                 obj_data["box"][3],
                 max(
-                    obj_data["box"][1] - obj_data["box"][0],
-                    obj_data["box"][3] - obj_data["box"][2],
+                    obj_data["box"][2] - obj_data["box"][0],
+                    obj_data["box"][3] - obj_data["box"][1],
                 ),
                 1.0,
             )
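The removed lines mixed axes: with boxes stored as [x1, y1, x2, y2], box[1] - box[0] is y1 - x1 and box[3] - box[2] is y2 - x2, neither a real width nor a real height, so the crop could be badly sized. A standalone sketch of the corrected side-length computation (the helper name is ours, not Frigate's):

# Illustrative helper, not Frigate's actual function.
def square_crop_side(box: list[float]) -> float:
    width = box[2] - box[0]   # x2 - x1
    height = box[3] - box[1]  # y2 - y1
    return max(width, height)

# A tall 40x80 box now yields a side of 80 instead of a nonsensical value:
print(square_crop_side([10.0, 20.0, 50.0, 100.0]))  # 80.0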
@@ -546,5 +546,8 @@ def write_classification_attempt(
     )
     # delete oldest face image if maximum is reached
-    if len(files) > max_files:
-        os.unlink(os.path.join(folder, files[-1]))
+    try:
+        if len(files) > max_files:
+            os.unlink(os.path.join(folder, files[-1]))
+    except FileNotFoundError:
+        pass
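The try/except added above guards against the file vanishing between the directory listing and the unlink (for example, a concurrent cleanup pass or a manual delete). The same pattern in a self-contained form; the function name is ours:

import os

def prune_oldest(folder: str, files: list[str], max_files: int) -> None:
    # files is assumed sorted newest-first, as in the diff above
    try:
        if len(files) > max_files:
            os.unlink(os.path.join(folder, files[-1]))
    except FileNotFoundError:
        # Another process already removed it; the pruning goal is met either way.
        pass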


@@ -3,6 +3,7 @@
 import logging
 import os
 import platform
+import threading
 from abc import ABC, abstractmethod
 from typing import Any
@@ -161,12 +162,12 @@ class CudaGraphRunner(BaseModelRunner):
     """

     @staticmethod
-    def is_complex_model(model_type: str) -> bool:
+    def is_model_supported(model_type: str) -> bool:
         # Import here to avoid circular imports
         from frigate.detectors.detector_config import ModelTypeEnum
         from frigate.embeddings.types import EnrichmentModelTypeEnum

-        return model_type in [
+        return model_type not in [
             ModelTypeEnum.yolonas.value,
             EnrichmentModelTypeEnum.paddleocr.value,
             EnrichmentModelTypeEnum.jina_v1.value,
@@ -239,9 +240,30 @@ class OpenVINOModelRunner(BaseModelRunner):
EnrichmentModelTypeEnum.jina_v2.value,
]
@staticmethod
def is_model_npu_supported(model_type: str) -> bool:
# Import here to avoid circular imports
from frigate.embeddings.types import EnrichmentModelTypeEnum
return model_type not in [
EnrichmentModelTypeEnum.paddleocr.value,
EnrichmentModelTypeEnum.jina_v1.value,
EnrichmentModelTypeEnum.jina_v2.value,
EnrichmentModelTypeEnum.arcface.value,
]
def __init__(self, model_path: str, device: str, model_type: str, **kwargs):
self.model_path = model_path
self.device = device
if device == "NPU" and not OpenVINOModelRunner.is_model_npu_supported(
model_type
):
logger.warning(
f"OpenVINO model {model_type} is not supported on NPU, using GPU instead"
)
device = "GPU"
self.complex_model = OpenVINOModelRunner.is_complex_model(model_type)
if not os.path.isfile(model_path):
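With this guard in place, requesting the NPU for a model on the unsupported list logs a warning and transparently falls back to the GPU; roughly like this (the model path is hypothetical, and OpenVINOModelRunner is the class from this file):

from frigate.embeddings.types import EnrichmentModelTypeEnum

# jina_v2 is on the NPU-unsupported list above, so the runner logs a
# warning and falls back to "GPU" before compiling the model.
runner = OpenVINOModelRunner(
    "/config/model_cache/jina-v2/model.xml",  # hypothetical path
    "NPU",
    EnrichmentModelTypeEnum.jina_v2.value,
)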
@@ -269,6 +291,10 @@ class OpenVINOModelRunner(BaseModelRunner):
         self.infer_request = self.compiled_model.create_infer_request()
         self.input_tensor: ov.Tensor | None = None

+        # Thread lock to prevent concurrent inference (needed for JinaV2 which shares
+        # one runner between text and vision embeddings called from different threads)
+        self._inference_lock = threading.Lock()
+
         if not self.complex_model:
             try:
                 input_shape = self.compiled_model.inputs[0].get_shape()
@@ -312,67 +338,70 @@ class OpenVINOModelRunner(BaseModelRunner):
         Returns:
             List of output tensors
         """
-        # Handle single input case for backward compatibility
-        if (
-            len(inputs) == 1
-            and len(self.compiled_model.inputs) == 1
-            and self.input_tensor is not None
-        ):
-            # Single input case - use the pre-allocated tensor for efficiency
-            input_data = list(inputs.values())[0]
-            np.copyto(self.input_tensor.data, input_data)
-            self.infer_request.infer(self.input_tensor)
-        else:
-            if self.complex_model:
-                try:
-                    # This ensures the model starts with a clean state for each sequence
-                    # Important for RNN models like PaddleOCR recognition
-                    self.infer_request.reset_state()
-                except Exception:
-                    # this will raise an exception for models with AUTO set as the device
-                    pass
+        # Lock prevents concurrent access to infer_request
+        # Needed for JinaV2: genai thread (text) + embeddings thread (vision)
+        with self._inference_lock:
+            # Handle single input case for backward compatibility
+            if (
+                len(inputs) == 1
+                and len(self.compiled_model.inputs) == 1
+                and self.input_tensor is not None
+            ):
+                # Single input case - use the pre-allocated tensor for efficiency
+                input_data = list(inputs.values())[0]
+                np.copyto(self.input_tensor.data, input_data)
+                self.infer_request.infer(self.input_tensor)
+            else:
+                if self.complex_model:
+                    try:
+                        # This ensures the model starts with a clean state for each sequence
+                        # Important for RNN models like PaddleOCR recognition
+                        self.infer_request.reset_state()
+                    except Exception:
+                        # this will raise an exception for models with AUTO set as the device
+                        pass

-            # Multiple inputs case - set each input by name
-            for input_name, input_data in inputs.items():
-                # Find the input by name and its index
-                input_port = None
-                input_index = None
-                for idx, port in enumerate(self.compiled_model.inputs):
-                    if port.get_any_name() == input_name:
-                        input_port = port
-                        input_index = idx
-                        break
+                # Multiple inputs case - set each input by name
+                for input_name, input_data in inputs.items():
+                    # Find the input by name and its index
+                    input_port = None
+                    input_index = None
+                    for idx, port in enumerate(self.compiled_model.inputs):
+                        if port.get_any_name() == input_name:
+                            input_port = port
+                            input_index = idx
+                            break

-                if input_port is None:
-                    raise ValueError(f"Input '{input_name}' not found in model")
+                    if input_port is None:
+                        raise ValueError(f"Input '{input_name}' not found in model")

-                # Create tensor with the correct element type
-                input_element_type = input_port.get_element_type()
+                    # Create tensor with the correct element type
+                    input_element_type = input_port.get_element_type()

-                # Ensure input data matches the expected dtype to prevent type mismatches
-                # that can occur with models like Jina-CLIP v2 running on OpenVINO
-                expected_dtype = input_element_type.to_dtype()
-                if input_data.dtype != expected_dtype:
-                    logger.debug(
-                        f"Converting input '{input_name}' from {input_data.dtype} to {expected_dtype}"
-                    )
-                    input_data = input_data.astype(expected_dtype)
+                    # Ensure input data matches the expected dtype to prevent type mismatches
+                    # that can occur with models like Jina-CLIP v2 running on OpenVINO
+                    expected_dtype = input_element_type.to_dtype()
+                    if input_data.dtype != expected_dtype:
+                        logger.debug(
+                            f"Converting input '{input_name}' from {input_data.dtype} to {expected_dtype}"
+                        )
+                        input_data = input_data.astype(expected_dtype)

-                input_tensor = ov.Tensor(input_element_type, input_data.shape)
-                np.copyto(input_tensor.data, input_data)
+                    input_tensor = ov.Tensor(input_element_type, input_data.shape)
+                    np.copyto(input_tensor.data, input_data)

-                # Set the input tensor for the specific port index
-                self.infer_request.set_input_tensor(input_index, input_tensor)
+                    # Set the input tensor for the specific port index
+                    self.infer_request.set_input_tensor(input_index, input_tensor)

-            # Run inference
-            self.infer_request.infer()
+                # Run inference
+                self.infer_request.infer()

-        # Get all output tensors
-        outputs = []
-        for i in range(len(self.compiled_model.outputs)):
-            outputs.append(self.infer_request.get_output_tensor(i).data)
+            # Get all output tensors
+            outputs = []
+            for i in range(len(self.compiled_model.outputs)):
+                outputs.append(self.infer_request.get_output_tensor(i).data)

-        return outputs
+            return outputs


 class RKNNModelRunner(BaseModelRunner):
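On the lock itself: a single OpenVINO infer request is stateful, so two threads calling infer() on the same request (JinaV2's text and vision paths) can interleave input writes and output reads. A toy illustration of guarding a shared, non-reentrant resource the same way, with no OpenVINO dependency:

import threading

class SharedRunner:
    # Toy stand-in for a runner whose single infer request holds mutable state.
    def __init__(self):
        self._lock = threading.Lock()
        self._input = None  # stands in for the infer request's input tensor

    def run(self, value: int) -> int:
        with self._lock:
            # Without the lock, another thread could overwrite self._input
            # between these two statements and we would return its result.
            self._input = value
            return self._input * 2

runner = SharedRunner()
threads = [threading.Thread(target=runner.run, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()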
@@ -500,7 +529,7 @@ def get_optimized_runner(
         return OpenVINOModelRunner(model_path, device, model_type, **kwargs)

     if (
-        not CudaGraphRunner.is_complex_model(model_type)
+        not CudaGraphRunner.is_model_supported(model_type)
         and providers[0] == "CUDAExecutionProvider"
     ):
         options[0] = {


@@ -113,8 +113,8 @@ When forming your description:
 ## Response Format

 Your response MUST be a flat JSON object with:
-- `title` (string): A concise, direct title that describes the purpose or overall action, not just what you literally see. {"Use spatial context when available to make titles more meaningful." if camera_context_section else ""} Use names from "Objects in Scene" based on what you visually observe. If you see both a name and an unidentified object of the same type but visually observe only one person/object, use ONLY the name. Examples: "Joe walking dog", "Person taking out trash", "Joe accessing vehicle", "Person leaving porch for driveway", "Joe and person on front porch".
-- `scene` (string): A narrative description of what happens across the sequence from start to finish. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
+- `title` (string): A concise, direct title that describes the primary action or event in the sequence, not just what you literally see. {"Use spatial context when available to make titles more meaningful." if camera_context_section else ""} When multiple objects/actions are present, prioritize whichever is most prominent or occurs first. Use names from "Objects in Scene" based on what you visually observe. If you see both a name and an unidentified object of the same type but visually observe only one person/object, use ONLY the name. Examples: "Joe walking dog", "Person taking out trash", "Vehicle arriving in driveway", "Joe accessing vehicle", "Person leaving porch for driveway".
+- `scene` (string): A narrative description of what happens across the sequence from start to finish, in chronological order. Start by describing how the sequence begins, then describe the progression of events. **Describe all significant movements and actions in the order they occur.** For example, if a vehicle arrives and then a person exits, describe both actions sequentially. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
 - `confidence` (float): 0-1 confidence in your analysis. Higher confidence when objects/actions are clearly visible and context is unambiguous. Lower confidence when the sequence is unclear, objects are partially obscured, or context is ambiguous.
 - `potential_threat_level` (integer): 0, 1, or 2 as defined in "Normal Activity Patterns for This Property" above. Your threat level must be consistent with your scene description and the guidance above.
 {get_concern_prompt()}
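For reference, a quick sketch that checks a model response against the flat schema this prompt describes (the validator and example values are ours; any extra concern fields added by get_concern_prompt() are simply ignored):

import json

def validate_review_response(raw: str) -> dict:
    data = json.loads(raw)
    assert isinstance(data.get("title"), str)
    assert isinstance(data.get("scene"), str)
    confidence = data.get("confidence")
    assert isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0
    assert data.get("potential_threat_level") in (0, 1, 2)
    return data

example = json.dumps({
    "title": "Person taking out trash",
    "scene": "A person carries a trash bag from the porch to the curb.",
    "confidence": 0.9,
    "potential_threat_level": 0,
})
validate_review_response(example)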