Miscellaneous Fixes (#20841)

* show id field when editing zone

* improve zone capitalization

* Update NPU models and docs

* fix mobile page in tracked object details

* Use thread lock for OpenVINO to avoid concurrent requests with JinaV2

* fix hashing function to avoid collisions

* remove extra flex div causing overflow

* ensure header stays on top of video controls

* don't smart-capitalize friendly names

* Fix incorrect object classification crop

* don't display Submit to Frigate+ if the object doesn't have a snapshot

* check for snapshot and clip in actions menu

* Frigate+ submission fix

still show the Frigate+ section if a snapshot has already been submitted, and run an optimistic update, since local state was being overridden

* Don't fail to show 0% when displaying a classification

* Don't fail on file system error

* Improve title and description for review genai

* fix overflowing truncated review item description in detail stream

* catch events whose review items start after the first timeline entry

review items may start later than the events within them, so subtract a padding from the start time in the filter so that the starts of events are not incorrectly filtered out of the list in the detail stream (see the sketch after this list)

* also pad the review end_time

* fix

* change order of timeline zoom buttons on mobile

* use grid to ensure genai title does not cause overflow

* small tweaks

* Cleanup
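As a reference for the two padding commits above, a minimal sketch of the padded event filter; the padding constant and event field names are illustrative assumptions, not Frigate's exact values:

# Hypothetical sketch of the padded event filter; names and values are assumed.
PADDING_S = 5.0  # assumed padding in seconds

def filter_events_for_review(events, review_start, review_end):
    # Events may begin slightly before the review item's start_time (and end
    # after its end_time), so pad both bounds before filtering.
    start = review_start - PADDING_S
    end = (review_end if review_end is not None else float("inf")) + PADDING_S
    return [e for e in events if start <= e["start_time"] <= end]

# e.g. an event starting 2s before the review item is kept:
events = [{"id": "a", "start_time": 98.0}, {"id": "b", "start_time": 500.0}]
print(filter_events_for_review(events, 100.0, 200.0))  # [{'id': 'a', ...}]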

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
Author: Josh Hawkins
Date: 2025-11-08 06:44:30 -06:00 (committed by GitHub)
Parent: ef19332fe5
Commit: 01452e4c51

15 changed files with 232 additions and 132 deletions


@@ -418,8 +418,8 @@ class CustomObjectClassificationProcessor(RealTimeProcessorApi):
                 obj_data["box"][2],
                 obj_data["box"][3],
                 max(
-                    obj_data["box"][1] - obj_data["box"][0],
-                    obj_data["box"][3] - obj_data["box"][2],
+                    obj_data["box"][2] - obj_data["box"][0],
+                    obj_data["box"][3] - obj_data["box"][1],
                 ),
                 1.0,
             )
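The removed lines mixed axes: with boxes stored as [x1, y1, x2, y2], box[1] - box[0] is y1 - x1 and box[3] - box[2] is y2 - x2, neither a real width nor a real height, so the crop could be badly sized. A standalone sketch of the corrected side-length computation (the helper name is ours, not Frigate's):

# Illustrative helper, not Frigate's actual function.
def square_crop_side(box: list[float]) -> float:
    width = box[2] - box[0]   # x2 - x1
    height = box[3] - box[1]  # y2 - y1
    return max(width, height)

# A tall 40x80 box now yields a side of 80 instead of a nonsensical value:
print(square_crop_side([10.0, 20.0, 50.0, 100.0]))  # 80.0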
@@ -546,5 +546,8 @@ def write_classification_attempt(
     )
     # delete oldest face image if maximum is reached
-    if len(files) > max_files:
-        os.unlink(os.path.join(folder, files[-1]))
+    try:
+        if len(files) > max_files:
+            os.unlink(os.path.join(folder, files[-1]))
+    except FileNotFoundError:
+        pass
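The try/except added above guards against the file vanishing between the directory listing and the unlink (for example, a concurrent cleanup pass or a manual delete). The same pattern in a self-contained form; the function name is ours:

import os

def prune_oldest(folder: str, files: list[str], max_files: int) -> None:
    # files is assumed sorted newest-first, as in the diff above
    try:
        if len(files) > max_files:
            os.unlink(os.path.join(folder, files[-1]))
    except FileNotFoundError:
        # Another process already removed it; the pruning goal is met either way.
        pass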


@@ -3,6 +3,7 @@
 import logging
 import os
 import platform
+import threading
 from abc import ABC, abstractmethod
 from typing import Any
@@ -161,12 +162,12 @@ class CudaGraphRunner(BaseModelRunner):
     """

     @staticmethod
-    def is_complex_model(model_type: str) -> bool:
+    def is_model_supported(model_type: str) -> bool:
         # Import here to avoid circular imports
         from frigate.detectors.detector_config import ModelTypeEnum
         from frigate.embeddings.types import EnrichmentModelTypeEnum

-        return model_type in [
+        return model_type not in [
             ModelTypeEnum.yolonas.value,
             EnrichmentModelTypeEnum.paddleocr.value,
             EnrichmentModelTypeEnum.jina_v1.value,
@@ -239,9 +240,30 @@ class OpenVINOModelRunner(BaseModelRunner):
EnrichmentModelTypeEnum.jina_v2.value,
]
@staticmethod
def is_model_npu_supported(model_type: str) -> bool:
# Import here to avoid circular imports
from frigate.embeddings.types import EnrichmentModelTypeEnum
return model_type not in [
EnrichmentModelTypeEnum.paddleocr.value,
EnrichmentModelTypeEnum.jina_v1.value,
EnrichmentModelTypeEnum.jina_v2.value,
EnrichmentModelTypeEnum.arcface.value,
]
def __init__(self, model_path: str, device: str, model_type: str, **kwargs):
self.model_path = model_path
self.device = device
if device == "NPU" and not OpenVINOModelRunner.is_model_npu_supported(
model_type
):
logger.warning(
f"OpenVINO model {model_type} is not supported on NPU, using GPU instead"
)
device = "GPU"
self.complex_model = OpenVINOModelRunner.is_complex_model(model_type)
if not os.path.isfile(model_path):
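With this guard in place, requesting the NPU for a model on the unsupported list logs a warning and transparently falls back to the GPU; roughly like this (the model path is hypothetical, and OpenVINOModelRunner is the class from this file):

from frigate.embeddings.types import EnrichmentModelTypeEnum

# jina_v2 is on the NPU-unsupported list above, so the runner logs a
# warning and falls back to "GPU" before compiling the model.
runner = OpenVINOModelRunner(
    "/config/model_cache/jina-v2/model.xml",  # hypothetical path
    "NPU",
    EnrichmentModelTypeEnum.jina_v2.value,
)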
@@ -269,6 +291,10 @@ class OpenVINOModelRunner(BaseModelRunner):
         self.infer_request = self.compiled_model.create_infer_request()
         self.input_tensor: ov.Tensor | None = None

+        # Thread lock to prevent concurrent inference (needed for JinaV2 which shares
+        # one runner between text and vision embeddings called from different threads)
+        self._inference_lock = threading.Lock()
+
         if not self.complex_model:
             try:
                 input_shape = self.compiled_model.inputs[0].get_shape()
@@ -312,67 +338,70 @@ class OpenVINOModelRunner(BaseModelRunner):
         Returns:
             List of output tensors
         """
-        # Handle single input case for backward compatibility
-        if (
-            len(inputs) == 1
-            and len(self.compiled_model.inputs) == 1
-            and self.input_tensor is not None
-        ):
-            # Single input case - use the pre-allocated tensor for efficiency
-            input_data = list(inputs.values())[0]
-            np.copyto(self.input_tensor.data, input_data)
-            self.infer_request.infer(self.input_tensor)
-        else:
-            if self.complex_model:
-                try:
-                    # This ensures the model starts with a clean state for each sequence
-                    # Important for RNN models like PaddleOCR recognition
-                    self.infer_request.reset_state()
-                except Exception:
-                    # this will raise an exception for models with AUTO set as the device
-                    pass
+        # Lock prevents concurrent access to infer_request
+        # Needed for JinaV2: genai thread (text) + embeddings thread (vision)
+        with self._inference_lock:
+            # Handle single input case for backward compatibility
+            if (
+                len(inputs) == 1
+                and len(self.compiled_model.inputs) == 1
+                and self.input_tensor is not None
+            ):
+                # Single input case - use the pre-allocated tensor for efficiency
+                input_data = list(inputs.values())[0]
+                np.copyto(self.input_tensor.data, input_data)
+                self.infer_request.infer(self.input_tensor)
+            else:
+                if self.complex_model:
+                    try:
+                        # This ensures the model starts with a clean state for each sequence
+                        # Important for RNN models like PaddleOCR recognition
+                        self.infer_request.reset_state()
+                    except Exception:
+                        # this will raise an exception for models with AUTO set as the device
+                        pass

-            # Multiple inputs case - set each input by name
-            for input_name, input_data in inputs.items():
-                # Find the input by name and its index
-                input_port = None
-                input_index = None
-                for idx, port in enumerate(self.compiled_model.inputs):
-                    if port.get_any_name() == input_name:
-                        input_port = port
-                        input_index = idx
-                        break
+                # Multiple inputs case - set each input by name
+                for input_name, input_data in inputs.items():
+                    # Find the input by name and its index
+                    input_port = None
+                    input_index = None
+                    for idx, port in enumerate(self.compiled_model.inputs):
+                        if port.get_any_name() == input_name:
+                            input_port = port
+                            input_index = idx
+                            break

-                if input_port is None:
-                    raise ValueError(f"Input '{input_name}' not found in model")
+                    if input_port is None:
+                        raise ValueError(f"Input '{input_name}' not found in model")

-                # Create tensor with the correct element type
-                input_element_type = input_port.get_element_type()
+                    # Create tensor with the correct element type
+                    input_element_type = input_port.get_element_type()

-                # Ensure input data matches the expected dtype to prevent type mismatches
-                # that can occur with models like Jina-CLIP v2 running on OpenVINO
-                expected_dtype = input_element_type.to_dtype()
-                if input_data.dtype != expected_dtype:
-                    logger.debug(
-                        f"Converting input '{input_name}' from {input_data.dtype} to {expected_dtype}"
-                    )
-                    input_data = input_data.astype(expected_dtype)
+                    # Ensure input data matches the expected dtype to prevent type mismatches
+                    # that can occur with models like Jina-CLIP v2 running on OpenVINO
+                    expected_dtype = input_element_type.to_dtype()
+                    if input_data.dtype != expected_dtype:
+                        logger.debug(
+                            f"Converting input '{input_name}' from {input_data.dtype} to {expected_dtype}"
+                        )
+                        input_data = input_data.astype(expected_dtype)

-                input_tensor = ov.Tensor(input_element_type, input_data.shape)
-                np.copyto(input_tensor.data, input_data)
+                    input_tensor = ov.Tensor(input_element_type, input_data.shape)
+                    np.copyto(input_tensor.data, input_data)

-                # Set the input tensor for the specific port index
-                self.infer_request.set_input_tensor(input_index, input_tensor)
+                    # Set the input tensor for the specific port index
+                    self.infer_request.set_input_tensor(input_index, input_tensor)

-            # Run inference
-            self.infer_request.infer()
+                # Run inference
+                self.infer_request.infer()

-        # Get all output tensors
-        outputs = []
-        for i in range(len(self.compiled_model.outputs)):
-            outputs.append(self.infer_request.get_output_tensor(i).data)
+            # Get all output tensors
+            outputs = []
+            for i in range(len(self.compiled_model.outputs)):
+                outputs.append(self.infer_request.get_output_tensor(i).data)

-        return outputs
+            return outputs


 class RKNNModelRunner(BaseModelRunner):
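On the lock itself: a single OpenVINO infer request is stateful, so two threads calling infer() on the same request (JinaV2's text and vision paths) can interleave input writes and output reads. A toy illustration of guarding a shared, non-reentrant resource the same way, with no OpenVINO dependency:

import threading

class SharedRunner:
    # Toy stand-in for a runner whose single infer request holds mutable state.
    def __init__(self):
        self._lock = threading.Lock()
        self._input = None  # stands in for the infer request's input tensor

    def run(self, value: int) -> int:
        with self._lock:
            # Without the lock, another thread could overwrite self._input
            # between these two statements and we would return its result.
            self._input = value
            return self._input * 2

runner = SharedRunner()
threads = [threading.Thread(target=runner.run, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()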
@@ -500,7 +529,7 @@ def get_optimized_runner(
         return OpenVINOModelRunner(model_path, device, model_type, **kwargs)

     if (
-        not CudaGraphRunner.is_complex_model(model_type)
+        not CudaGraphRunner.is_model_supported(model_type)
         and providers[0] == "CUDAExecutionProvider"
     ):
         options[0] = {


@@ -113,8 +113,8 @@ When forming your description:
 ## Response Format

 Your response MUST be a flat JSON object with:
-- `title` (string): A concise, direct title that describes the purpose or overall action, not just what you literally see. {"Use spatial context when available to make titles more meaningful." if camera_context_section else ""} Use names from "Objects in Scene" based on what you visually observe. If you see both a name and an unidentified object of the same type but visually observe only one person/object, use ONLY the name. Examples: "Joe walking dog", "Person taking out trash", "Joe accessing vehicle", "Person leaving porch for driveway", "Joe and person on front porch".
-- `scene` (string): A narrative description of what happens across the sequence from start to finish. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
+- `title` (string): A concise, direct title that describes the primary action or event in the sequence, not just what you literally see. {"Use spatial context when available to make titles more meaningful." if camera_context_section else ""} When multiple objects/actions are present, prioritize whichever is most prominent or occurs first. Use names from "Objects in Scene" based on what you visually observe. If you see both a name and an unidentified object of the same type but visually observe only one person/object, use ONLY the name. Examples: "Joe walking dog", "Person taking out trash", "Vehicle arriving in driveway", "Joe accessing vehicle", "Person leaving porch for driveway".
+- `scene` (string): A narrative description of what happens across the sequence from start to finish, in chronological order. Start by describing how the sequence begins, then describe the progression of events. **Describe all significant movements and actions in the order they occur.** For example, if a vehicle arrives and then a person exits, describe both actions sequentially. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
 - `confidence` (float): 0-1 confidence in your analysis. Higher confidence when objects/actions are clearly visible and context is unambiguous. Lower confidence when the sequence is unclear, objects are partially obscured, or context is ambiguous.
 - `potential_threat_level` (integer): 0, 1, or 2 as defined in "Normal Activity Patterns for This Property" above. Your threat level must be consistent with your scene description and the guidance above.
 {get_concern_prompt()}
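For reference, a quick sketch that checks a model response against the flat schema this prompt describes (the validator and example values are ours; any extra concern fields added by get_concern_prompt() are simply ignored):

import json

def validate_review_response(raw: str) -> dict:
    data = json.loads(raw)
    assert isinstance(data.get("title"), str)
    assert isinstance(data.get("scene"), str)
    confidence = data.get("confidence")
    assert isinstance(confidence, (int, float)) and 0.0 <= confidence <= 1.0
    assert data.get("potential_threat_level") in (0, 1, 2)
    return data

example = json.dumps({
    "title": "Person taking out trash",
    "scene": "A person carries a trash bag from the porch to the curb.",
    "confidence": 0.9,
    "potential_threat_level": 0,
})
validate_review_response(example)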