Added optimized version of degirum plugin + updated docs

2025-08-31 13:48:19 +02:00 · 2025-06-23 18:32:22 -07:00 · 2025-06-23 18:32:22 -07:00 · 274cc40922
commit 274cc40922
parent 4412671e76
2 changed files with 118 additions and 179 deletions
--- a/docs/docs/configuration/object_detectors.md
+++ b/docs/docs/configuration/object_detectors.md
@@ -237,30 +237,9 @@ Hailo8 supports all models in the Hailo Model Zoo that include HailoRT post-proc


 ## DeGirum
-DeGirum is a detector that can use any type of hardware listed on [their website](https://hub.degirum.com). You can connect directly to DeGirum's cloud platform to run inference with just an internet connection after signing up, or use DeGirum with local hardware through a [AI server](#ai-server). You can view their official docs site page for their cloud platform [here](https://docs.degirum.com/ai-hub/quickstart).
+DeGirum is a detector that can use any type of hardware listed on [their website](https://hub.degirum.com). DeGirum can be used with local hardware through an [AI server](#ai-server) or directly through `@local`. You can also connect to DeGirum's cloud platform to run inference.

 ### Configuration
-#### AI Hub Cloud Inference
-DeGirum is designed to support very easy cloud inference. To set it up, you need to:
-1. Sign up at [DeGirum's AI Hub](https://hub.degirum.com).
-2. Get an access token.
-3. Create a DeGirum detector in your config.yml file.
-```yaml
-degirum_detector:
-    type: degirum
-    location: "@cloud" # For accessing AI Hub devices and models
-    zoo: degirum/public # DeGirum's public model zoo. Zoo name should be in format "team_name/zoo_name". DeGirum/public is available to everyone, so feel free to use it if you don't know where to start.
-    token: dg_example_token # For authentication with the AI Hub. Get this token through the "tokens" section on the main page of the (AI Hub)[https://hub.degirum.com).
-
-```
-Once `degirum_detector` is setup, you can choose a model through 'model' section in the config.yml file.
-```yaml
-model:
-    path: mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1
-    width: 300 # width is in the model name as the first number in the "int"x"int" section
-    height: 300 # height is in the model name as the second number in the "int"x"int" section
-```
-
 #### AI Server Inference
 Before starting with the config file for this section, you must first launch an AI server. DeGirum has an AI server ready to use as a docker container. Add this to your docker-compose.yml to get started:
 ```yaml
@@ -272,7 +251,7 @@ degirum_detector:
      - "8778:8778"
 ```
 All supported hardware will automatically be found on your AI server host as long as relevant runtimes and drivers are properly installed on your machine. Refer to [DeGirum's docs site](https://docs.degirum.com/pysdk/runtimes-and-drivers) if you have any trouble.
-Once completed, changing the config.yml file is much the same as the process for cloud.
+Once the AI server is running, update your config.yml file:
 ```yaml
 degirum_detector:
    type: degirum
@@ -291,9 +270,55 @@ model:
    path: ./mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1 # directory to model .json and file
    width: 300 # width is in the model name as the first number in the "int"x"int" section
    height: 300 # height is in the model name as the second number in the "int"x"int" section
+    input_pixel_format: rgb/bgr # set to either rgb or bgr; check the model's .json file to see which one applies
 ```


+#### Local Inference
+It is also possible to eliminate the need for an AI server and run the hardware directly. The benefit of this approach is that you eliminate any bottlenecks that occur when transferring prediction results from the AI server docker container to the Frigate one. However, the method of implementing local inference is different for every device and hardware combination, so it's usually more trouble than it's worth. A general guideline to achieve this would be:
+1. Ensure that the Frigate docker container has the runtime you want to use. For instance, running `@local` for Hailo means making sure the container you're using has the Hailo runtime installed.
+2. To double check that the runtime is detected by DeGirum, make sure the `degirum sys-info` command lists the runtimes you installed.
+3. Create a DeGirum detector in your config.yml file.
+```yaml
+degirum_detector:
+    type: degirum
+    location: "@local" # For running inference directly on local hardware, without an AI server
+    zoo: degirum/public # DeGirum's public model zoo. Zoo name should be in format "team_name/zoo_name". DeGirum/public is available to everyone, so feel free to use it if you don't know where to start.
+    token: dg_example_token # For authentication with the AI Hub. Get this token through the "tokens" section on the main page of the [AI Hub](https://hub.degirum.com).
+
+```
+Once `degirum_detector` is set up, you can choose a model through the `model` section in the config.yml file.
+```yaml
+model:
+    path: mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1
+    width: 300 # width is in the model name as the first number in the "int"x"int" section
+    height: 300 # height is in the model name as the second number in the "int"x"int" section
+    input_pixel_format: rgb/bgr # set to either rgb or bgr; check the model's .json file to see which one applies
+```
+
+
+#### AI Hub Cloud Inference
+If you do not have the hardware you want to run on, you can also run inference in the cloud. Note that you may need to lower your detection fps, as network latency significantly slows down this method of detection. For use with Frigate, we highly recommend using a local AI server as described above. To set up cloud inference:
+1. Sign up at [DeGirum's AI Hub](https://hub.degirum.com).
+2. Get an access token.
+3. Create a DeGirum detector in your config.yml file.
+```yaml
+degirum_detector:
+    type: degirum
+    location: "@cloud" # For accessing AI Hub devices and models
+    zoo: degirum/public # DeGirum's public model zoo. Zoo name should be in format "team_name/zoo_name". DeGirum/public is available to everyone, so feel free to use it if you don't know where to start.
+    token: dg_example_token # For authentication with the AI Hub. Get this token through the "tokens" section on the main page of the [AI Hub](https://hub.degirum.com).
+
+```
+Once `degirum_detector` is set up, you can choose a model through the `model` section in the config.yml file.
+```yaml
+model:
+    path: mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1
+    width: 300 # width is in the model name as the first number in the "int"x"int" section
+    height: 300 # height is in the model name as the second number in the "int"x"int" section
+    input_pixel_format: rgb/bgr # set to either rgb or bgr; check the model's .json file to see which one applies
+```
+

 ## OpenVINO Detector

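Note on the docs change above: the `width` and `height` comments say both values can be read from the `"int"x"int"` part of the model name. A minimal sketch of that lookup, assuming the name contains a `<width>x<height>` token and that the first such token is the input size. `parse_model_size` is a hypothetical helper for illustration; it is not part of Frigate or the DeGirum PySDK.

```python
import re


def parse_model_size(model_name: str) -> tuple[int, int]:
    """Return (width, height) from the first "<int>x<int>" token in a model name.

    Hypothetical helper: assumes the first such token is the model input size.
    """
    match = re.search(r"(\d+)x(\d+)", model_name)
    if match is None:
        raise ValueError(f"no <width>x<height> token in {model_name!r}")
    return int(match.group(1)), int(match.group(2))


# The model name used throughout the docs examples above.
width, height = parse_model_size("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca1_1")
```

The resulting values are what would go into `model.width` and `model.height` in config.yml.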
--- a/frigate/detectors/plugins/degirum.py
+++ b/frigate/detectors/plugins/degirum.py
@@ -13,63 +13,6 @@ logger = logging.getLogger(__name__)
 DETECTOR_KEY = "degirum"


-### STREAM CLASS FROM DG TOOLS ###
-class Stream(queue.Queue):
-    """Queue-based iterable class with optional item drop"""
-
-    # minimum queue size to avoid deadlocks:
-    # one for stray result, one for poison pill in request_stop(),
-    # and one for poison pill gizmo_run()
-    min_queue_size = 1
-
-    def __init__(self, maxsize=0, allow_drop: bool = False):
-        """Constructor
-
-        - maxsize: maximum stream depth; 0 for unlimited depth
-        - allow_drop: allow dropping elements on put() when stream is full
-        """
-
-        if maxsize < self.min_queue_size and maxsize != 0:
-            raise Exception(
-                f"Incorrect stream depth: {maxsize}. Should be 0 (unlimited) or at least {self.min_queue_size}"
-            )
-
-        super().__init__(maxsize)
-        self.allow_drop = allow_drop
-        self.dropped_cnt = 0  # number of dropped items
-
-    _poison = None
-
-    def put(self, item, block: bool = True, timeout=None) -> None:
-        """Put an item into the stream
-
-        - item: item to put
-        If there is no space left, and allow_drop flag is set, then oldest item will
-        be popped to free space
-        """
-        if self.allow_drop:
-            while True:
-                try:
-                    super().put(item, False)
-                    break
-                except queue.Full:
-                    self.dropped_cnt += 1
-                    try:
-                        self.get_nowait()
-                    finally:
-                        pass
-        else:
-            super().put(item, block, timeout)
-
-    def __iter__(self):
-        """Iterator method"""
-        return iter(self.get, self._poison)
-
-    def close(self):
-        """Close stream: put poison pill"""
-        self.put(self._poison)
-
-
 ### DETECTOR CONFIG ###
 class DGDetectorConfig(BaseDetectorConfig):
    type: Literal[DETECTOR_KEY]
@@ -83,139 +26,110 @@ class DGDetector(DetectionApi):
    type_key = DETECTOR_KEY

    def __init__(self, detector_config: DGDetectorConfig):
-        self._queue = Stream(5, allow_drop=True)
+        self._queue = queue.Queue()
        self._zoo = dg.connect(
            detector_config.location, detector_config.zoo, detector_config.token
        )
-        logger.info(f"Models in zoo: {self._zoo.list_models()}")
+
+        logger.debug(f"Models in zoo: {self._zoo.list_models()}")
+
        self.dg_model = self._zoo.load_model(
            detector_config.model.path,
        )
-        self.dg_model.measure_time = True
+
+        # Setting input image format to raw reduces preprocessing time
        self.dg_model.input_image_format = "RAW"
-        self.dg_model._postprocessor = None
-        # Openvino tends to have multidevice, and they default to CPU rather than GPU or NPU
-        types = self.dg_model.supported_device_types
-        for type in types:
-            # If openvino is supported, prioritize using gpu, then npu, then cpu
-            if "OPENVINO" in type:
-                self.dg_model.device_type = [
-                    # "OPENVINO/GPU",
-                    # "OPENVINO/NPU",
-                    "OPENVINO/CPU",
-                ]
-            elif "HAILORT" in type:
-                self.dg_model.device_type = [
-                    "HAILORT/HAILO8l",
-                    "HAILORT/HAILO8",
-                ]
-            break
+
+        # Prioritize the most powerful hardware available
+        self.select_best_device_type()
+        # Frigate handles preprocessing as long as these are all set
        input_shape = self.dg_model.input_shape[0]
        self.model_height = input_shape[1]
        self.model_width = input_shape[2]

+        # Passing in dummy frame so initial connection latency happens in
+        # init function and not during actual prediction
        frame = np.zeros(
            (detector_config.model.width, detector_config.model.height, 3),
            dtype=np.uint8,
        )
+        # Pass in frame to overcome first frame latency
        self.dg_model(frame)
        self.prediction = self.prediction_generator()
-        self.none_counter = 0
-        self.not_none_counter = 0
-        self.overall_frame_counter = 0
-        self.times = 0
+
+    def select_best_device_type(self):
+        """
+        Helper function that selects fastest hardware available per model runtime
+        """
+        types = self.dg_model.supported_device_types
+
+        device_map = {
+            "OPENVINO": ["GPU", "NPU", "CPU"],
+            "HAILORT": ["HAILO8L", "HAILO8"],
+            "N2X": ["ORCA1", "CPU"],
+            "ONNX": ["VITIS_NPU", "CPU"],
+            "RKNN": ["RK3566", "RK3568", "RK3588"],
+            "TENSORRT": ["DLA", "GPU", "DLA_ONLY"],
+            "TFLITE": ["ARMNN", "EDGETPU", "CPU"],
+        }
+
+        runtime = types[0].split("/")[0]
+        # Just create an array of format {runtime}/{hardware} for every hardware
+        # in the value for appropriate key in device_map
+        self.dg_model.device_type = [
+            f"{runtime}/{hardware}" for hardware in device_map[runtime]
+        ]

    def prediction_generator(self):
-        # logger.debug("Prediction generator was called")
+        """
+        Generator for all incoming frames. By using this generator, we don't have to keep
+        reconnecting our websocket on every "predict" call.
+        """
+        logger.debug("Prediction generator was called")
        with self.dg_model as model:
            while 1:
-                # logger.debug(f"q size before calling get: {self._queue.qsize()}")
-                data = self._queue.get()
-                # logger.debug(f"q size after calling get: {self._queue.qsize()}")
-                # logger.debug(
-                #     f"Data we're passing into model predict: {data}, shape of data: {data.shape}"
-                # )
-                start = time.time_ns()
+                logger.debug(f"q size before calling get: {self._queue.qsize()}")
+                data = self._queue.get(block=True)
+                logger.debug(f"q size after calling get: {self._queue.qsize()}")
+                logger.debug(
+                    f"Data we're passing into model predict: {data}, shape of data: {data.shape}"
+                )
                result = model.predict(data)
-                self.times += (time.time_ns() - start) * 1e-6
-                # logger.info(
-                #     f"Entire time taken to get result back: {self.times / self.overall_frame_counter}"
-                # )
+                logger.debug(f"Prediction result: {result}")
                yield result

    def detect_raw(self, tensor_input):
-        # start = time.time_ns()
-        self.overall_frame_counter += 1
+        # Reshaping tensor to work with pysdk
        truncated_input = tensor_input.reshape(tensor_input.shape[1:])
-        # logger.debug(f"Detect raw was called for tensor input: {tensor_input}")
+        logger.debug(f"Detect raw was called for tensor input: {tensor_input}")

        # add tensor_input to input queue
        self._queue.put(truncated_input)
-        # logger.debug(f"Queue size after adding truncated input: {self._queue.qsize()}")
+        logger.debug(f"Queue size after adding truncated input: {self._queue.qsize()}")

        # define empty detection result
        detections = np.zeros((20, 6), np.float32)
-        # res = next(self.prediction)
-        result = next(self.prediction)
-        # return detections
-        # result = self.prediction_generator()
-        # logger.info(f"Result: {result}")
-        # logger.info(f"Shape of res: {res.results[0]["data"]}")
-        # logger.debug(f"Queue size after calling for res: {self._queue.qsize()}")
-        # logger.debug(f"Output of res in initial next call: {res}")
-        # logger.info(
-        # f"Overall frame number: {self.overall_frame_counter}, none count: {self.none_counter}, not none count: {self.not_none_counter}, none percentage: {self.none_counter / self.overall_frame_counter}"
-        # )
-        # logger.info(f"Time stats right after res: {self.dg_model.time_stats()}")
-        # start = time.time_ns()
+        # grab prediction
+        res = next(self.prediction)

-        # res_string = str(res)
-        # logger.info(f"Res is: {res_string}")
-        # logger.debug(f"Res's list of attributes: {dir(res)}")
-        # logger.debug(
-        #     f"Res results, {res.results}, length of results: {len(res.results)}"
-        # )
-        # logger.info(f"Output of res: {res}")
-        # res_string = str(res)
-        # logger.info(f"Data from array: {res.results}")
-        # logger.info(f"First data: {res.results[0]['data']}")
-        # logger.info(f"Length of data: {len(res.results[0]['data'][0])}")
-        # if res is not None and res.results[0].get("category_id") is not None:
-        if result is not None:
-            # populate detection result with corresponding inference result information
-            # self.not_none_counter += 1
-            i = 0
+        # If we have an empty prediction, return immediately
+        if len(res.results[0]) == 0:
+            return detections

-            # for result in res.results:
-            #     if i > 20:
-            #         break
+        i = 0
+        for result in res.results:
+            if i >= 20:
+                break

-            #     detections[i] = [
-            #         result["category_id"],
-            #         float(result["score"]),
-            #         result["bbox"][1] / self.model_height,
-            #         result["bbox"][0] / self.model_width,
-            #         result["bbox"][3] / self.model_height,
-            #         result["bbox"][2] / self.model_width,
-            #     ]
-            #     i += 1
+            detections[i] = [
+                result["category_id"],
+                float(result["score"]),
+                result["bbox"][1] / self.model_height,
+                result["bbox"][0] / self.model_width,
+                result["bbox"][3] / self.model_height,
+                result["bbox"][2] / self.model_width,
+            ]
+            i += 1

-            for item in result.results:
-                # logger.info(f"CURRENT ITEM: {item}")
-                if i >= 20:
-                    break
-
-                category_id = int(item[5])
-                score = item[4]
-                y_min = item[1]
-                x_min = item[0]
-                x_max = item[2]
-                y_max = item[3]
-                detections[i] = [category_id, score, y_min, x_min, y_max, x_max]
-                i += 1
-
-        if detections[0][1] != 0:  # if we have a score, then print detection
-            logger.info(f"Output of detections: {detections}")
-        ## Save the detection results to a file so we can compare
-        # logger.info(f"Overall time took: {(time.time_ns() - start) * 1e-6}ms")
+        logger.debug(f"Detections output: {detections}")
        return detections
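Note on the plugin change above: the new `detect_raw` pushes each frame onto a plain `queue.Queue` and pulls the matching result from a long-lived generator, then packs up to 20 results into a `(20, 6)` array. The sketch below reproduces that flow in isolation so it can be run without DeGirum hardware. `FakeModel` and `QueueFedDetector` are hypothetical stand-ins, and the `[x_min, y_min, x_max, y_max]` bbox order is an assumption inferred from the indexing in the diff, not a confirmed PySDK contract.

```python
import queue

import numpy as np


class FakeModel:
    """Hypothetical stand-in for a loaded model; always returns one detection."""

    def predict(self, frame):
        # Assumed result layout: bbox is [x_min, y_min, x_max, y_max] in pixels.
        return [{"category_id": 0, "score": 0.9, "bbox": [30, 60, 150, 240]}]


class QueueFedDetector:
    """Feeds frames to a generator through a queue, one result per frame."""

    def __init__(self, model, width=300, height=300, max_detections=20):
        self._queue = queue.Queue()
        self._model = model
        self._width = width
        self._height = height
        self._max = max_detections
        # Create the generator once so per-session setup happens a single time.
        self._predictions = self._prediction_generator()

    def _prediction_generator(self):
        while True:
            # Blocks until detect_raw() has queued a frame.
            frame = self._queue.get(block=True)
            yield self._model.predict(frame)

    def detect_raw(self, tensor_input):
        # Drop the leading batch dimension: (1, H, W, C) -> (H, W, C).
        self._queue.put(tensor_input.reshape(tensor_input.shape[1:]))
        results = next(self._predictions)

        # Rows are [class_id, score, y_min, x_min, y_max, x_max], normalized.
        detections = np.zeros((self._max, 6), np.float32)
        for i, result in enumerate(results[: self._max]):
            x_min, y_min, x_max, y_max = result["bbox"]
            detections[i] = [
                result["category_id"],
                float(result["score"]),
                y_min / self._height,
                x_min / self._width,
                y_max / self._height,
                x_max / self._width,
            ]
        return detections


detector = QueueFedDetector(FakeModel())
output = detector.detect_raw(np.zeros((1, 300, 300, 3), dtype=np.uint8))
```

Because every `put` is followed immediately by a `next`, the queue never holds more than one frame in this sketch, which is consistent with the commit replacing the dropping `Stream(5, allow_drop=True)` with an unbounded `queue.Queue()`.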