AMD GPU support with the rocm detector and YOLOv8 pretrained model download (#9762)

* ROCm AMD/GPU based build and detector, WIP
* detectors/rocm: separate yolov8 postprocessing into own function; fix box scaling; use cv2.dnn.blobFromImage for preprocessing; assert on required model parameters
* AMD/ROCm: add a couple more ultralytics models; comments
* docker/rocm: make imported model files readable by all
* docker/rocm: readme about running on AMD GPUs
* docker/rocm: updated README
* docker/rocm: updated README
* docker/rocm: updated README
* detectors/rocm: separated preprocessing functions into yolo_utils.py
* detector/plugins: added onnx cpu plugin
* docker/rocm: updated container with limited label sets
* example detectors view
* docker/rocm: updated README.md
* docker/rocm: update README.md
* docker/rocm: do not set HSA_OVERRIDE_GFX_VERSION at all for the general version as the empty value broke rocm
* detectors: simplified/optimized yolov8_postprocess
* detector/yolo_utils: indentation, remove unused variable
* detectors/rocm: default option to conserve cpu usage at the expense of latency
* detectors/yolo_utils: use nms to prefilter overlapping boxes if too many detected
* detectors/edgetpu_tfl: add support for yolov8
* util/download_models: script to download yolov8 model files
* docker/main: add download-models overlay into s6 startup
* detectors/rocm: assume models are in /config/model_cache/yolov8/
* docker/rocm: compile onnx files into mxr files at startup
* switch model download into bash script
* detectors/rocm: automatically override HSA_OVERRIDE_GFX_VERSION for a couple of known chipsets
* docs: rocm detector first notes
* typos
* describe builds (harakas temporary)
* docker/rocm: also build a version for gfx1100
* docker/rocm: use cp instead of tar
* docker/rocm: remove README as it is now in detector config
* frigate/detectors: renamed yolov8_preprocess->preprocess, pass input tensor element type
* docker/main: use newer openvino (2023.3.0)
* detectors: implement class aggregation
* update yolov8 model
* add openvino/yolov8 support for label aggregation
* docker: remove pointless s6/timeout-up files
* Revert "detectors: implement class aggregation" This reverts commit dcfe6bbf6f.
* detectors/openvino: remove class aggregation
* detectors: increase yolov8 postprocessing score threshold to 0.5
* docker/rocm: separate rocm distributed files into its own build stage
* Update object_detectors.md
* updated CODEOWNERS file for rocm
* updated build names for documentation
* Revert "docker/main: use newer openvino (2023.3.0)" This reverts commit dee95de908.
* reverted openvino detector
* reverted edgetpu detector
* scratched rocm docs from any mention of edgetpu or openvino
* Update docs/docs/configuration/object_detectors.md Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
* renamed frigate.detectors.yolo_utils.py -> frigate.detectors.util.py
* clarified rocm example performance
* Improved wording and clarified text
* Mentioned rocm detector for AMD GPUs
* applied ruff formatting
* applied ruff suggested fixes
* docker/rocm: fix missing argument resulting in larger docker image sizes
* docs/configuration/object_detectors: fix links to yolov8 release files
---------
Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
2025-07-30 13:48:07 +02:00 · 2024-02-10 14:41:46 +02:00 · 2024-02-10 14:41:46 +02:00 · 44d8cdbba1
commit 44d8cdbba1
parent 64988c9be0
26 changed files with 1291 additions and 1 deletions
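
The commit message mentions compiling onnx files into mxr files at startup. As a rough illustration only, the sketch below shows how that step might look from Python using the bindings added in `docker/rocm/migraphx/migraphx_py.cpp` (`parse_onnx`, `get_target`, `program.compile`, `save`). The helper names `mxr_path_for` and `compile_onnx_to_mxr` are hypothetical, and the commit itself may well perform this step differently (for example with the `migraphx-driver` binary copied into the image).

```python
from pathlib import Path


def mxr_path_for(onnx_path: str) -> Path:
    """Return the path of the compiled program: same name, .mxr suffix."""
    return Path(onnx_path).with_suffix(".mxr")


def compile_onnx_to_mxr(onnx_path: str, target_name: str = "gpu") -> Path:
    """Parse an ONNX file, compile it for the target and save it as .mxr.

    Hypothetical helper, not part of this commit.
    """
    # Imported lazily: the module is only available inside the ROCm image.
    import migraphx

    out_path = mxr_path_for(onnx_path)
    if out_path.exists():
        return out_path  # already compiled on an earlier start

    prog = migraphx.parse_onnx(onnx_path)
    # Keyword names match the py::arg names declared in the bindings.
    prog.compile(migraphx.get_target(target_name), offload_copy=True, fast_math=True)
    # The bindings default to the "msgpack" file format.
    migraphx.save(prog, str(out_path))
    return out_path
```

Caching the compiled program next to the ONNX file means the compile cost would be paid once per model rather than on every container start, which is presumably the motivation for doing this at startup.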
--- a/2
+++ b/2
@@ -2,5 +2,5 @@
 /docker/tensorrt/ @madsciencetist @NateMeyer
 /docker/tensorrt/*arm64* @madsciencetist
 /docker/tensorrt/*jetson* @madsciencetist
-
 /docker/rockchip/ @MarcA711
+/docker/rocm/ @harakas
--- a/docker/main/Dockerfile
+++ b/docker/main/Dockerfile
@@ -196,6 +196,8 @@ EXPOSE 8555/tcp 8555/udp

 # Configure logging to prepend timestamps, log to stdout, keep 0 archives and rotate on 10MB
 ENV S6_LOGGING_SCRIPT="T 1 n0 s10000000 T"
+# Do not fail on long-running download scripts
+ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0

 ENTRYPOINT ["/init"]
 CMD []
--- a/docker/main/requirements-wheels.txt
+++ b/docker/main/requirements-wheels.txt
@@ -25,6 +25,7 @@ norfair == 2.2.*
 setproctitle == 1.3.*
 ws4py == 0.5.*
 unidecode == 1.3.*
+onnxruntime == 1.16.*
 # Openvino Library - Custom built with MYRIAD support
 openvino @ https://github.com/NateMeyer/openvino-wheels/releases/download/multi-arch_2022.3.1/openvino-2022.3.1-1-cp39-cp39-manylinux_2_31_x86_64.whl; platform_machine == 'x86_64'
 openvino @ https://github.com/NateMeyer/openvino-wheels/releases/download/multi-arch_2022.3.1/openvino-2022.3.1-1-cp39-cp39-linux_aarch64.whl; platform_machine == 'aarch64'
--- a/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/dependencies.d/base
+++ b/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/dependencies.d/base
--- a/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/run
+++ b/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/run
@@ -0,0 +1,34 @@
+#!/command/with-contenv bash
+# shellcheck shell=bash
+# Download yolov8 models when DOWNLOAD_YOLOV8=1 environment variable is set
+
+set -o errexit -o nounset -o pipefail
+
+MODEL_CACHE_DIR=${MODEL_CACHE_DIR:-"/config/model_cache"}
+YOLOV8_DIR="$MODEL_CACHE_DIR/yolov8"
+YOLOV8_URL=https://github.com/harakas/models/releases/download/yolov8.1-1.1/yolov8.small.models.tar.gz
+YOLOV8_DIGEST=304186b299560fbacc28eac9b9ea02cc2289fe30eb2c0df30109a2529423695c
+
+if [ "${DOWNLOAD_YOLOV8:-0}" = "1" ]; then
+  echo "download-models: DOWNLOAD_YOLOV8=${DOWNLOAD_YOLOV8}, running download"
+  if ! test -f "${YOLOV8_DIR}/model.fetched"; then
+    mkdir -p "$YOLOV8_DIR"
+    TMP_FILE="${YOLOV8_DIR}/download.tar.gz"
+    curl --no-progress-meter -L --max-filesize 500M --insecure --output "$TMP_FILE" "${YOLOV8_URL}"
+    digest=$(sha256sum "$TMP_FILE" | awk '{print $1}')
+    if [ "$digest" = "$YOLOV8_DIGEST" ]; then
+      echo "download-models: Extracting downloaded file"
+      cd "$YOLOV8_DIR"
+      tar zxf "$TMP_FILE"
+      rm "$TMP_FILE"
+      touch model.fetched
+      echo "download-models: Yolov8 download done, files placed into ${YOLOV8_DIR}"
+    else
+      echo "download-models: Downloaded file digest does not match: got $digest, expected $YOLOV8_DIGEST"
+      rm "$TMP_FILE"
+    fi
+  else
+    echo "download-models: ${YOLOV8_DIR}/model.fetched already present"
+  fi
+fi
+
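
The `download-models/run` script above accepts the archive only when its SHA-256 digest matches a pinned value, and deletes it otherwise. The same check can be sketched in Python as follows. This is an illustrative equivalent rather than code from the commit, and the names `sha256_of` and `accept_download` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def accept_download(archive: Path, expected_digest: str) -> bool:
    """Keep the archive only if its digest matches; delete it otherwise."""
    if sha256_of(archive) == expected_digest:
        return True
    archive.unlink()  # mirrors the `rm` in the mismatch branch of the script
    return False
```

Because the digest is pinned in the script, a tampered or truncated download is rejected even though the transfer itself is made with `--insecure`.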
--- a/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/type
+++ b/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/type
@@ -0,0 +1 @@
+oneshot
--- a/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/up
+++ b/docker/main/rootfs/etc/s6-overlay/s6-rc.d/download-models/up
@@ -0,0 +1 @@
+/etc/s6-overlay/s6-rc.d/download-models/run
--- a/docker/main/rootfs/etc/s6-overlay/s6-rc.d/frigate/dependencies.d/download-models
+++ b/docker/main/rootfs/etc/s6-overlay/s6-rc.d/frigate/dependencies.d/download-models
--- a/docker/rocm/Dockerfile
+++ b/docker/rocm/Dockerfile
@@ -0,0 +1,106 @@
+# syntax=docker/dockerfile:1.4
+
+# https://askubuntu.com/questions/972516/debian-frontend-environment-variable
+ARG DEBIAN_FRONTEND=noninteractive
+ARG ROCM=5.7.3
+ARG AMDGPU=gfx900
+ARG HSA_OVERRIDE_GFX_VERSION
+ARG HSA_OVERRIDE
+
+#######################################################################
+FROM ubuntu:focal as rocm
+
+ARG ROCM
+
+RUN apt-get update && apt-get -y upgrade
+RUN apt-get -y install gnupg wget
+
+RUN mkdir --parents --mode=0755 /etc/apt/keyrings
+
+RUN wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
+COPY docker/rocm/rocm.list /etc/apt/sources.list.d/
+COPY docker/rocm/rocm-pin-600 /etc/apt/preferences.d/
+
+RUN apt-get update
+
+RUN apt-get -y install --no-install-recommends migraphx
+RUN apt-get -y install --no-install-recommends migraphx-dev
+
+RUN mkdir -p /opt/rocm-dist/opt/rocm-$ROCM/lib
+RUN cd /opt/rocm-$ROCM/lib && cp -dpr libMIOpen*.so* libamd*.so* libhip*.so* libhsa*.so* libmigraphx*.so* librocm*.so* librocblas*.so* /opt/rocm-dist/opt/rocm-$ROCM/lib/
+RUN cd /opt/rocm-dist/opt/ && ln -s rocm-$ROCM rocm
+
+RUN mkdir -p /opt/rocm-dist/etc/ld.so.conf.d/
+RUN echo /opt/rocm/lib|tee /opt/rocm-dist/etc/ld.so.conf.d/rocm.conf
+
+#######################################################################
+FROM --platform=linux/amd64 debian:11 as debian-base
+
+RUN apt-get update && apt-get -y upgrade
+RUN apt-get -y install --no-install-recommends libelf1 libdrm2 libdrm-amdgpu1 libnuma1 kmod
+
+RUN apt-get -y install python3
+
+#######################################################################
+# ROCm does not come with migraphx wrappers for python 3.9, so we build it here
+FROM debian-base as debian-build
+
+ARG ROCM
+
+COPY --from=rocm /opt/rocm-$ROCM /opt/rocm-$ROCM
+RUN ln -s /opt/rocm-$ROCM /opt/rocm
+
+RUN apt-get -y install g++ cmake
+RUN apt-get -y install python3-pybind11 python3.9-distutils python3-dev
+
+WORKDIR /opt/build
+
+COPY docker/rocm/migraphx .
+
+RUN mkdir build && cd build && cmake .. && make install
+
+#######################################################################
+FROM deps AS deps-prelim
+
+# need this to install libnuma1
+RUN apt-get update
+# no upgrade?!?!
+RUN apt-get -y install libnuma1
+
+WORKDIR /opt/frigate/
+COPY --from=rootfs / /
+COPY docker/rocm/rootfs/ /
+
+#######################################################################
+FROM scratch AS rocm-dist
+
+ARG ROCM
+ARG AMDGPU
+
+COPY --from=rocm /opt/rocm-$ROCM/bin/rocminfo /opt/rocm-$ROCM/bin/migraphx-driver /opt/rocm-$ROCM/bin/
+COPY --from=rocm /opt/rocm-$ROCM/share/miopen/db/*$AMDGPU* /opt/rocm-$ROCM/share/miopen/db/
+COPY --from=rocm /opt/rocm-$ROCM/lib/rocblas/library/*$AMDGPU* /opt/rocm-$ROCM/lib/rocblas/library/
+COPY --from=rocm /opt/rocm-dist/ /
+COPY --from=debian-build /opt/rocm/lib/migraphx.cpython-39-x86_64-linux-gnu.so /opt/rocm-$ROCM/lib/
+
+#######################################################################
+FROM deps-prelim AS rocm-prelim-hsa-override0
+
+ENV HSA_ENABLE_SDMA=0
+
+COPY --from=rocm-dist / /
+
+RUN ldconfig
+
+#######################################################################
+FROM rocm-prelim-hsa-override0 as rocm-prelim-hsa-override1
+
+ARG HSA_OVERRIDE_GFX_VERSION
+ENV HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION
+
+#######################################################################
+FROM rocm-prelim-hsa-override$HSA_OVERRIDE as rocm-deps
+
+# Request yolov8 download at startup
+ENV DOWNLOAD_YOLOV8=1
+
--- a/docker/rocm/migraphx/CMakeLists.txt
+++ b/docker/rocm/migraphx/CMakeLists.txt
@@ -0,0 +1,26 @@
+
+cmake_minimum_required(VERSION 3.1)
+
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+set(CMAKE_CXX_EXTENSIONS OFF)
+
+if(NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE Release)
+endif()
+
+SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
+
+project(migraphx_py)
+
+include_directories(/opt/rocm/include)
+
+find_package(pybind11 REQUIRED)
+pybind11_add_module(migraphx migraphx_py.cpp)
+
+target_link_libraries(migraphx PRIVATE /opt/rocm/lib/libmigraphx.so /opt/rocm/lib/libmigraphx_tf.so /opt/rocm/lib/libmigraphx_onnx.so)
+
+install(TARGETS migraphx
+  COMPONENT python
+  LIBRARY DESTINATION /opt/rocm/lib
+)
--- a/docker/rocm/migraphx/migraphx_py.cpp
+++ b/docker/rocm/migraphx/migraphx_py.cpp
@@ -0,0 +1,582 @@
+/*
+ * The MIT License (MIT)
+ *
+ * Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <pybind11/pybind11.h>
+#include <pybind11/stl.h>
+#include <pybind11/numpy.h>
+#include <migraphx/program.hpp>
+#include <migraphx/instruction_ref.hpp>
+#include <migraphx/operation.hpp>
+#include <migraphx/quantization.hpp>
+#include <migraphx/generate.hpp>
+#include <migraphx/instruction.hpp>
+#include <migraphx/ref/target.hpp>
+#include <migraphx/stringutils.hpp>
+#include <migraphx/tf.hpp>
+#include <migraphx/onnx.hpp>
+#include <migraphx/load_save.hpp>
+#include <migraphx/register_target.hpp>
+#include <migraphx/json.hpp>
+#include <migraphx/make_op.hpp>
+#include <migraphx/op/common.hpp>
+
+#ifdef HAVE_GPU
+#include <migraphx/gpu/hip.hpp>
+#endif
+
+using half   = half_float::half;
+namespace py = pybind11;
+
+#ifdef __clang__
+#define MIGRAPHX_PUSH_UNUSED_WARNING \
+    _Pragma("clang diagnostic push") \
+        _Pragma("clang diagnostic ignored \"-Wused-but-marked-unused\"")
+#define MIGRAPHX_POP_WARNING _Pragma("clang diagnostic pop")
+#else
+#define MIGRAPHX_PUSH_UNUSED_WARNING
+#define MIGRAPHX_POP_WARNING
+#endif
+#define MIGRAPHX_PYBIND11_MODULE(...) \
+    MIGRAPHX_PUSH_UNUSED_WARNING      \
+    PYBIND11_MODULE(__VA_ARGS__)      \
+    MIGRAPHX_POP_WARNING
+
+#define MIGRAPHX_PYTHON_GENERATE_SHAPE_ENUM(x, t) .value(#x, migraphx::shape::type_t::x)
+namespace migraphx {
+
+migraphx::value to_value(py::kwargs kwargs);
+migraphx::value to_value(py::list lst);
+
+template <class T, class F>
+void visit_py(T x, F f)
+{
+    if(py::isinstance<py::kwargs>(x))
+    {
+        f(to_value(x.template cast<py::kwargs>()));
+    }
+    else if(py::isinstance<py::list>(x))
+    {
+        f(to_value(x.template cast<py::list>()));
+    }
+    else if(py::isinstance<py::bool_>(x))
+    {
+        f(x.template cast<bool>());
+    }
+    else if(py::isinstance<py::int_>(x) or py::hasattr(x, "__index__"))
+    {
+        f(x.template cast<int>());
+    }
+    else if(py::isinstance<py::float_>(x))
+    {
+        f(x.template cast<float>());
+    }
+    else if(py::isinstance<py::str>(x))
+    {
+        f(x.template cast<std::string>());
+    }
+    else if(py::isinstance<migraphx::shape::dynamic_dimension>(x))
+    {
+        f(migraphx::to_value(x.template cast<migraphx::shape::dynamic_dimension>()));
+    }
+    else
+    {
+        MIGRAPHX_THROW("VISIT_PY: Unsupported data type!");
+    }
+}
+
+migraphx::value to_value(py::list lst)
+{
+    migraphx::value v = migraphx::value::array{};
+    for(auto val : lst)
+    {
+        visit_py(val, [&](auto py_val) { v.push_back(py_val); });
+    }
+
+    return v;
+}
+
+migraphx::value to_value(py::kwargs kwargs)
+{
+    migraphx::value v = migraphx::value::object{};
+
+    for(auto arg : kwargs)
+    {
+        auto&& key = py::str(arg.first);
+        auto&& val = arg.second;
+        visit_py(val, [&](auto py_val) { v[key] = py_val; });
+    }
+    return v;
+}
+} // namespace migraphx
+
+namespace pybind11 {
+namespace detail {
+
+template <>
+struct npy_format_descriptor<half>
+{
+    static std::string format()
+    {
+        // following: https://docs.python.org/3/library/struct.html#format-characters
+        return "e";
+    }
+    static constexpr auto name() { return _("half"); }
+};
+
+} // namespace detail
+} // namespace pybind11
+
+template <class F>
+void visit_type(const migraphx::shape& s, F f)
+{
+    s.visit_type(f);
+}
+
+template <class T, class F>
+void visit(const migraphx::raw_data<T>& x, F f)
+{
+    x.visit(f);
+}
+
+template <class F>
+void visit_types(F f)
+{
+    migraphx::shape::visit_types(f);
+}
+
+template <class T>
+py::buffer_info to_buffer_info(T& x)
+{
+    migraphx::shape s = x.get_shape();
+    assert(s.type() != migraphx::shape::tuple_type);
+    if(s.dynamic())
+        MIGRAPHX_THROW("MIGRAPHX PYTHON: dynamic shape argument passed to to_buffer_info");
+    auto strides = s.strides();
+    std::transform(
+        strides.begin(), strides.end(), strides.begin(), [&](auto i) { return i * s.type_size(); });
+    py::buffer_info b;
+    visit_type(s, [&](auto as) {
+        // migraphx use int8_t data to store bool type, we need to
+        // explicitly specify the data type as bool for python
+        if(s.type() == migraphx::shape::bool_type)
+        {
+            b = py::buffer_info(x.data(),
+                                as.size(),
+                                py::format_descriptor<bool>::format(),
+                                s.ndim(),
+                                s.lens(),
+                                strides);
+        }
+        else
+        {
+            b = py::buffer_info(x.data(),
+                                as.size(),
+                                py::format_descriptor<decltype(as())>::format(),
+                                s.ndim(),
+                                s.lens(),
+                                strides);
+        }
+    });
+    return b;
+}
+
+migraphx::shape to_shape(const py::buffer_info& info)
+{
+    migraphx::shape::type_t t;
+    std::size_t n = 0;
+    visit_types([&](auto as) {
+        if(info.format == py::format_descriptor<decltype(as())>::format() or
+           (info.format == "l" and py::format_descriptor<decltype(as())>::format() == "q") or
+           (info.format == "L" and py::format_descriptor<decltype(as())>::format() == "Q"))
+        {
+            t = as.type_enum();
+            n = sizeof(as());
+        }
+        else if(info.format == "?" and py::format_descriptor<decltype(as())>::format() == "b")
+        {
+            t = migraphx::shape::bool_type;
+            n = sizeof(bool);
+        }
+    });
+
+    if(n == 0)
+    {
+        MIGRAPHX_THROW("MIGRAPHX PYTHON: Unsupported data type " + info.format);
+    }
+
+    auto strides = info.strides;
+    std::transform(strides.begin(), strides.end(), strides.begin(), [&](auto i) -> std::size_t {
+        return n > 0 ? i / n : 0;
+    });
+
+    // scalar support
+    if(info.shape.empty())
+    {
+        return migraphx::shape{t};
+    }
+    else
+    {
+        return migraphx::shape{t, info.shape, strides};
+    }
+}
+
+MIGRAPHX_PYBIND11_MODULE(migraphx, m)
+{
+    py::class_<migraphx::shape> shape_cls(m, "shape");
+    shape_cls
+        .def(py::init([](py::kwargs kwargs) {
+            auto v = migraphx::to_value(kwargs);
+            auto t = migraphx::shape::parse_type(v.get("type", "float"));
+            if(v.contains("dyn_dims"))
+            {
+                auto dyn_dims =
+                    migraphx::from_value<std::vector<migraphx::shape::dynamic_dimension>>(
+                        v.at("dyn_dims"));
+                return migraphx::shape(t, dyn_dims);
+            }
+            auto lens = v.get<std::size_t>("lens", {1});
+            if(v.contains("strides"))
+                return migraphx::shape(t, lens, v.at("strides").to_vector<std::size_t>());
+            else
+                return migraphx::shape(t, lens);
+        }))
+        .def("type", &migraphx::shape::type)
+        .def("lens", &migraphx::shape::lens)
+        .def("strides", &migraphx::shape::strides)
+        .def("ndim", &migraphx::shape::ndim)
+        .def("elements", &migraphx::shape::elements)
+        .def("bytes", &migraphx::shape::bytes)
+        .def("type_string", &migraphx::shape::type_string)
+        .def("type_size", &migraphx::shape::type_size)
+        .def("dyn_dims", &migraphx::shape::dyn_dims)
+        .def("packed", &migraphx::shape::packed)
+        .def("transposed", &migraphx::shape::transposed)
+        .def("broadcasted", &migraphx::shape::broadcasted)
+        .def("standard", &migraphx::shape::standard)
+        .def("scalar", &migraphx::shape::scalar)
+        .def("dynamic", &migraphx::shape::dynamic)
+        .def("__eq__", std::equal_to<migraphx::shape>{})
+        .def("__ne__", std::not_equal_to<migraphx::shape>{})
+        .def("__repr__", [](const migraphx::shape& s) { return migraphx::to_string(s); });
+
+    py::enum_<migraphx::shape::type_t>(shape_cls, "type_t")
+        MIGRAPHX_SHAPE_VISIT_TYPES(MIGRAPHX_PYTHON_GENERATE_SHAPE_ENUM);
+
+    py::class_<migraphx::shape::dynamic_dimension>(shape_cls, "dynamic_dimension")
+        .def(py::init<>())
+        .def(py::init<std::size_t, std::size_t>())
+        .def(py::init<std::size_t, std::size_t, std::set<std::size_t>>())
+        .def_readwrite("min", &migraphx::shape::dynamic_dimension::min)
+        .def_readwrite("max", &migraphx::shape::dynamic_dimension::max)
+        .def_readwrite("optimals", &migraphx::shape::dynamic_dimension::optimals)
+        .def("is_fixed", &migraphx::shape::dynamic_dimension::is_fixed);
+
+    py::class_<migraphx::argument>(m, "argument", py::buffer_protocol())
+        .def_buffer([](migraphx::argument& x) -> py::buffer_info { return to_buffer_info(x); })
+        .def(py::init([](py::buffer b) {
+            py::buffer_info info = b.request();
+            return migraphx::argument(to_shape(info), info.ptr);
+        }))
+        .def("get_shape", &migraphx::argument::get_shape)
+        .def("data_ptr",
+             [](migraphx::argument& x) { return reinterpret_cast<std::uintptr_t>(x.data()); })
+        .def("tolist",
+             [](migraphx::argument& x) {
+                 py::list l{x.get_shape().elements()};
+                 visit(x, [&](auto data) { l = py::cast(data.to_vector()); });
+                 return l;
+             })
+        .def("__eq__", std::equal_to<migraphx::argument>{})
+        .def("__ne__", std::not_equal_to<migraphx::argument>{})
+        .def("__repr__", [](const migraphx::argument& x) { return migraphx::to_string(x); });
+
+    py::class_<migraphx::target>(m, "target");
+
+    py::class_<migraphx::instruction_ref>(m, "instruction_ref")
+        .def("shape", [](migraphx::instruction_ref i) { return i->get_shape(); })
+        .def("op", [](migraphx::instruction_ref i) { return i->get_operator(); });
+
+    py::class_<migraphx::module, std::unique_ptr<migraphx::module, py::nodelete>>(m, "module")
+        .def("print", [](const migraphx::module& mm) { std::cout << mm << std::endl; })
+        .def(
+            "add_instruction",
+            [](migraphx::module& mm,
+               const migraphx::operation& op,
+               std::vector<migraphx::instruction_ref>& args,
+               std::vector<migraphx::module*>& mod_args) {
+                return mm.add_instruction(op, args, mod_args);
+            },
+            py::arg("op"),
+            py::arg("args"),
+            py::arg("mod_args") = std::vector<migraphx::module*>{})
+        .def(
+            "add_literal",
+            [](migraphx::module& mm, py::buffer data) {
+                py::buffer_info info = data.request();
+                auto literal_shape   = to_shape(info);
+                return mm.add_literal(literal_shape, reinterpret_cast<char*>(info.ptr));
+            },
+            py::arg("data"))
+        .def(
+            "add_parameter",
+            [](migraphx::module& mm, const std::string& name, const migraphx::shape shape) {
+                return mm.add_parameter(name, shape);
+            },
+            py::arg("name"),
+            py::arg("shape"))
+        .def(
+            "add_return",
+            [](migraphx::module& mm, std::vector<migraphx::instruction_ref>& args) {
+                return mm.add_return(args);
+            },
+            py::arg("args"))
+        .def("__repr__", [](const migraphx::module& mm) { return migraphx::to_string(mm); });
+
+    py::class_<migraphx::program>(m, "program")
+        .def(py::init([]() { return migraphx::program(); }))
+        .def("get_parameter_names", &migraphx::program::get_parameter_names)
+        .def("get_parameter_shapes", &migraphx::program::get_parameter_shapes)
+        .def("get_output_shapes", &migraphx::program::get_output_shapes)
+        .def("is_compiled", &migraphx::program::is_compiled)
+        .def(
+            "compile",
+            [](migraphx::program& p,
+               const migraphx::target& t,
+               bool offload_copy,
+               bool fast_math,
+               bool exhaustive_tune) {
+                migraphx::compile_options options;
+                options.offload_copy    = offload_copy;
+                options.fast_math       = fast_math;
+                options.exhaustive_tune = exhaustive_tune;
+                p.compile(t, options);
+            },
+            py::arg("t"),
+            py::arg("offload_copy")    = true,
+            py::arg("fast_math")       = true,
+            py::arg("exhaustive_tune") = false)
+        .def("get_main_module", [](const migraphx::program& p) { return p.get_main_module(); })
+        .def(
+            "create_module",
+            [](migraphx::program& p, const std::string& name) { return p.create_module(name); },
+            py::arg("name"))
+        .def("run",
+             [](migraphx::program& p, py::dict params) {
+                 migraphx::parameter_map pm;
+                 for(auto x : params)
+                 {
+                     std::string key      = x.first.cast<std::string>();
+                     py::buffer b         = x.second.cast<py::buffer>();
+                     py::buffer_info info = b.request();
+                     pm[key]              = migraphx::argument(to_shape(info), info.ptr);
+                 }
+                 return p.eval(pm);
+             })
+        .def("run_async",
+             [](migraphx::program& p,
+                py::dict params,
+                std::uintptr_t stream,
+                std::string stream_name) {
+                 migraphx::parameter_map pm;
+                 for(auto x : params)
+                 {
+                     std::string key      = x.first.cast<std::string>();
+                     py::buffer b         = x.second.cast<py::buffer>();
+                     py::buffer_info info = b.request();
+                     pm[key]              = migraphx::argument(to_shape(info), info.ptr);
+                 }
+                 migraphx::execution_environment exec_env{
+                     migraphx::any_ptr(reinterpret_cast<void*>(stream), stream_name), true};
+                 return p.eval(pm, exec_env);
+             })
+        .def("sort", &migraphx::program::sort)
+        .def("print", [](const migraphx::program& p) { std::cout << p << std::endl; })
+        .def("__eq__", std::equal_to<migraphx::program>{})
+        .def("__ne__", std::not_equal_to<migraphx::program>{})
+        .def("__repr__", [](const migraphx::program& p) { return migraphx::to_string(p); });
+
+    py::class_<migraphx::operation> op(m, "op");
+    op.def(py::init([](const std::string& name, py::kwargs kwargs) {
+          migraphx::value v = migraphx::value::object{};
+          if(kwargs)
+          {
+              v = migraphx::to_value(kwargs);
+          }
+          return migraphx::make_op(name, v);
+      }))
+        .def("name", &migraphx::operation::name);
+
+    py::enum_<migraphx::op::pooling_mode>(op, "pooling_mode")
+        .value("average", migraphx::op::pooling_mode::average)
+        .value("max", migraphx::op::pooling_mode::max)
+        .value("lpnorm", migraphx::op::pooling_mode::lpnorm);
+
+    py::enum_<migraphx::op::rnn_direction>(op, "rnn_direction")
+        .value("forward", migraphx::op::rnn_direction::forward)
+        .value("reverse", migraphx::op::rnn_direction::reverse)
+        .value("bidirectional", migraphx::op::rnn_direction::bidirectional);
+
+    m.def(
+        "argument_from_pointer",
+        [](const migraphx::shape shape, const int64_t address) {
+            return migraphx::argument(shape, reinterpret_cast<void*>(address));
+        },
+        py::arg("shape"),
+        py::arg("address"));
+
+    m.def(
+        "parse_tf",
+        [](const std::string& filename,
+           bool is_nhwc,
+           unsigned int batch_size,
+           std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
+           std::vector<std::string> output_names) {
+            return migraphx::parse_tf(
+                filename, migraphx::tf_options{is_nhwc, batch_size, map_input_dims, output_names});
+        },
+        "Parse tf protobuf (default format is nhwc)",
+        py::arg("filename"),
+        py::arg("is_nhwc")        = true,
+        py::arg("batch_size")     = 1,
+        py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
+        py::arg("output_names")   = std::vector<std::string>());
+
+    m.def(
+        "parse_onnx",
+        [](const std::string& filename,
+           unsigned int default_dim_value,
+           migraphx::shape::dynamic_dimension default_dyn_dim_value,
+           std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
+           std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>
+               map_dyn_input_dims,
+           bool skip_unknown_operators,
+           bool print_program_on_error,
+           int64_t max_loop_iterations) {
+            migraphx::onnx_options options;
+            options.default_dim_value      = default_dim_value;
+            options.default_dyn_dim_value  = default_dyn_dim_value;
+            options.map_input_dims         = map_input_dims;
+            options.map_dyn_input_dims     = map_dyn_input_dims;
+            options.skip_unknown_operators = skip_unknown_operators;
+            options.print_program_on_error = print_program_on_error;
+            options.max_loop_iterations    = max_loop_iterations;
+            return migraphx::parse_onnx(filename, options);
+        },
+        "Parse onnx file",
+        py::arg("filename"),
+        py::arg("default_dim_value")     = 0,
+        py::arg("default_dyn_dim_value") = migraphx::shape::dynamic_dimension{1, 1},
+        py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
+        py::arg("map_dyn_input_dims") =
+            std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>(),
+        py::arg("skip_unknown_operators") = false,
+        py::arg("print_program_on_error") = false,
+        py::arg("max_loop_iterations")    = 10);
+
+    m.def(
+        "parse_onnx_buffer",
+        [](const std::string& onnx_buffer,
+           unsigned int default_dim_value,
+           migraphx::shape::dynamic_dimension default_dyn_dim_value,
+           std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
+           std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>
+               map_dyn_input_dims,
+           bool skip_unknown_operators,
+           bool print_program_on_error) {
+            migraphx::onnx_options options;
+            options.default_dim_value      = default_dim_value;
+            options.default_dyn_dim_value  = default_dyn_dim_value;
+            options.map_input_dims         = map_input_dims;
+            options.map_dyn_input_dims     = map_dyn_input_dims;
+            options.skip_unknown_operators = skip_unknown_operators;
+            options.print_program_on_error = print_program_on_error;
+            return migraphx::parse_onnx_buffer(onnx_buffer, options);
+        },
+        "Parse onnx file",
+        py::arg("filename"),
+        py::arg("default_dim_value")     = 0,
+        py::arg("default_dyn_dim_value") = migraphx::shape::dynamic_dimension{1, 1},
+        py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
+        py::arg("map_dyn_input_dims") =
+            std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>(),
+        py::arg("skip_unknown_operators") = false,
+        py::arg("print_program_on_error") = false);
+
+    m.def(
+        "load",
+        [](const std::string& name, const std::string& format) {
+            migraphx::file_options options;
+            options.format = format;
+            return migraphx::load(name, options);
+        },
+        "Load MIGraphX program",
+        py::arg("filename"),
+        py::arg("format") = "msgpack");
+
+    m.def(
+        "save",
+        [](const migraphx::program& p, const std::string& name, const std::string& format) {
+            migraphx::file_options options;
+            options.format = format;
+            return migraphx::save(p, name, options);
+        },
+        "Save MIGraphX program",
+        py::arg("p"),
+        py::arg("filename"),
+        py::arg("format") = "msgpack");
+
+    m.def("get_target", &migraphx::make_target);
+    m.def("create_argument", [](const migraphx::shape& s, const std::vector<double>& values) {
+        if(values.size() != s.elements())
+            MIGRAPHX_THROW("Values and shape elements do not match");
+        migraphx::argument a{s};
+        a.fill(values.begin(), values.end());
+        return a;
+    });
+    m.def("generate_argument", &migraphx::generate_argument, py::arg("s"), py::arg("seed") = 0);
+    m.def("fill_argument", &migraphx::fill_argument, py::arg("s"), py::arg("value"));
+    m.def("quantize_fp16",
+          &migraphx::quantize_fp16,
+          py::arg("prog"),
+          py::arg("ins_names") = std::vector<std::string>{"all"});
+    m.def("quantize_int8",
+          &migraphx::quantize_int8,
+          py::arg("prog"),
+          py::arg("t"),
+          py::arg("calibration") = std::vector<migraphx::parameter_map>{},
+          py::arg("ins_names")   = std::vector<std::string>{"dot", "convolution"});
+
+#ifdef HAVE_GPU
+    m.def("allocate_gpu", &migraphx::gpu::allocate_gpu, py::arg("s"), py::arg("host") = false);
+    m.def("to_gpu", &migraphx::gpu::to_gpu, py::arg("arg"), py::arg("host") = false);
+    m.def("from_gpu", &migraphx::gpu::from_gpu);
+    m.def("gpu_sync", [] { migraphx::gpu::gpu_sync(); });
+#endif
+
+#ifdef VERSION_INFO
+    m.attr("__version__") = VERSION_INFO;
+#else
+    m.attr("__version__") = "dev";
+#endif
+}
--- a/docker/rocm/rocm-pin-600
+++ b/docker/rocm/rocm-pin-600
@ -0,0 +1,3 @@
+Package: *
+Pin: release o=repo.radeon.com
+Pin-Priority: 600
--- a/docker/rocm/rocm.hcl
+++ b/docker/rocm/rocm.hcl
@ -0,0 +1,38 @@
+variable "AMDGPU" {
+  default = "gfx900"
+}
+variable "ROCM" {
+  default = "5.7.3"
+}
+variable "HSA_OVERRIDE_GFX_VERSION" {
+  default = ""
+}
+variable "HSA_OVERRIDE" {
+  default = "1"
+}
+target deps {
+  dockerfile = "docker/main/Dockerfile"
+  platforms = ["linux/amd64"]
+  target = "deps"
+}
+
+target rootfs {
+  dockerfile = "docker/main/Dockerfile"
+  platforms = ["linux/amd64"]
+  target = "rootfs"
+}
+
+target rocm {
+  dockerfile = "docker/rocm/Dockerfile"
+  contexts = {
+    deps = "target:deps",
+    rootfs = "target:rootfs"
+  }
+  platforms = ["linux/amd64"]
+  args = {
+    AMDGPU = AMDGPU,
+    ROCM = ROCM,
+    HSA_OVERRIDE_GFX_VERSION = HSA_OVERRIDE_GFX_VERSION,
+    HSA_OVERRIDE = HSA_OVERRIDE
+  }
+}
--- a/docker/rocm/rocm.list
+++ b/docker/rocm/rocm.list
@ -0,0 +1 @@
+deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7.3 focal main
--- a/docker/rocm/rocm.mk
+++ b/docker/rocm/rocm.mk
@ -0,0 +1,17 @@
+BOARDS += rocm
+
+# AMD/ROCm is chunky, so we also build a couple of smaller images for specific chipsets
+ROCM_CHIPSETS:=gfx900:9.0.0 gfx1030:10.3.0 gfx1100:11.0.0
+
+local-rocm: version
+	$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --load --file=docker/rocm/rocm.hcl --set rocm.tags=frigate:latest-rocm-$(word 1,$(subst :, ,$(chipset))) rocm;)
+	unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --load --file=docker/rocm/rocm.hcl --set rocm.tags=frigate:latest-rocm rocm
+
+build-rocm: version
+	$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm-$(word 1,$(subst :, ,$(chipset))) rocm;)
+	unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm rocm
+
+push-rocm: build-rocm
+	$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --push --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm-$(word 1,$(subst :, ,$(chipset))) rocm;)
+	unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --push --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm rocm
+
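
Note on `ROCM_CHIPSETS` in `rocm.mk`: each entry packs a chipset name and its `HSA_OVERRIDE_GFX_VERSION` into one `name:version` token, which the `foreach` recipes split with `$(subst :, ,...)` and `$(word N,...)`. The following is a minimal Python sketch of that split, not part of the build. The helper names `split_chipset` and `local_tags` are hypothetical, and the tag prefix is the one used by the `local-rocm` target.

```python
# Illustrative only: mirrors the $(word N,$(subst :, ,$(chipset))) split in rocm.mk.
ROCM_CHIPSETS = "gfx900:9.0.0 gfx1030:10.3.0 gfx1100:11.0.0"


def split_chipset(entry):
    """Split 'gfx900:9.0.0' into (AMDGPU, HSA_OVERRIDE_GFX_VERSION)."""
    amdgpu, gfx_version = entry.split(":", 1)
    return amdgpu, gfx_version


def local_tags(chipsets=ROCM_CHIPSETS):
    """Return the image tags the local-rocm target would use, one per chipset."""
    return [f"frigate:latest-rocm-{split_chipset(e)[0]}" for e in chipsets.split()]


for entry in ROCM_CHIPSETS.split():
    amdgpu, gfx_version = split_chipset(entry)
    print(f"AMDGPU={amdgpu} HSA_OVERRIDE_GFX_VERSION={gfx_version}")
```

Only the chipset name ends up in the image tag, because a `:` inside the tag part of an image reference would not be valid.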
--- a/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/dependencies.d/download-models
+++ b/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/dependencies.d/download-models
--- a/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/run
+++ b/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/run
@ -0,0 +1,20 @@
+#!/command/with-contenv bash
+# shellcheck shell=bash
+# Compile YoloV8 ONNX files into ROCm MIGraphX files
+
+OVERRIDE=$(cd /opt/frigate && python3 -c 'import frigate.detectors.plugins.rocm as rocm; print(rocm.auto_override_gfx_version())')
+
+if ! test -z "$OVERRIDE"; then
+  echo "Using HSA_OVERRIDE_GFX_VERSION=${OVERRIDE}"
+  export HSA_OVERRIDE_GFX_VERSION=$OVERRIDE
+fi
+
+for onnx in /config/model_cache/yolov8/*.onnx
+do
+  mxr="${onnx%.onnx}.mxr"
+  if ! test -f "$mxr"; then
+    echo "processing $onnx into $mxr"
+    /opt/rocm/bin/migraphx-driver compile "$onnx" --optimize --gpu --enable-offload-copy --binary -o "$mxr"
+  fi
+done
+
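
Note on the `compile-rocm-models/run` script: it compiles an ONNX file only when no `.mxr` file with the same stem exists next to it, which is why deleting the cached `.mxr` files forces a recompile. A rough Python sketch of that selection rule follows. The function name `pending_compiles` and the demo file names are made up for illustration, and the sketch only lists the work without invoking `migraphx-driver`.

```python
import tempfile
from pathlib import Path


def pending_compiles(model_dir):
    """Return (onnx, mxr) pairs for ONNX files that have no compiled .mxr yet."""
    pending = []
    for onnx in sorted(Path(model_dir).glob("*.onnx")):
        mxr = onnx.with_suffix(".mxr")  # same rule as "${onnx%.onnx}.mxr"
        if not mxr.is_file():
            pending.append((onnx, mxr))
    return pending


# Demonstration in a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.onnx").touch()
    Path(tmp, "b.onnx").touch()
    Path(tmp, "b.mxr").touch()  # already compiled, so it is skipped
    demo_pending = [onnx.name for onnx, _ in pending_compiles(tmp)]
    print(demo_pending)
```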
--- a/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/type
+++ b/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/type
@ -0,0 +1 @@
+oneshot
--- a/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/up
+++ b/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/compile-rocm-models/up
@ -0,0 +1 @@
+/etc/s6-overlay/s6-rc.d/compile-rocm-models/run
--- a/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/frigate/dependencies.d/compile-rocm-models
+++ b/docker/rocm/rootfs/etc/s6-overlay/s6-rc.d/frigate/dependencies.d/compile-rocm-models
--- a/docs/docs/configuration/object_detectors.md
+++ b/docs/docs/configuration/object_detectors.md
@ -397,3 +397,158 @@ detectors:
 ```

 :::
+
+## AMD/ROCm GPU detector
+
+### Setup
+
+The `rocm` detector supports running [ultralytics](https://github.com/ultralytics/ultralytics) yolov8 models on AMD GPUs and iGPUs. Use a frigate docker image with the `-rocm` suffix, for example `ghcr.io/blakeblackshear/frigate:stable-rocm`.
+
+As the ROCm software stack is quite large, there are also smaller images built for specific GPU chipsets:
+
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx900`
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1030`
+- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1100`
+
+### Docker settings for GPU access
+
+ROCm needs access to the `/dev/kfd` and `/dev/dri` devices. When docker or frigate is not run as root, the `video` (and possibly `render` and `ssl/_ssl`) groups should also be added.
+
+When running docker directly the following flags should be added for device access:
+
+```bash
+$ docker run --device=/dev/kfd --device=/dev/dri  \
+    ...
+```
+
+When using docker compose:
+
+```yaml
+services:
+  frigate:
+...
+    devices:
+      - /dev/dri
+      - /dev/kfd
+...
+```
+
+For reference on recommended settings see [running ROCm/pytorch in Docker](https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html#using-docker-with-pytorch-pre-installed).
+
+### Docker settings for overriding the GPU chipset
+
+Your GPU or iGPU might work just fine without any special configuration, but in many cases manual settings are needed. The AMD/ROCm software stack comes with a limited set of GPU drivers, and for newer or missing models you will have to override the chipset version to an older/generic version to get things working.
+
+Also, AMD/ROCm does not "officially" support integrated GPUs. Most of them still work fine but require special settings: the `HSA_OVERRIDE_GFX_VERSION` environment variable has to be configured. See the [ROCm bug report](https://github.com/ROCm/ROCm/issues/1743) for context and examples.
+
+For chipset-specific frigate rocm builds this variable is already set automatically.
+
+For the general rocm frigate build the override is chosen automatically for the following detected chipsets:
+
+  - gfx90c -> 9.0.0
+  - gfx1031 -> 10.3.0
+  - gfx1103 -> 11.0.0
+
+If you have a different chipset, you might need to set `HSA_OVERRIDE_GFX_VERSION` yourself at Docker launch. For example, if the version you want is `9.0.0`, configure it from the command line as:
+
+```bash
+$ docker run -e HSA_OVERRIDE_GFX_VERSION=9.0.0 \
+    ...
+```
+
+When using docker compose:
+
+```yaml
+services:
+  frigate:
+...
+    environment:
+      HSA_OVERRIDE_GFX_VERSION: "9.0.0"
+```
+
+Figuring out what version you need can be complicated, as you can't tell the chipset name and driver from the AMD brand name.
+
+  - first make sure that the rocm environment is running properly by running `/opt/rocm/bin/rocminfo` in the frigate container -- it should list both the CPU and the GPU with their properties
+  - find the chipset version you have (gfxNNN) from the output of `rocminfo` (see below)
+  - use a search engine to query what `HSA_OVERRIDE_GFX_VERSION` you need for the given gfx name ("gfxNNN ROCm HSA_OVERRIDE_GFX_VERSION")
+  - override the `HSA_OVERRIDE_GFX_VERSION` with the relevant value
+  - if things are not working, check the frigate docker logs
+
+#### Figuring out if AMD/ROCm is working and found your GPU
+
+```bash
+$ docker exec -it frigate /opt/rocm/bin/rocminfo
+```
+
+#### Figuring out your AMD GPU chipset version
+
+We unset the `HSA_OVERRIDE_GFX_VERSION` to prevent an existing override from messing up the result:
+
+```bash
+$ docker exec -it frigate /bin/bash -c '(unset HSA_OVERRIDE_GFX_VERSION && /opt/rocm/bin/rocminfo |grep gfx)'
+```
+
+### Yolov8 model download and available files
+
+The ROCm specific frigate docker containers automatically download yolov8 files from https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ at startup --
+they fetch [yolov8.small.models.tar.gz](https://github.com/harakas/models/releases/download/yolov8.1-1.1/yolov8.small.models.tar.gz)
+and uncompress it into the `/config/model_cache/yolov8/` directory. After that the model files are compiled for your GPU chipset.
+
+Both the download and the compilation can take a couple of minutes, during which frigate will not be responsive. See the docker logs for progress.
+
+Automatic model download can be enabled or disabled with the `DOWNLOAD_YOLOV8` environment variable (`1` or `0`), either from the command line:
+
+```bash
+$ docker run ... -e DOWNLOAD_YOLOV8=1 \
+    ...
+```
+
+or when using docker compose:
+
+```yaml
+services:
+  frigate:
+...
+    environment:
+      DOWNLOAD_YOLOV8: "1"
+```
+
+The download can also be triggered in regular frigate builds using that environment variable. The following files will be available under `/config/model_cache/yolov8/`:
+
+- `yolov8[ns]_320x320.onnx` -- nano (n) and small (s) sized floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the coco dataset (80 classes)
+- `yolov8[ns]-oiv7_320x320.onnx` -- floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the google open images v7 dataset (601 classes)
+- `labels.txt` and `labels-frigate.txt` -- full and aggregated labels for the coco dataset models
+- `labels-oiv7.txt` and `labels-oiv7-frigate.txt` -- labels for the oiv7 dataset models
+
+The aggregated label files rename the labels so that only the `person`, `vehicle`, `animal` and `bird` classes remain. The oiv7 trained models contain 601 classes and so are difficult to configure manually -- using aggregate labels is recommended.
+
+Larger models (of `m` and `l` size and also at `640x640` resolution) can be found at https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ but have to be installed manually.
+
+The oiv7 models have been trained using the larger google open images v7 dataset. They also contain a lot more detection classes (over 600), so using aggregate label files is recommended. The large number of classes leads to a lower baseline for detection probability values and to higher resource consumption (they are slower to evaluate).
+
+The `rocm` builds precompile the `onnx` files for your chipset into `mxr` files. If you change your hardware or GPU, or have compiled the wrong versions, you need to delete the cached `.mxr` files under `/config/model_cache/yolov8/`.
+
+### Frigate configuration
+
+You also need to modify the frigate configuration to specify the detector, labels and model file. Here is an example configuration running `yolov8s`:
+
+```yaml
+model:
+  labelmap_path: /config/model_cache/yolov8/labels.txt
+  model_type: yolov8
+detectors:
+  rocm:
+    type: rocm
+    model:
+      path: /config/model_cache/yolov8/yolov8s_320x320.onnx
+```
+
+Other settings available for the rocm detector:
+
+- `conserve_cpu: True` -- run ROCm/HIP synchronization in blocking mode, saving CPU (at a small cost in latency and maximum throughput)
+- `auto_override_gfx: True` -- enable or disable automatic gfx driver detection
+
+### Expected performance
+
+On an AMD Ryzen 3 5400U with integrated GPU (gfx90c), yolov8n runs in around 9 ms per image (about 110 detections per second) and yolov8s in around 18 ms (about 55 detections per second), both at 320x320 detector resolution.
+
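
Note on the "Expected performance" figures: the detections-per-second values are consistent with the per-image latency if the detector evaluates one image at a time, so throughput is roughly `1000 / latency_ms`. A small sketch of that arithmetic, assuming no batching or pipelining (the function name is hypothetical):

```python
def detections_per_second(latency_ms):
    """Throughput of a detector that evaluates one image at a time."""
    return 1000.0 / latency_ms


# Latencies quoted in the documentation for the gfx90c iGPU.
for model, latency_ms in (("yolov8n", 9), ("yolov8s", 18)):
    rate = detections_per_second(latency_ms)
    print(f"{model}: {latency_ms} ms per image -> about {rate:.0f} detections/s")
```

The documented numbers are rounded, so small differences from this calculation are expected.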
--- a/docs/docs/frigate/hardware.md
+++ b/docs/docs/frigate/hardware.md
@ -105,6 +105,12 @@ Frigate supports SBCs with the following Rockchip SoCs:

 Using the yolov8n model and an Orange Pi 5 Plus with RK3588 SoC inference speeds vary between 20 - 25 ms.

+#### AMD GPUs and iGPUs
+
+With the [rocm](../configuration/object_detectors.md#amdrocm-gpu-detector) detector Frigate can take advantage of many AMD GPUs and iGPUs.
+
+A mini PC with an AMD Ryzen 3 5400U iGPU takes about 9 ms to evaluate yolov8n.
+
 ## What does Frigate use the CPU for and what does it use a detector for? (ELI5 Version)

 This is taken from a [user question on reddit](https://www.reddit.com/r/homeassistant/comments/q8mgau/comment/hgqbxh5/?utm_source=share&utm_medium=web2x&context=3). Modified slightly for clarity.
--- a/docs/docs/frigate/installation.md
+++ b/docs/docs/frigate/installation.md
@ -150,6 +150,10 @@ The community supported docker image tags for the current stable version are:
 - `stable-tensorrt-jp5` - Frigate build optimized for nvidia Jetson devices running Jetpack 5
 - `stable-tensorrt-jp4` - Frigate build optimized for nvidia Jetson devices running Jetpack 4.6
 - `stable-rk` - Frigate build for SBCs with Rockchip SoC
+- `stable-rocm` - Frigate build for [AMD GPUs and iGPUs](../configuration/object_detectors.md#amdrocm-gpu-detector), all drivers
+  - `stable-rocm-gfx900` - AMD gfx900 driver only
+  - `stable-rocm-gfx1030` - AMD gfx1030 driver only
+  - `stable-rocm-gfx1100` - AMD gfx1100 driver only

 ## Home Assistant Addon

--- a/frigate/detectors/plugins/onnx.py
+++ b/frigate/detectors/plugins/onnx.py
@ -0,0 +1,65 @@
+import glob
+import logging
+
+import numpy as np
+from typing_extensions import Literal
+
+from frigate.detectors.detection_api import DetectionApi
+from frigate.detectors.detector_config import BaseDetectorConfig
+from frigate.detectors.util import preprocess, yolov8_postprocess
+
+logger = logging.getLogger(__name__)
+
+DETECTOR_KEY = "onnx"
+
+
+class ONNXDetectorConfig(BaseDetectorConfig):
+    type: Literal[DETECTOR_KEY]
+
+
+class ONNXDetector(DetectionApi):
+    type_key = DETECTOR_KEY
+
+    def __init__(self, detector_config: ONNXDetectorConfig):
+        try:
+            import onnxruntime
+
+            logger.info("ONNX: loaded onnxruntime module")
+        except ModuleNotFoundError:
+            logger.error(
+                "ONNX: module loading failed, need 'pip install onnxruntime'?!?"
+            )
+            raise
+
+        assert (
+            detector_config.model.model_type == "yolov8"
+        ), "ONNX: detector_config.model.model_type: only yolov8 supported"
+        assert (
+            detector_config.model.input_tensor == "nhwc"
+        ), "ONNX: detector_config.model.input_tensor: only nhwc supported"
+        if detector_config.model.input_pixel_format != "rgb":
+            logger.warning(
+                f"ONNX: detector_config.model.input_pixel_format: should be 'rgb' for yolov8, but '{detector_config.model.input_pixel_format}' specified!"
+            )
+
+        assert detector_config.model.path is not None, (
+            "ONNX: No model.path configured, please configure model.path and model.labelmap_path; some suggestions: "
+            + ", ".join(glob.glob("/config/model_cache/yolov8/*.onnx"))
+            + " and "
+            + ", ".join(glob.glob("/config/model_cache/yolov8/labels*.txt"))
+        )
+
+        path = detector_config.model.path
+        logger.info(f"ONNX: loading {detector_config.model.path}")
+        self.model = onnxruntime.InferenceSession(path)
+        logger.info(f"ONNX: {path} loaded")
+
+    def detect_raw(self, tensor_input):
+        model_input_name = self.model.get_inputs()[0].name
+        model_input_shape = self.model.get_inputs()[0].shape
+
+        tensor_input = preprocess(tensor_input, model_input_shape, np.float32)
+
+        tensor_output = self.model.run(None, {model_input_name: tensor_input})[0]
+
+        return yolov8_postprocess(model_input_shape, tensor_output)
--- a/frigate/detectors/plugins/rocm.py
+++ b/frigate/detectors/plugins/rocm.py
@ -0,0 +1,143 @@
+import ctypes
+import glob
+import logging
+import os
+import subprocess
+import sys
+
+import numpy as np
+from pydantic import Field
+from typing_extensions import Literal
+
+from frigate.detectors.detection_api import DetectionApi
+from frigate.detectors.detector_config import BaseDetectorConfig
+from frigate.detectors.util import preprocess, yolov8_postprocess
+
+logger = logging.getLogger(__name__)
+
+DETECTOR_KEY = "rocm"
+
+
+def detect_gfx_version():
+    return subprocess.getoutput(
+        "unset HSA_OVERRIDE_GFX_VERSION && /opt/rocm/bin/rocminfo | grep gfx |head -1|awk '{print $2}'"
+    )
+
+
+def auto_override_gfx_version():
+    # If the environment variable is already set, do not override it
+    gfx_version = detect_gfx_version()
+    old_override = os.getenv("HSA_OVERRIDE_GFX_VERSION")
+    if old_override not in (None, ""):
+        logger.warning(
+            f"AMD/ROCm: detected {gfx_version} but HSA_OVERRIDE_GFX_VERSION already present ({old_override}), not overriding!"
+        )
+        return old_override
+    mapping = {
+        "gfx90c": "9.0.0",
+        "gfx1031": "10.3.0",
+        "gfx1103": "11.0.0",
+    }
+    override = mapping.get(gfx_version)
+    if override is not None:
+        logger.warning(
+            f"AMD/ROCm: detected {gfx_version}, overriding HSA_OVERRIDE_GFX_VERSION={override}"
+        )
+        os.environ["HSA_OVERRIDE_GFX_VERSION"] = override
+        return override
+    return ""
+
+
+class ROCmDetectorConfig(BaseDetectorConfig):
+    type: Literal[DETECTOR_KEY]
+    conserve_cpu: bool = Field(
+        default=True,
+        title="Conserve CPU at the expense of latency (and reduced max throughput)",
+    )
+    auto_override_gfx: bool = Field(
+        default=True, title="Automatically detect and override gfx version"
+    )
+
+
+class ROCmDetector(DetectionApi):
+    type_key = DETECTOR_KEY
+
+    def __init__(self, detector_config: ROCmDetectorConfig):
+        if detector_config.auto_override_gfx:
+            auto_override_gfx_version()
+
+        try:
+            sys.path.append("/opt/rocm/lib")
+            import migraphx
+
+            logger.info("AMD/ROCm: loaded migraphx module")
+        except ModuleNotFoundError:
+            logger.error("AMD/ROCm: module loading failed, missing ROCm environment?")
+            raise
+
+        if detector_config.conserve_cpu:
+            logger.info("AMD/ROCm: switching HIP to blocking mode to conserve CPU")
+            ctypes.CDLL("/opt/rocm/lib/libamdhip64.so").hipSetDeviceFlags(4)
+        assert (
+            detector_config.model.model_type == "yolov8"
+        ), "AMD/ROCm: detector_config.model.model_type: only yolov8 supported"
+        assert (
+            detector_config.model.input_tensor == "nhwc"
+        ), "AMD/ROCm: detector_config.model.input_tensor: only nhwc supported"
+        if detector_config.model.input_pixel_format != "rgb":
+            logger.warning(
+                f"AMD/ROCm: detector_config.model.input_pixel_format: should be 'rgb' for yolov8, but '{detector_config.model.input_pixel_format}' specified!"
+            )
+
+        assert detector_config.model.path is not None, (
+            "No model.path configured, please configure model.path and model.labelmap_path; some suggestions: "
+            + ", ".join(glob.glob("/config/model_cache/yolov8/*.onnx"))
+            + " and "
+            + ", ".join(glob.glob("/config/model_cache/yolov8/labels*.txt"))
+        )
+
+        path = detector_config.model.path
+        mxr_path = os.path.splitext(path)[0] + ".mxr"
+        if path.endswith(".mxr"):
+            logger.info(f"AMD/ROCm: loading parsed model from {mxr_path}")
+            self.model = migraphx.load(mxr_path)
+        elif os.path.exists(mxr_path):
+            logger.info(f"AMD/ROCm: loading parsed model from {mxr_path}")
+            self.model = migraphx.load(mxr_path)
+        else:
+            logger.info(f"AMD/ROCm: loading model from {path}")
+            if path.endswith(".onnx"):
+                self.model = migraphx.parse_onnx(path)
+            elif (
+                path.endswith(".tf")
+                or path.endswith(".tf2")
+                or path.endswith(".tflite")
+            ):
+                # untested
+                self.model = migraphx.parse_tf(path)
+            else:
+                raise Exception(f"AMD/ROCm: unknown model format {path}")
+            logger.info("AMD/ROCm: compiling the model")
+            self.model.compile(
+                migraphx.get_target("gpu"), offload_copy=True, fast_math=True
+            )
+            logger.info(f"AMD/ROCm: saving parsed model into {mxr_path}")
+            os.makedirs("/config/model_cache/rocm", exist_ok=True)
+            migraphx.save(self.model, mxr_path)
+        logger.info("AMD/ROCm: model loaded")
+
+    def detect_raw(self, tensor_input):
+        model_input_name = self.model.get_parameter_names()[0]
+        model_input_shape = tuple(
+            self.model.get_parameter_shapes()[model_input_name].lens()
+        )
+        tensor_input = preprocess(tensor_input, model_input_shape, np.float32)
+
+        detector_result = self.model.run({model_input_name: tensor_input})[0]
+
+        addr = ctypes.cast(detector_result.data_ptr(), ctypes.POINTER(ctypes.c_float))
+        tensor_output = np.ctypeslib.as_array(
+            addr, shape=detector_result.get_shape().lens()
+        )
+
+        return yolov8_postprocess(model_input_shape, tensor_output)
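
Note on `auto_override_gfx_version()` in `rocm.py`: the decision it makes can be followed without a GPU. The sketch below is a pure-Python approximation, assuming the shell pipeline `grep gfx | head -1 | awk '{print $2}'` is equivalent to taking the second whitespace-separated field of the first line containing `gfx`. The helper names and the sample text are hypothetical; the sample is a made-up excerpt shaped like `rocminfo` output, not real output.

```python
# Mapping copied from auto_override_gfx_version() in rocm.py.
GFX_OVERRIDES = {
    "gfx90c": "9.0.0",
    "gfx1031": "10.3.0",
    "gfx1103": "11.0.0",
}


def first_gfx_name(rocminfo_output):
    """Return the second field of the first line mentioning 'gfx', or ''."""
    for line in rocminfo_output.splitlines():
        if "gfx" in line:
            fields = line.split()
            return fields[1] if len(fields) > 1 else ""
    return ""


def pick_override(rocminfo_output, env_override=None):
    """An existing override wins; otherwise look the chipset up in the table."""
    if env_override not in (None, ""):
        return env_override
    return GFX_OVERRIDES.get(first_gfx_name(rocminfo_output), "")


# Made-up excerpt shaped like rocminfo output, for illustration only.
SAMPLE = """\
  Name:                    AMD Ryzen (CPU agent)
  Name:                    gfx90c
  Marketing Name:          (integrated GPU)
"""
print(pick_override(SAMPLE))
```

Chipsets that are not in the table yield an empty string, which matches the case where the user has to set `HSA_OVERRIDE_GFX_VERSION` manually.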
--- a/frigate/detectors/util.py
+++ b/frigate/detectors/util.py
@ -0,0 +1,83 @@
+import logging
+
+import cv2
+import numpy as np
+
+logger = logging.getLogger(__name__)
+
+
+def preprocess(tensor_input, model_input_shape, model_input_element_type):
+    model_input_shape = tuple(model_input_shape)
+    assert tensor_input.dtype == np.uint8, f"tensor_input.dtype: {tensor_input.dtype}"
+    if len(tensor_input.shape) == 3:
+        tensor_input = tensor_input[np.newaxis, :]
+    if model_input_element_type == np.uint8:
+        # nothing to do for uint8 model input
+        assert (
+            model_input_shape == tensor_input.shape
+        ), f"model_input_shape: {model_input_shape}, tensor_input.shape: {tensor_input.shape}"
+        return tensor_input
+    assert (
+        model_input_element_type == np.float32
+    ), f"model_input_element_type: {model_input_element_type}"
+    # tensor_input must be nhwc
+    assert tensor_input.shape[3] == 3, f"tensor_input.shape: {tensor_input.shape}"
+    if tensor_input.shape[1:3] != model_input_shape[2:4]:
+        logger.warning(
+            f"preprocess: tensor_input.shape {tensor_input.shape} and model_input_shape {model_input_shape} do not match!"
+        )
+    # cv2.dnn.blobFromImage is faster than numpying it
+    return cv2.dnn.blobFromImage(
+        tensor_input[0],
+        1.0 / 255,
+        (model_input_shape[3], model_input_shape[2]),
+        None,
+        swapRB=False,
+    )
+
+
+def yolov8_postprocess(
+    model_input_shape,
+    tensor_output,
+    box_count=20,
+    score_threshold=0.5,
+    nms_threshold=0.5,
+):
+    model_box_count = tensor_output.shape[2]
+    probs = tensor_output[0, 4:, :]
+    all_ids = np.argmax(probs, axis=0)
+    all_confidences = probs.T[np.arange(model_box_count), all_ids]
+    all_boxes = tensor_output[0, 0:4, :].T
+    mask = all_confidences > score_threshold
+    class_ids = all_ids[mask]
+    confidences = all_confidences[mask]
+    cx, cy, w, h = all_boxes[mask].T
+
+    if model_input_shape[3] == 3:
+        scale_y, scale_x = 1 / model_input_shape[1], 1 / model_input_shape[2]
+    else:
+        scale_y, scale_x = 1 / model_input_shape[2], 1 / model_input_shape[3]
+    detections = np.stack(
+        (
+            class_ids,
+            confidences,
+            scale_y * (cy - h / 2),
+            scale_x * (cx - w / 2),
+            scale_y * (cy + h / 2),
+            scale_x * (cx + w / 2),
+        ),
+        axis=1,
+    )
+    if detections.shape[0] > box_count:
+        # if too many detections, do nms filtering to suppress overlapping boxes
+        boxes = np.stack((cx - w / 2, cy - h / 2, w, h), axis=1)
+        indexes = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
+        detections = detections[indexes]
+        # if still too many, trim the rest by confidence
+        if detections.shape[0] > box_count:
+            detections = detections[
+                np.argpartition(detections[:, 1], -box_count)[-box_count:]
+            ]
+        detections = detections.copy()
+    detections.resize((box_count, 6))
+    return detections
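
Note on `yolov8_postprocess` in `util.py`: each surviving box is converted from centre format `(cx, cy, w, h)` in model-input pixels to normalized corner format `(y_min, x_min, y_max, x_max)`. Below is a plain-Python sketch of that per-box arithmetic, without NumPy, thresholding or NMS. The function name and the example box are illustrative assumptions.

```python
def to_normalized_corners(cx, cy, w, h, input_height, input_width):
    """Convert a centre-format pixel box to (y_min, x_min, y_max, x_max) in 0..1."""
    scale_y = 1.0 / input_height
    scale_x = 1.0 / input_width
    return (
        scale_y * (cy - h / 2),
        scale_x * (cx - w / 2),
        scale_y * (cy + h / 2),
        scale_x * (cx + w / 2),
    )


# A 160x80 box centred in a 320x320 model input.
box = to_normalized_corners(
    cx=160, cy=160, w=160, h=80, input_height=320, input_width=320
)
print(box)
```

The y coordinates come first in the result, matching the column order that `yolov8_postprocess` stacks after the class id and confidence.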