AMD GPU support with the rocm detector and YOLOv8 pretrained model download (#9762)

* ROCm AMD/GPU based build and detector, WIP

* detectors/rocm: separate yolov8 postprocessing into own function; fix box scaling; use cv2.dnn.blobForImage for preprocessing; assert on required model parameters

* AMD/ROCm: add couple of more ultralytics models; comments

* docker/rocm: make imported model files readable by all

* docker/rocm: readme about running on AMD GPUs

* docker/rocm: updated README

* docker/rocm: updated README

* docker/rocm: updated README

* detectors/rocm: separated preprocessing functions into yolo_utils.py

* detector/plugins: added onnx cpu plugin

* docker/rocm: updated container with limite label sets

* example detectors view

* docker/rocm: updated README.md

* docker/rocm: update README.md

* docker/rocm: do not set HSA_OVERRIDE_GFX_VERSION at all for the general version as the empty value broke rocm

* detectors: simplified/optimized yolov8_postprocess

* detector/yolo_utils: indentation, remove unused variable

* detectors/rocm: default option to conserve cpu usage at the expense of latency

* detectors/yolo_utils: use nms to prefilter overlapping boxes if too many detected

* detectors/edgetpu_tfl: add support for yolov8

* util/download_models: script to download yolov8 model files

* docker/main: add download-models overlay into s6 startup

* detectors/rocm: assume models are in /config/model_cache/yolov8/

* docker/rocm: compile onnx files into mxr files at startup

* switch model download into bash script

* detectors/rocm: automatically override HSA_OVERRIDE_GFX_VERSION for couple of known chipsets

* docs: rocm detector first notes

* typos

* describe builds (harakas temporary)

* docker/rocm: also build a version for gfx1100

* docker/rocm: use cp instead of tar

* docker.rocm: remove README as it is now in detector config

* frigate/detectors: renamed yolov8_preprocess->preprocess, pass input tensor element type

* docker/main: use newer openvino (2023.3.0)

* detectors: implement class aggregation

* update yolov8 model

* add openvino/yolov8 support for label aggregation

* docker: remove pointless s6/timeout-up files

* Revert "detectors: implement class aggregation"

This reverts commit dcfe6bbf6f.

* detectors/openvino: remove class aggregation

* detectors: increase yolov8 postprocessing score trershold to 0.5

* docker/rocm: separate rocm distributed files into its own build stage

* Update object_detectors.md

* updated CODEOWNERS file for rocm

* updated build names for documentation

* Revert "docker/main: use newer openvino (2023.3.0)"

This reverts commit dee95de908.

* reverrted openvino detector

* reverted edgetpu detector

* scratched rocm docs from any mention of edgetpu or openvino

* Update docs/docs/configuration/object_detectors.md

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>

* renamed frigate.detectors.yolo_utils.py -> frigate.detectors.util.py

* clarified rocm example performance

* Improved wording and clarified text

* Mentioned rocm detector for AMD GPUs

* applied ruff formating

* applied ruff suggested fixes

* docker/rocm: fix missing argument resulting in larger docker image sizes

* docs/configuration/object_detectors: fix links to yolov8 release files

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
This commit is contained in:
harakas 2024-02-10 14:41:46 +02:00 committed by GitHub
parent 64988c9be0
commit 44d8cdbba1
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
26 changed files with 1291 additions and 1 deletions

View File

@ -2,5 +2,5 @@
/docker/tensorrt/ @madsciencetist @NateMeyer
/docker/tensorrt/*arm64* @madsciencetist
/docker/tensorrt/*jetson* @madsciencetist
/docker/rockchip/ @MarcA711
/docker/rocm/ @harakas

View File

@ -196,6 +196,8 @@ EXPOSE 8555/tcp 8555/udp
# Configure logging to prepend timestamps, log to stdout, keep 0 archives and rotate on 10MB
ENV S6_LOGGING_SCRIPT="T 1 n0 s10000000 T"
# Do not fail on long-running download scripts
ENV S6_CMD_WAIT_FOR_SERVICES_MAXTIME=0
ENTRYPOINT ["/init"]
CMD []

View File

@ -25,6 +25,7 @@ norfair == 2.2.*
setproctitle == 1.3.*
ws4py == 0.5.*
unidecode == 1.3.*
onnxruntime == 1.16.*
# Openvino Library - Custom built with MYRIAD support
openvino @ https://github.com/NateMeyer/openvino-wheels/releases/download/multi-arch_2022.3.1/openvino-2022.3.1-1-cp39-cp39-manylinux_2_31_x86_64.whl; platform_machine == 'x86_64'
openvino @ https://github.com/NateMeyer/openvino-wheels/releases/download/multi-arch_2022.3.1/openvino-2022.3.1-1-cp39-cp39-linux_aarch64.whl; platform_machine == 'aarch64'

View File

@ -0,0 +1,34 @@
#!/command/with-contenv bash
# shellcheck shell=bash
# Download yolov8 models when DOWNLOAD_YOLOV8=1 environment variable is set
set -o errexit -o nounset -o pipefail
MODEL_CACHE_DIR=${MODEL_CACHE_DIR:-"/config/model_cache"}
YOLOV8_DIR="$MODEL_CACHE_DIR/yolov8"
YOLOV8_URL=https://github.com/harakas/models/releases/download/yolov8.1-1.1/yolov8.small.models.tar.gz
YOLOV8_DIGEST=304186b299560fbacc28eac9b9ea02cc2289fe30eb2c0df30109a2529423695c
if [ "$DOWNLOAD_YOLOV8" = "1" ]; then
echo "download-models: DOWNLOAD_YOLOV8=${DOWNLOAD_YOLOV8}, running download"
if ! test -f "${YOLOV8_DIR}/model.fetched"; then
mkdir -p $YOLOV8_DIR
TMP_FILE="${YOLOV8_DIR}/download.tar.gz"
curl --no-progress-meter -L --max-filesize 500M --insecure --output $TMP_FILE "${YOLOV8_URL}"
digest=$(sha256sum $TMP_FILE | awk '{print $1}')
if [ "$digest" = "$YOLOV8_DIGEST" ]; then
echo "download-models: Extracting downloaded file"
cd $YOLOV8_DIR
tar zxf $TMP_FILE
rm $TMP_FILE
touch model.fetched
echo "download-models: Yolov8 download done, files placed into ${YOLOV8_DIR}"
else
echo "download-models: Downloaded file digest does not match: got $digest, expected $YOLOV8_DIGEST"
rm $TMP_FILE
fi
else
echo "download-models: ${YOLOV8_DIR}/model.fetched already present"
fi
fi

View File

@ -0,0 +1 @@
oneshot

View File

@ -0,0 +1 @@
/etc/s6-overlay/s6-rc.d/download-models/run

106
docker/rocm/Dockerfile Normal file
View File

@ -0,0 +1,106 @@
# syntax=docker/dockerfile:1.4
# https://askubuntu.com/questions/972516/debian-frontend-environment-variable
ARG DEBIAN_FRONTEND=noninteractive
ARG ROCM=5.7.3
ARG AMDGPU=gfx900
ARG HSA_OVERRIDE_GFX_VERSION
ARG HSA_OVERRIDE
#######################################################################
FROM ubuntu:focal as rocm
ARG ROCM
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y install gnupg wget
RUN mkdir --parents --mode=0755 /etc/apt/keyrings
RUN wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | tee /etc/apt/keyrings/rocm.gpg > /dev/null
COPY docker/rocm/rocm.list /etc/apt/sources.list.d/
COPY docker/rocm/rocm-pin-600 /etc/apt/preferences.d/
RUN apt-get update
RUN apt-get -y install --no-install-recommends migraphx
RUN apt-get -y install --no-install-recommends migraphx-dev
RUN mkdir -p /opt/rocm-dist/opt/rocm-$ROCM/lib
RUN cd /opt/rocm-$ROCM/lib && cp -dpr libMIOpen*.so* libamd*.so* libhip*.so* libhsa*.so* libmigraphx*.so* librocm*.so* librocblas*.so* /opt/rocm-dist/opt/rocm-$ROCM/lib/
RUN cd /opt/rocm-dist/opt/ && ln -s rocm-$ROCM rocm
RUN mkdir -p /opt/rocm-dist/etc/ld.so.conf.d/
RUN echo /opt/rocm/lib|tee /opt/rocm-dist/etc/ld.so.conf.d/rocm.conf
#######################################################################
FROM --platform=linux/amd64 debian:11 as debian-base
RUN apt-get update && apt-get -y upgrade
RUN apt-get -y install --no-install-recommends libelf1 libdrm2 libdrm-amdgpu1 libnuma1 kmod
RUN apt-get -y install python3
#######################################################################
# ROCm does not come with migraphx wrappers for python 3.9, so we build it here
FROM debian-base as debian-build
ARG ROCM
COPY --from=rocm /opt/rocm-$ROCM /opt/rocm-$ROCM
RUN ln -s /opt/rocm-$ROCM /opt/rocm
RUN apt-get -y install g++ cmake
RUN apt-get -y install python3-pybind11 python3.9-distutils python3-dev
WORKDIR /opt/build
COPY docker/rocm/migraphx .
RUN mkdir build && cd build && cmake .. && make install
#######################################################################
FROM deps AS deps-prelim
# need this to install libnuma1
RUN apt-get update
# no ugprade?!?!
RUN apt-get -y install libnuma1
WORKDIR /opt/frigate/
COPY --from=rootfs / /
COPY docker/rocm/rootfs/ /
#######################################################################
FROM scratch AS rocm-dist
ARG ROCM
ARG AMDGPU
COPY --from=rocm /opt/rocm-$ROCM/bin/rocminfo /opt/rocm-$ROCM/bin/migraphx-driver /opt/rocm-$ROCM/bin/
COPY --from=rocm /opt/rocm-$ROCM/share/miopen/db/*$AMDGPU* /opt/rocm-$ROCM/share/miopen/db/
COPY --from=rocm /opt/rocm-$ROCM/lib/rocblas/library/*$AMDGPU* /opt/rocm-$ROCM/lib/rocblas/library/
COPY --from=rocm /opt/rocm-dist/ /
COPY --from=debian-build /opt/rocm/lib/migraphx.cpython-39-x86_64-linux-gnu.so /opt/rocm-$ROCM/lib/
#######################################################################
FROM deps-prelim AS rocm-prelim-hsa-override0
ENV HSA_ENABLE_SDMA=0
COPY --from=rocm-dist / /
RUN ldconfig
#######################################################################
FROM rocm-prelim-hsa-override0 as rocm-prelim-hsa-override1
ARG HSA_OVERRIDE_GFX_VERSION
ENV HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION
#######################################################################
FROM rocm-prelim-hsa-override$HSA_OVERRIDE as rocm-deps
# Request yolov8 download at startup
ENV DOWNLOAD_YOLOV8=1

View File

@ -0,0 +1,26 @@
cmake_minimum_required(VERSION 3.1)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()
SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
project(migraphx_py)
include_directories(/opt/rocm/include)
find_package(pybind11 REQUIRED)
pybind11_add_module(migraphx migraphx_py.cpp)
target_link_libraries(migraphx PRIVATE /opt/rocm/lib/libmigraphx.so /opt/rocm/lib/libmigraphx_tf.so /opt/rocm/lib/libmigraphx_onnx.so)
install(TARGETS migraphx
COMPONENT python
LIBRARY DESTINATION /opt/rocm/lib
)

View File

@ -0,0 +1,582 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <migraphx/program.hpp>
#include <migraphx/instruction_ref.hpp>
#include <migraphx/operation.hpp>
#include <migraphx/quantization.hpp>
#include <migraphx/generate.hpp>
#include <migraphx/instruction.hpp>
#include <migraphx/ref/target.hpp>
#include <migraphx/stringutils.hpp>
#include <migraphx/tf.hpp>
#include <migraphx/onnx.hpp>
#include <migraphx/load_save.hpp>
#include <migraphx/register_target.hpp>
#include <migraphx/json.hpp>
#include <migraphx/make_op.hpp>
#include <migraphx/op/common.hpp>
#ifdef HAVE_GPU
#include <migraphx/gpu/hip.hpp>
#endif
using half = half_float::half;
namespace py = pybind11;
#ifdef __clang__
#define MIGRAPHX_PUSH_UNUSED_WARNING \
_Pragma("clang diagnostic push") \
_Pragma("clang diagnostic ignored \"-Wused-but-marked-unused\"")
#define MIGRAPHX_POP_WARNING _Pragma("clang diagnostic pop")
#else
#define MIGRAPHX_PUSH_UNUSED_WARNING
#define MIGRAPHX_POP_WARNING
#endif
#define MIGRAPHX_PYBIND11_MODULE(...) \
MIGRAPHX_PUSH_UNUSED_WARNING \
PYBIND11_MODULE(__VA_ARGS__) \
MIGRAPHX_POP_WARNING
#define MIGRAPHX_PYTHON_GENERATE_SHAPE_ENUM(x, t) .value(#x, migraphx::shape::type_t::x)
namespace migraphx {
migraphx::value to_value(py::kwargs kwargs);
migraphx::value to_value(py::list lst);
template <class T, class F>
void visit_py(T x, F f)
{
if(py::isinstance<py::kwargs>(x))
{
f(to_value(x.template cast<py::kwargs>()));
}
else if(py::isinstance<py::list>(x))
{
f(to_value(x.template cast<py::list>()));
}
else if(py::isinstance<py::bool_>(x))
{
f(x.template cast<bool>());
}
else if(py::isinstance<py::int_>(x) or py::hasattr(x, "__index__"))
{
f(x.template cast<int>());
}
else if(py::isinstance<py::float_>(x))
{
f(x.template cast<float>());
}
else if(py::isinstance<py::str>(x))
{
f(x.template cast<std::string>());
}
else if(py::isinstance<migraphx::shape::dynamic_dimension>(x))
{
f(migraphx::to_value(x.template cast<migraphx::shape::dynamic_dimension>()));
}
else
{
MIGRAPHX_THROW("VISIT_PY: Unsupported data type!");
}
}
migraphx::value to_value(py::list lst)
{
migraphx::value v = migraphx::value::array{};
for(auto val : lst)
{
visit_py(val, [&](auto py_val) { v.push_back(py_val); });
}
return v;
}
migraphx::value to_value(py::kwargs kwargs)
{
migraphx::value v = migraphx::value::object{};
for(auto arg : kwargs)
{
auto&& key = py::str(arg.first);
auto&& val = arg.second;
visit_py(val, [&](auto py_val) { v[key] = py_val; });
}
return v;
}
} // namespace migraphx
namespace pybind11 {
namespace detail {
template <>
struct npy_format_descriptor<half>
{
static std::string format()
{
// following: https://docs.python.org/3/library/struct.html#format-characters
return "e";
}
static constexpr auto name() { return _("half"); }
};
} // namespace detail
} // namespace pybind11
template <class F>
void visit_type(const migraphx::shape& s, F f)
{
s.visit_type(f);
}
template <class T, class F>
void visit(const migraphx::raw_data<T>& x, F f)
{
x.visit(f);
}
template <class F>
void visit_types(F f)
{
migraphx::shape::visit_types(f);
}
template <class T>
py::buffer_info to_buffer_info(T& x)
{
migraphx::shape s = x.get_shape();
assert(s.type() != migraphx::shape::tuple_type);
if(s.dynamic())
MIGRAPHX_THROW("MIGRAPHX PYTHON: dynamic shape argument passed to to_buffer_info");
auto strides = s.strides();
std::transform(
strides.begin(), strides.end(), strides.begin(), [&](auto i) { return i * s.type_size(); });
py::buffer_info b;
visit_type(s, [&](auto as) {
// migraphx use int8_t data to store bool type, we need to
// explicitly specify the data type as bool for python
if(s.type() == migraphx::shape::bool_type)
{
b = py::buffer_info(x.data(),
as.size(),
py::format_descriptor<bool>::format(),
s.ndim(),
s.lens(),
strides);
}
else
{
b = py::buffer_info(x.data(),
as.size(),
py::format_descriptor<decltype(as())>::format(),
s.ndim(),
s.lens(),
strides);
}
});
return b;
}
migraphx::shape to_shape(const py::buffer_info& info)
{
migraphx::shape::type_t t;
std::size_t n = 0;
visit_types([&](auto as) {
if(info.format == py::format_descriptor<decltype(as())>::format() or
(info.format == "l" and py::format_descriptor<decltype(as())>::format() == "q") or
(info.format == "L" and py::format_descriptor<decltype(as())>::format() == "Q"))
{
t = as.type_enum();
n = sizeof(as());
}
else if(info.format == "?" and py::format_descriptor<decltype(as())>::format() == "b")
{
t = migraphx::shape::bool_type;
n = sizeof(bool);
}
});
if(n == 0)
{
MIGRAPHX_THROW("MIGRAPHX PYTHON: Unsupported data type " + info.format);
}
auto strides = info.strides;
std::transform(strides.begin(), strides.end(), strides.begin(), [&](auto i) -> std::size_t {
return n > 0 ? i / n : 0;
});
// scalar support
if(info.shape.empty())
{
return migraphx::shape{t};
}
else
{
return migraphx::shape{t, info.shape, strides};
}
}
MIGRAPHX_PYBIND11_MODULE(migraphx, m)
{
py::class_<migraphx::shape> shape_cls(m, "shape");
shape_cls
.def(py::init([](py::kwargs kwargs) {
auto v = migraphx::to_value(kwargs);
auto t = migraphx::shape::parse_type(v.get("type", "float"));
if(v.contains("dyn_dims"))
{
auto dyn_dims =
migraphx::from_value<std::vector<migraphx::shape::dynamic_dimension>>(
v.at("dyn_dims"));
return migraphx::shape(t, dyn_dims);
}
auto lens = v.get<std::size_t>("lens", {1});
if(v.contains("strides"))
return migraphx::shape(t, lens, v.at("strides").to_vector<std::size_t>());
else
return migraphx::shape(t, lens);
}))
.def("type", &migraphx::shape::type)
.def("lens", &migraphx::shape::lens)
.def("strides", &migraphx::shape::strides)
.def("ndim", &migraphx::shape::ndim)
.def("elements", &migraphx::shape::elements)
.def("bytes", &migraphx::shape::bytes)
.def("type_string", &migraphx::shape::type_string)
.def("type_size", &migraphx::shape::type_size)
.def("dyn_dims", &migraphx::shape::dyn_dims)
.def("packed", &migraphx::shape::packed)
.def("transposed", &migraphx::shape::transposed)
.def("broadcasted", &migraphx::shape::broadcasted)
.def("standard", &migraphx::shape::standard)
.def("scalar", &migraphx::shape::scalar)
.def("dynamic", &migraphx::shape::dynamic)
.def("__eq__", std::equal_to<migraphx::shape>{})
.def("__ne__", std::not_equal_to<migraphx::shape>{})
.def("__repr__", [](const migraphx::shape& s) { return migraphx::to_string(s); });
py::enum_<migraphx::shape::type_t>(shape_cls, "type_t")
MIGRAPHX_SHAPE_VISIT_TYPES(MIGRAPHX_PYTHON_GENERATE_SHAPE_ENUM);
py::class_<migraphx::shape::dynamic_dimension>(shape_cls, "dynamic_dimension")
.def(py::init<>())
.def(py::init<std::size_t, std::size_t>())
.def(py::init<std::size_t, std::size_t, std::set<std::size_t>>())
.def_readwrite("min", &migraphx::shape::dynamic_dimension::min)
.def_readwrite("max", &migraphx::shape::dynamic_dimension::max)
.def_readwrite("optimals", &migraphx::shape::dynamic_dimension::optimals)
.def("is_fixed", &migraphx::shape::dynamic_dimension::is_fixed);
py::class_<migraphx::argument>(m, "argument", py::buffer_protocol())
.def_buffer([](migraphx::argument& x) -> py::buffer_info { return to_buffer_info(x); })
.def(py::init([](py::buffer b) {
py::buffer_info info = b.request();
return migraphx::argument(to_shape(info), info.ptr);
}))
.def("get_shape", &migraphx::argument::get_shape)
.def("data_ptr",
[](migraphx::argument& x) { return reinterpret_cast<std::uintptr_t>(x.data()); })
.def("tolist",
[](migraphx::argument& x) {
py::list l{x.get_shape().elements()};
visit(x, [&](auto data) { l = py::cast(data.to_vector()); });
return l;
})
.def("__eq__", std::equal_to<migraphx::argument>{})
.def("__ne__", std::not_equal_to<migraphx::argument>{})
.def("__repr__", [](const migraphx::argument& x) { return migraphx::to_string(x); });
py::class_<migraphx::target>(m, "target");
py::class_<migraphx::instruction_ref>(m, "instruction_ref")
.def("shape", [](migraphx::instruction_ref i) { return i->get_shape(); })
.def("op", [](migraphx::instruction_ref i) { return i->get_operator(); });
py::class_<migraphx::module, std::unique_ptr<migraphx::module, py::nodelete>>(m, "module")
.def("print", [](const migraphx::module& mm) { std::cout << mm << std::endl; })
.def(
"add_instruction",
[](migraphx::module& mm,
const migraphx::operation& op,
std::vector<migraphx::instruction_ref>& args,
std::vector<migraphx::module*>& mod_args) {
return mm.add_instruction(op, args, mod_args);
},
py::arg("op"),
py::arg("args"),
py::arg("mod_args") = std::vector<migraphx::module*>{})
.def(
"add_literal",
[](migraphx::module& mm, py::buffer data) {
py::buffer_info info = data.request();
auto literal_shape = to_shape(info);
return mm.add_literal(literal_shape, reinterpret_cast<char*>(info.ptr));
},
py::arg("data"))
.def(
"add_parameter",
[](migraphx::module& mm, const std::string& name, const migraphx::shape shape) {
return mm.add_parameter(name, shape);
},
py::arg("name"),
py::arg("shape"))
.def(
"add_return",
[](migraphx::module& mm, std::vector<migraphx::instruction_ref>& args) {
return mm.add_return(args);
},
py::arg("args"))
.def("__repr__", [](const migraphx::module& mm) { return migraphx::to_string(mm); });
py::class_<migraphx::program>(m, "program")
.def(py::init([]() { return migraphx::program(); }))
.def("get_parameter_names", &migraphx::program::get_parameter_names)
.def("get_parameter_shapes", &migraphx::program::get_parameter_shapes)
.def("get_output_shapes", &migraphx::program::get_output_shapes)
.def("is_compiled", &migraphx::program::is_compiled)
.def(
"compile",
[](migraphx::program& p,
const migraphx::target& t,
bool offload_copy,
bool fast_math,
bool exhaustive_tune) {
migraphx::compile_options options;
options.offload_copy = offload_copy;
options.fast_math = fast_math;
options.exhaustive_tune = exhaustive_tune;
p.compile(t, options);
},
py::arg("t"),
py::arg("offload_copy") = true,
py::arg("fast_math") = true,
py::arg("exhaustive_tune") = false)
.def("get_main_module", [](const migraphx::program& p) { return p.get_main_module(); })
.def(
"create_module",
[](migraphx::program& p, const std::string& name) { return p.create_module(name); },
py::arg("name"))
.def("run",
[](migraphx::program& p, py::dict params) {
migraphx::parameter_map pm;
for(auto x : params)
{
std::string key = x.first.cast<std::string>();
py::buffer b = x.second.cast<py::buffer>();
py::buffer_info info = b.request();
pm[key] = migraphx::argument(to_shape(info), info.ptr);
}
return p.eval(pm);
})
.def("run_async",
[](migraphx::program& p,
py::dict params,
std::uintptr_t stream,
std::string stream_name) {
migraphx::parameter_map pm;
for(auto x : params)
{
std::string key = x.first.cast<std::string>();
py::buffer b = x.second.cast<py::buffer>();
py::buffer_info info = b.request();
pm[key] = migraphx::argument(to_shape(info), info.ptr);
}
migraphx::execution_environment exec_env{
migraphx::any_ptr(reinterpret_cast<void*>(stream), stream_name), true};
return p.eval(pm, exec_env);
})
.def("sort", &migraphx::program::sort)
.def("print", [](const migraphx::program& p) { std::cout << p << std::endl; })
.def("__eq__", std::equal_to<migraphx::program>{})
.def("__ne__", std::not_equal_to<migraphx::program>{})
.def("__repr__", [](const migraphx::program& p) { return migraphx::to_string(p); });
py::class_<migraphx::operation> op(m, "op");
op.def(py::init([](const std::string& name, py::kwargs kwargs) {
migraphx::value v = migraphx::value::object{};
if(kwargs)
{
v = migraphx::to_value(kwargs);
}
return migraphx::make_op(name, v);
}))
.def("name", &migraphx::operation::name);
py::enum_<migraphx::op::pooling_mode>(op, "pooling_mode")
.value("average", migraphx::op::pooling_mode::average)
.value("max", migraphx::op::pooling_mode::max)
.value("lpnorm", migraphx::op::pooling_mode::lpnorm);
py::enum_<migraphx::op::rnn_direction>(op, "rnn_direction")
.value("forward", migraphx::op::rnn_direction::forward)
.value("reverse", migraphx::op::rnn_direction::reverse)
.value("bidirectional", migraphx::op::rnn_direction::bidirectional);
m.def(
"argument_from_pointer",
[](const migraphx::shape shape, const int64_t address) {
return migraphx::argument(shape, reinterpret_cast<void*>(address));
},
py::arg("shape"),
py::arg("address"));
m.def(
"parse_tf",
[](const std::string& filename,
bool is_nhwc,
unsigned int batch_size,
std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
std::vector<std::string> output_names) {
return migraphx::parse_tf(
filename, migraphx::tf_options{is_nhwc, batch_size, map_input_dims, output_names});
},
"Parse tf protobuf (default format is nhwc)",
py::arg("filename"),
py::arg("is_nhwc") = true,
py::arg("batch_size") = 1,
py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
py::arg("output_names") = std::vector<std::string>());
m.def(
"parse_onnx",
[](const std::string& filename,
unsigned int default_dim_value,
migraphx::shape::dynamic_dimension default_dyn_dim_value,
std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>
map_dyn_input_dims,
bool skip_unknown_operators,
bool print_program_on_error,
int64_t max_loop_iterations) {
migraphx::onnx_options options;
options.default_dim_value = default_dim_value;
options.default_dyn_dim_value = default_dyn_dim_value;
options.map_input_dims = map_input_dims;
options.map_dyn_input_dims = map_dyn_input_dims;
options.skip_unknown_operators = skip_unknown_operators;
options.print_program_on_error = print_program_on_error;
options.max_loop_iterations = max_loop_iterations;
return migraphx::parse_onnx(filename, options);
},
"Parse onnx file",
py::arg("filename"),
py::arg("default_dim_value") = 0,
py::arg("default_dyn_dim_value") = migraphx::shape::dynamic_dimension{1, 1},
py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
py::arg("map_dyn_input_dims") =
std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>(),
py::arg("skip_unknown_operators") = false,
py::arg("print_program_on_error") = false,
py::arg("max_loop_iterations") = 10);
m.def(
"parse_onnx_buffer",
[](const std::string& onnx_buffer,
unsigned int default_dim_value,
migraphx::shape::dynamic_dimension default_dyn_dim_value,
std::unordered_map<std::string, std::vector<std::size_t>> map_input_dims,
std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>
map_dyn_input_dims,
bool skip_unknown_operators,
bool print_program_on_error) {
migraphx::onnx_options options;
options.default_dim_value = default_dim_value;
options.default_dyn_dim_value = default_dyn_dim_value;
options.map_input_dims = map_input_dims;
options.map_dyn_input_dims = map_dyn_input_dims;
options.skip_unknown_operators = skip_unknown_operators;
options.print_program_on_error = print_program_on_error;
return migraphx::parse_onnx_buffer(onnx_buffer, options);
},
"Parse onnx file",
py::arg("filename"),
py::arg("default_dim_value") = 0,
py::arg("default_dyn_dim_value") = migraphx::shape::dynamic_dimension{1, 1},
py::arg("map_input_dims") = std::unordered_map<std::string, std::vector<std::size_t>>(),
py::arg("map_dyn_input_dims") =
std::unordered_map<std::string, std::vector<migraphx::shape::dynamic_dimension>>(),
py::arg("skip_unknown_operators") = false,
py::arg("print_program_on_error") = false);
m.def(
"load",
[](const std::string& name, const std::string& format) {
migraphx::file_options options;
options.format = format;
return migraphx::load(name, options);
},
"Load MIGraphX program",
py::arg("filename"),
py::arg("format") = "msgpack");
m.def(
"save",
[](const migraphx::program& p, const std::string& name, const std::string& format) {
migraphx::file_options options;
options.format = format;
return migraphx::save(p, name, options);
},
"Save MIGraphX program",
py::arg("p"),
py::arg("filename"),
py::arg("format") = "msgpack");
m.def("get_target", &migraphx::make_target);
m.def("create_argument", [](const migraphx::shape& s, const std::vector<double>& values) {
if(values.size() != s.elements())
MIGRAPHX_THROW("Values and shape elements do not match");
migraphx::argument a{s};
a.fill(values.begin(), values.end());
return a;
});
m.def("generate_argument", &migraphx::generate_argument, py::arg("s"), py::arg("seed") = 0);
m.def("fill_argument", &migraphx::fill_argument, py::arg("s"), py::arg("value"));
m.def("quantize_fp16",
&migraphx::quantize_fp16,
py::arg("prog"),
py::arg("ins_names") = std::vector<std::string>{"all"});
m.def("quantize_int8",
&migraphx::quantize_int8,
py::arg("prog"),
py::arg("t"),
py::arg("calibration") = std::vector<migraphx::parameter_map>{},
py::arg("ins_names") = std::vector<std::string>{"dot", "convolution"});
#ifdef HAVE_GPU
m.def("allocate_gpu", &migraphx::gpu::allocate_gpu, py::arg("s"), py::arg("host") = false);
m.def("to_gpu", &migraphx::gpu::to_gpu, py::arg("arg"), py::arg("host") = false);
m.def("from_gpu", &migraphx::gpu::from_gpu);
m.def("gpu_sync", [] { migraphx::gpu::gpu_sync(); });
#endif
#ifdef VERSION_INFO
m.attr("__version__") = VERSION_INFO;
#else
m.attr("__version__") = "dev";
#endif
}

3
docker/rocm/rocm-pin-600 Normal file
View File

@ -0,0 +1,3 @@
Package: *
Pin: release o=repo.radeon.com
Pin-Priority: 600

38
docker/rocm/rocm.hcl Normal file
View File

@ -0,0 +1,38 @@
variable "AMDGPU" {
default = "gfx900"
}
variable "ROCM" {
default = "5.7.3"
}
variable "HSA_OVERRIDE_GFX_VERSION" {
default = ""
}
variable "HSA_OVERRIDE" {
default = "1"
}
target deps {
dockerfile = "docker/main/Dockerfile"
platforms = ["linux/amd64"]
target = "deps"
}
target rootfs {
dockerfile = "docker/main/Dockerfile"
platforms = ["linux/amd64"]
target = "rootfs"
}
target rocm {
dockerfile = "docker/rocm/Dockerfile"
contexts = {
deps = "target:deps",
rootfs = "target:rootfs"
}
platforms = ["linux/amd64"]
args = {
AMDGPU = AMDGPU,
ROCM = ROCM,
HSA_OVERRIDE_GFX_VERSION = HSA_OVERRIDE_GFX_VERSION,
HSA_OVERRIDE = HSA_OVERRIDE
}
}

1
docker/rocm/rocm.list Normal file
View File

@ -0,0 +1 @@
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7.3 focal main

17
docker/rocm/rocm.mk Normal file
View File

@ -0,0 +1,17 @@
BOARDS += rocm
# AMD/ROCm is chunky so we build couple of smaller images for specific chipsets
ROCM_CHIPSETS:=gfx900:9.0.0 gfx1030:10.3.0 gfx1100:11.0.0
local-rocm: version
$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --load --file=docker/rocm/rocm.hcl --set rocm.tags=frigate:latest-rocm-$(word 1,$(subst :, ,$(chipset))) rocm;)
unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --load --file=docker/rocm/rocm.hcl --set rocm.tags=frigate:latest-rocm rocm
build-rocm: version
$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm-$(chipset) rocm;)
unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm rocm
push-rocm: build-rocm
$(foreach chipset,$(ROCM_CHIPSETS),AMDGPU=$(word 1,$(subst :, ,$(chipset))) HSA_OVERRIDE_GFX_VERSION=$(word 2,$(subst :, ,$(chipset))) HSA_OVERRIDE=1 docker buildx bake --push --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm-$(chipset) rocm;)
unset HSA_OVERRIDE_GFX_VERSION && HSA_OVERRIDE=0 AMDGPU=gfx docker buildx bake --push --file=docker/rocm/rocm.hcl --set rocm.tags=$(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-rocm rocm

View File

@ -0,0 +1,20 @@
#!/command/with-contenv bash
# shellcheck shell=bash
# Compile YoloV8 ONNX files into ROCm MIGraphX files
OVERRIDE=$(cd /opt/frigate && python3 -c 'import frigate.detectors.plugins.rocm as rocm; print(rocm.auto_override_gfx_version())')
if ! test -z "$OVERRIDE"; then
echo "Using HSA_OVERRIDE_GFX_VERSION=${OVERRIDE}"
export HSA_OVERRIDE_GFX_VERSION=$OVERRIDE
fi
for onnx in /config/model_cache/yolov8/*.onnx
do
mxr="${onnx%.onnx}.mxr"
if ! test -f $mxr; then
echo "processing $onnx into $mxr"
/opt/rocm/bin/migraphx-driver compile $onnx --optimize --gpu --enable-offload-copy --binary -o $mxr
fi
done

View File

@ -0,0 +1 @@
oneshot

View File

@ -0,0 +1 @@
/etc/s6-overlay/s6-rc.d/compile-rocm-models/run

View File

@ -397,3 +397,158 @@ detectors:
```
:::
## AMD/ROCm GPU detector
### Setup
The `rocm` detector supports running [ultralytics](https://github.com/ultralytics/ultralytics) yolov8 models on AMD GPUs and iGPUs. Use a frigate docker image with `-rocm` suffix, for example `ghcr.io/blakeblackshear/frigate:stable-rocm`.
As the ROCm software stack is quite bloated, there are also smaller versions for specific GPU chipsets:
- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx900`
- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1030`
- `ghcr.io/blakeblackshear/frigate:stable-rocm-gfx1100`
### Docker settings for GPU access
ROCm needs access to the `/dev/kfd` and `/dev/dri` devices. When docker or frigate is not run under root then also `video` (and possibly `render` and `ssl/_ssl`) groups should be added.
When running docker directly the following flags should be added for device access:
```bash
$ docker run --device=/dev/kfd --device=/dev/dri \
...
```
When using docker compose:
```yaml
services:
frigate:
...
devices:
- /dev/dri
- /dev/kfd
...
```
For reference on recommended settings see [running ROCm/pytorch in Docker](https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html#using-docker-with-pytorch-pre-installed).
### Docker settings for overriding the GPU chipset
Your GPU or iGPU might work just fine without any special configuration but in many cases they need manual settings. AMD/ROCm software stack comes with a limited set of GPU drivers and for newer or missing models you will have to override the chipset version to an older/generic version to get things working.
Also AMD/ROCm does not "officially" support integrated GPUs. It still does work with most of them just fine but requires special settings. One has to configure the `HSA_OVERRIDE_GFX_VERSION` environment variable. See the [ROCm bug report](https://github.com/ROCm/ROCm/issues/1743) for context and examples.
For chipset specific frigate rocm builds this variable is already set automatically.
For the general rocm frigate build there is some automatic detection:
- gfx90c -> 9.0.0
- gfx1031 -> 10.3.0
- gfx1103 -> 11.0.0
If you have something else you might need to override the `HSA_OVERRIDE_GFX_VERSION` at Docker launch. Suppose the version you want is `9.0.0`, then you should configure it from command line as:
```bash
$ docker run -e HSA_OVERRIDE_GFX_VERSION=9.0.0 \
...
```
When using docker compose:
```yaml
services:
frigate:
...
environment:
HSA_OVERRIDE_GFX_VERSION: "9.0.0"
```
Figuring out what version you need can be complicated as you can't tell the chipset name and driver from the AMD brand name.
- first make sure that rocm environment is running properly by running `/opt/rocm/bin/rocminfo` in the frigate container -- it should list both the CPU and the GPU with their properties
- find the chipset version you have (gfxNNN) from the output of the `rocminfo` (see below)
- use a search engine to query what `HSA_OVERRIDE_GFX_VERSION` you need for the given gfx name ("gfxNNN ROCm HSA_OVERRIDE_GFX_VERSION")
- override the `HSA_OVERRIDE_GFX_VERSION` with relevant value
- if things are not working check the frigate docker logs
#### Figuring out if AMD/ROCm is working and found your GPU
```bash
$ docker exec -it frigate /opt/rocm/bin/rocminfo
```
#### Figuring out your AMD GPU chipset version:
We unset the `HSA_OVERRIDE_GFX_VERSION` to prevent an existing override from messing up the result:
```bash
$ docker exec -it frigate /bin/bash -c '(unset HSA_OVERRIDE_GFX_VERSION && /opt/rocm/bin/rocminfo |grep gfx)'
```
### Yolov8 model download and available files
The ROCm specific frigate docker containers automatically download yolov8 files from https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ at startup --
they fetch [yolov8.small.models.tar.gz](https://github.com/harakas/models/releases/download/yolov8.1-1.1/yolov8.small.models.tar.gz)
and uncompresses it into the `/config/model_cache/yolov8/` directory. After that the model files are compiled for your GPU chipset.
Both the download and compilation can take couple of minutes during which frigate will not be responsive. See docker logs for how it is progressing.
Automatic model download can be configured with the `DOWNLOAD_YOLOV8=1/0` environment variable either from the command line
```bash
$ docker run ... -e DOWNLOAD_YOLOV8=1 \
...
```
or when using docker compose:
```yaml
services:
frigate:
...
environment:
DOWNLOAD_YOLOV8: "1"
```
Download can be triggered also in regular frigate builds using that environment variable. The following files will be available under `/config/model_cache/yolov8/`:
- `yolov8[ns]_320x320.onnx` -- nano (n) and small (s) sized floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the coco dataset (90 classes)
- `yolov8[ns]-oiv7_320x320.onnx` -- floating point model files usable by the `rocm` and `onnx` detectors that have been trained using the google open images v7 dataset (601 classes)
- `labels.txt` and `labels-frigate.txt` -- full and aggregated labels for the coco dataset models
- `labels-oiv7.txt` and `labels-oiv7-frigate.txt` -- labels for the oiv7 dataset models
The aggregated label files contain renamed labels leaving only `person`, `vehicle`, `animal` and `bird` classes. The oiv7 trained models contain 601 classes and so are difficult to configure manually -- using aggregate labels is recommended.
Larger models (of `m` and `l` size and also at `640x640` resolution) can be found at https://github.com/harakas/models/releases/tag/yolov8.1-1.1/ but have to be installed manually.
The oiv7 models have been trained using a larger google open images v7 dataset. They also contain a lot more detection classes (over 600) so using aggregate label files is recommended. The large number of classes leads to lower baseline for detection probability values and also for higher resource consumption (they are slower to evaluate).
The `rocm` builds precompile the `onnx` files for your chipset into `mxr` files. If you change your hardware or GPU or have compiled the wrong versions you need to delete the cached `.mxr` files under `/config/model_cache/yolov8/`.
### Frigate configuration
You also need to modify the frigate configuration to specify the detector, labels and model file. Here is an example configuration running `yolov8s`:
```yaml
model:
labelmap_path: /config/model_cache/yolov8/labels.txt
model_type: yolov8
detectors:
rocm:
type: rocm
model:
path: /config/model_cache/yolov8/yolov8s_320x320.onnx
```
Other settings available for the rocm detector
- `conserve_cpu: True` -- run ROCm/HIP synchronization in blocking mode saving CPU (at small loss of latency and maximum throughput)
- `auto_override_gfx: True` -- enable or disable automatic gfx driver detection
### Expected performance
On an AMD Ryzen 3 5400U with integrated GPU (gfx90c) the yolov8n runs in around 9ms per image (about 110 detections per second) and 18ms (55 detections per second) for yolov8s (at 320x320 detector resolution).

View File

@ -105,6 +105,12 @@ Frigate supports SBCs with the following Rockchip SoCs:
Using the yolov8n model and an Orange Pi 5 Plus with RK3588 SoC inference speeds vary between 20 - 25 ms.
#### AMD GPUs and iGPUs
With the [rocm](../configuration/object_detectors.md#amdrocm-gpu-detector) detector Frigate can take advantage of many AMD GPUs and iGPUs.
An AMD Ryzen mini PC with AMD Ryzen 3 5400U iGPU takes about 9 ms to evaluate yolov8n.
## What does Frigate use the CPU for and what does it use a detector for? (ELI5 Version)
This is taken from a [user question on reddit](https://www.reddit.com/r/homeassistant/comments/q8mgau/comment/hgqbxh5/?utm_source=share&utm_medium=web2x&context=3). Modified slightly for clarity.

View File

@ -150,6 +150,10 @@ The community supported docker image tags for the current stable version are:
- `stable-tensorrt-jp5` - Frigate build optimized for nvidia Jetson devices running Jetpack 5
- `stable-tensorrt-jp4` - Frigate build optimized for nvidia Jetson devices running Jetpack 4.6
- `stable-rk` - Frigate build for SBCs with Rockchip SoC
- `stable-rocm` - Frigate build for [AMD GPUs and iGPUs](../configuration/object_detectors.md#amdrocm-gpu-detector), all drivers
- `stable-rocm-gfx900` - AMD gfx900 driver only
- `stable-rocm-gfx1030` - AMD gfx1030 driver only
- `stable-rocm-gfx1100` - AMD gfx1100 driver only
## Home Assistant Addon

View File

@ -0,0 +1,65 @@
import glob
import logging
import numpy as np
from typing_extensions import Literal
from frigate.detectors.detection_api import DetectionApi
from frigate.detectors.detector_config import BaseDetectorConfig
from frigate.detectors.util import preprocess, yolov8_postprocess
logger = logging.getLogger(__name__)
DETECTOR_KEY = "onnx"
class ONNXDetectorConfig(BaseDetectorConfig):
type: Literal[DETECTOR_KEY]
class ONNXDetector(DetectionApi):
type_key = DETECTOR_KEY
def __init__(self, detector_config: ONNXDetectorConfig):
try:
import onnxruntime
logger.info("ONNX: loaded onnxruntime module")
except ModuleNotFoundError:
logger.error(
"ONNX: module loading failed, need 'pip install onnxruntime'?!?"
)
raise
assert (
detector_config.model.model_type == "yolov8"
), "ONNX: detector_config.model.model_type: only yolov8 supported"
assert (
detector_config.model.input_tensor == "nhwc"
), "ONNX: detector_config.model.input_tensor: only nhwc supported"
if detector_config.model.input_pixel_format != "rgb":
logger.warn(
"ONNX: detector_config.model.input_pixel_format: should be 'rgb' for yolov8, but '{detector_config.model.input_pixel_format}' specified!"
)
assert detector_config.model.path is not None, (
"ONNX: No model.path configured, please configure model.path and model.labelmap_path; some suggestions: "
+ ", ".join(glob.glob("/config/model_cache/yolov8/*.onnx"))
+ " and "
+ ", ".join(glob.glob("/config/model_cache/yolov8/*_labels.txt"))
)
path = detector_config.model.path
logger.info(f"ONNX: loading {detector_config.model.path}")
self.model = onnxruntime.InferenceSession(path)
logger.info(f"ONNX: {path} loaded")
def detect_raw(self, tensor_input):
model_input_name = self.model.get_inputs()[0].name
model_input_shape = self.model.get_inputs()[0].shape
tensor_input = preprocess(tensor_input, model_input_shape, np.float32)
tensor_output = self.model.run(None, {model_input_name: tensor_input})[0]
return yolov8_postprocess(model_input_shape, tensor_output)

View File

@ -0,0 +1,143 @@
import ctypes
import glob
import logging
import os
import subprocess
import sys
import numpy as np
from pydantic import Field
from typing_extensions import Literal
from frigate.detectors.detection_api import DetectionApi
from frigate.detectors.detector_config import BaseDetectorConfig
from frigate.detectors.util import preprocess, yolov8_postprocess
logger = logging.getLogger(__name__)
DETECTOR_KEY = "rocm"
def detect_gfx_version():
return subprocess.getoutput(
"unset HSA_OVERRIDE_GFX_VERSION && /opt/rocm/bin/rocminfo | grep gfx |head -1|awk '{print $2}'"
)
def auto_override_gfx_version():
# If environment varialbe already in place, do not override
gfx_version = detect_gfx_version()
old_override = os.getenv("HSA_OVERRIDE_GFX_VERSION")
if old_override not in (None, ""):
logger.warning(
f"AMD/ROCm: detected {gfx_version} but HSA_OVERRIDE_GFX_VERSION already present ({old_override}), not overriding!"
)
return old_override
mapping = {
"gfx90c": "9.0.0",
"gfx1031": "10.3.0",
"gfx1103": "11.0.0",
}
override = mapping.get(gfx_version)
if override is not None:
logger.warning(
f"AMD/ROCm: detected {gfx_version}, overriding HSA_OVERRIDE_GFX_VERSION={override}"
)
os.putenv("HSA_OVERRIDE_GFX_VERSION", override)
return override
return ""
class ROCmDetectorConfig(BaseDetectorConfig):
type: Literal[DETECTOR_KEY]
conserve_cpu: bool = Field(
default=True,
title="Conserve CPU at the expense of latency (and reduced max throughput)",
)
auto_override_gfx: bool = Field(
default=True, title="Automatically detect and override gfx version"
)
class ROCmDetector(DetectionApi):
type_key = DETECTOR_KEY
def __init__(self, detector_config: ROCmDetectorConfig):
if detector_config.auto_override_gfx:
auto_override_gfx_version()
try:
sys.path.append("/opt/rocm/lib")
import migraphx
logger.info("AMD/ROCm: loaded migraphx module")
except ModuleNotFoundError:
logger.error("AMD/ROCm: module loading failed, missing ROCm environment?")
raise
if detector_config.conserve_cpu:
logger.info("AMD/ROCm: switching HIP to blocking mode to conserve CPU")
ctypes.CDLL("/opt/rocm/lib/libamdhip64.so").hipSetDeviceFlags(4)
assert (
detector_config.model.model_type == "yolov8"
), "AMD/ROCm: detector_config.model.model_type: only yolov8 supported"
assert (
detector_config.model.input_tensor == "nhwc"
), "AMD/ROCm: detector_config.model.input_tensor: only nhwc supported"
if detector_config.model.input_pixel_format != "rgb":
logger.warn(
"AMD/ROCm: detector_config.model.input_pixel_format: should be 'rgb' for yolov8, but '{detector_config.model.input_pixel_format}' specified!"
)
assert detector_config.model.path is not None, (
"No model.path configured, please configure model.path and model.labelmap_path; some suggestions: "
+ ", ".join(glob.glob("/config/model_cache/yolov8/*.onnx"))
+ " and "
+ ", ".join(glob.glob("/config/model_cache/yolov8/*_labels.txt"))
)
path = detector_config.model.path
mxr_path = os.path.splitext(path)[0] + ".mxr"
if path.endswith(".mxr"):
logger.info(f"AMD/ROCm: loading parsed model from {mxr_path}")
self.model = migraphx.load(mxr_path)
elif os.path.exists(mxr_path):
logger.info(f"AMD/ROCm: loading parsed model from {mxr_path}")
self.model = migraphx.load(mxr_path)
else:
logger.info(f"AMD/ROCm: loading model from {path}")
if path.endswith(".onnx"):
self.model = migraphx.parse_onnx(path)
elif (
path.endswith(".tf")
or path.endswith(".tf2")
or path.endswith(".tflite")
):
# untested
self.model = migraphx.parse_tf(path)
else:
raise Exception(f"AMD/ROCm: unkown model format {path}")
logger.info("AMD/ROCm: compiling the model")
self.model.compile(
migraphx.get_target("gpu"), offload_copy=True, fast_math=True
)
logger.info(f"AMD/ROCm: saving parsed model into {mxr_path}")
os.makedirs("/config/model_cache/rocm", exist_ok=True)
migraphx.save(self.model, mxr_path)
logger.info("AMD/ROCm: model loaded")
def detect_raw(self, tensor_input):
model_input_name = self.model.get_parameter_names()[0]
model_input_shape = tuple(
self.model.get_parameter_shapes()[model_input_name].lens()
)
tensor_input = preprocess(tensor_input, model_input_shape, np.float32)
detector_result = self.model.run({model_input_name: tensor_input})[0]
addr = ctypes.cast(detector_result.data_ptr(), ctypes.POINTER(ctypes.c_float))
tensor_output = np.ctypeslib.as_array(
addr, shape=detector_result.get_shape().lens()
)
return yolov8_postprocess(model_input_shape, tensor_output)

83
frigate/detectors/util.py Normal file
View File

@ -0,0 +1,83 @@
import logging
import cv2
import numpy as np
logger = logging.getLogger(__name__)
def preprocess(tensor_input, model_input_shape, model_input_element_type):
model_input_shape = tuple(model_input_shape)
assert tensor_input.dtype == np.uint8, f"tensor_input.dtype: {tensor_input.dtype}"
if len(tensor_input.shape) == 3:
tensor_input = tensor_input[np.newaxis, :]
if model_input_element_type == np.uint8:
# nothing to do for uint8 model input
assert (
model_input_shape == tensor_input.shape
), f"model_input_shape: {model_input_shape}, tensor_input.shape: {tensor_input.shape}"
return tensor_input
assert (
model_input_element_type == np.float32
), f"model_input_element_type: {model_input_element_type}"
# tensor_input must be nhwc
assert tensor_input.shape[3] == 3, f"tensor_input.shape: {tensor_input.shape}"
if tensor_input.shape[1:3] != model_input_shape[2:4]:
logger.warn(
f"preprocess: tensor_input.shape {tensor_input.shape} and model_input_shape {model_input_shape} do not match!"
)
# cv2.dnn.blobFromImage is faster than numpying it
return cv2.dnn.blobFromImage(
tensor_input[0],
1.0 / 255,
(model_input_shape[3], model_input_shape[2]),
None,
swapRB=False,
)
def yolov8_postprocess(
model_input_shape,
tensor_output,
box_count=20,
score_threshold=0.5,
nms_threshold=0.5,
):
model_box_count = tensor_output.shape[2]
probs = tensor_output[0, 4:, :]
all_ids = np.argmax(probs, axis=0)
all_confidences = probs.T[np.arange(model_box_count), all_ids]
all_boxes = tensor_output[0, 0:4, :].T
mask = all_confidences > score_threshold
class_ids = all_ids[mask]
confidences = all_confidences[mask]
cx, cy, w, h = all_boxes[mask].T
if model_input_shape[3] == 3:
scale_y, scale_x = 1 / model_input_shape[1], 1 / model_input_shape[2]
else:
scale_y, scale_x = 1 / model_input_shape[2], 1 / model_input_shape[3]
detections = np.stack(
(
class_ids,
confidences,
scale_y * (cy - h / 2),
scale_x * (cx - w / 2),
scale_y * (cy + h / 2),
scale_x * (cx + w / 2),
),
axis=1,
)
if detections.shape[0] > box_count:
# if too many detections, do nms filtering to suppress overlapping boxes
boxes = np.stack((cx - w / 2, cy - h / 2, w, h), axis=1)
indexes = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
detections = detections[indexes]
# if still too many, trim the rest by confidence
if detections.shape[0] > box_count:
detections = detections[
np.argpartition(detections[:, 1], -box_count)[-box_count:]
]
detections = detections.copy()
detections.resize((box_count, 6))
return detections