Miscellaneous Fixes (#21102)

* ensure audio events display timeline entries in tracking details

* tweak tracking details layout for small desktop sizes

* update transcription docs

* Update classification docs for training recommendations

* Make number of classification images to be kept configurable

* Add bird to classification reference

* Fix incorrect averaging of the segments so it correctly only uses the most recent segments

* fix trigger logic

* add ability to download clean snapshot

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
This commit is contained in:
Josh Hawkins
2025-12-02 08:21:15 -06:00
committed by GitHub
parent 9d4aac2b8e
commit 1f9669bbe5
15 changed files with 240 additions and 126 deletions

View File

@@ -168,6 +168,8 @@ Recorded `speech` events will always use a `whisper` model, regardless of the `m
If you hear speech thats actually important and worth saving/indexing for the future, **just press the transcribe button in Explore** on that specific `speech` event - that keeps things explicit, reliable, and under your control.
Other options are being considered for future versions of Frigate to add transcription options that support external `whisper` Docker containers. A single transcription service could then be shared by Frigate and other applications (for example, Home Assistant Voice), and run on more powerful machines when available.
2. Why don't you save live transcription text and use that for `speech` events?
Theres no guarantee that a `speech` event is even created from the exact audio that went through the transcription model. Live transcription and `speech` event creation are **separate, asynchronous processes**. Even when both are correctly configured, trying to align the **precise start and end time of a speech event** with whatever audio the model happened to be processing at that moment is unreliable.

View File

@@ -69,4 +69,6 @@ Once all images are assigned, training will begin automatically.
### Improving the Model
- **Problem framing**: Keep classes visually distinct and state-focused (e.g., `open`, `closed`, `unknown`). Avoid combining object identity with state in a single model unless necessary.
- **Data collection**: Use the models Recent Classifications tab to gather balanced examples across times of day and weather.
- **Data collection**: Use the model's Recent Classifications tab to gather balanced examples across times of day and weather.
- **When to train**: Focus on cases where the model is entirely incorrect or flips between states when it should not. There's no need to train additional images when the model is already working consistently.
- **Selecting training images**: Images scoring below 100% due to new conditions (e.g., first snow of the year, seasonal changes) or variations (e.g., objects temporarily in view, insects at night) are good candidates for training, as they represent scenarios different from the default state. Training these lower-scoring images that differ from existing training data helps prevent overfitting. Avoid training large quantities of images that look very similar, especially if they already score 100% as this can lead to overfitting.

View File

@@ -710,6 +710,44 @@ audio_transcription:
# List of language codes: https://github.com/openai/whisper/blob/main/whisper/tokenizer.py#L10
language: en
# Optional: Configuration for classification models
classification:
# Optional: Configuration for bird classification
bird:
# Optional: Enable bird classification (default: shown below)
enabled: False
# Optional: Minimum classification score required to be considered a match (default: shown below)
threshold: 0.9
custom:
# Required: name of the classification model
model_name:
# Optional: Enable running the model (default: shown below)
enabled: True
# Optional: Name of classification model (default: shown below)
name: None
# Optional: Classification score threshold to change the state (default: shown below)
threshold: 0.8
# Optional: Number of classification attempts to save in the recent classifications tab (default: shown below)
# NOTE: Defaults to 200 for object classification and 100 for state classification if not specified
save_attempts: None
# Optional: Object classification configuration
object_config:
# Required: Object types to classify
objects: [dog]
# Optional: Type of classification that is applied (default: shown below)
classification_type: sub_label
# Optional: State classification configuration
state_config:
# Required: Cameras to run classification on
cameras:
camera_name:
# Required: Crop of image frame on this camera to run classification on
crop: [0, 180, 220, 400]
# Optional: If classification should be run when motion is detected in the crop (default: shown below)
motion: False
# Optional: Interval to run classification on in seconds (default: shown below)
interval: None
# Optional: Restream configuration
# Uses https://github.com/AlexxIT/go2rtc (v1.9.10)
# NOTE: The default go2rtc API port (1984) must be used,