Update triggers docs to explain why text-to-image triggers are unsupported

Many users won't understand why CLIP models can't be magic object detectors or classifiers
2025-09-23 17:52:05 +02:00 · 2025-09-19 20:09:05 -05:00 · 2025-09-19 20:09:05 -05:00 · 80d1c16783
commit 80d1c16783
parent 2a860bd85e
1 changed files with 11 additions and 3 deletions
--- a/docs/docs/configuration/semantic_search.md
+++ b/docs/docs/configuration/semantic_search.md
@ -85,13 +85,13 @@ semantic_search:
  enabled: True
  model_size: large
  # Optional, if using the 'large' model in a multi-GPU installation
-  device: 0 
+  device: 0
 ```
 :::info
-If the correct build is used for your GPU / NPU and the `large` model is configured, then the GPU / NPU will be detected and used automatically. 
+If the correct build is used for your GPU / NPU and the `large` model is configured, then the GPU / NPU will be detected and used automatically.
-Specify the `device` option to target a specific GPU in a multi-GPU system (see [onnxruntime's provider options](https://onnxruntime.ai/docs/execution-providers/)). 
+Specify the `device` option to target a specific GPU in a multi-GPU system (see [onnxruntime's provider options](https://onnxruntime.ai/docs/execution-providers/)).
 If you do not specify a device, the first available GPU will be used.
 See the [Hardware Accelerated Enrichments](/configuration/hardware_acceleration_enrichments.md) documentation.
@ -144,3 +144,11 @@ When a trigger fires, the UI highlights the trigger with a blue outline for 3 se
 - Triggers rely on the same Jina AI CLIP models (V1 or V2) used for semantic search. Ensure `semantic_search` is enabled and properly configured.
 - Reindexing embeddings (via the UI or `reindex: True`) does not affect trigger configurations but may update the embeddings used for matching.
 - For optimal performance, use a system with sufficient RAM (8GB minimum, 16GB recommended) and a GPU for `large` model configurations, as described in the Semantic Search requirements.
 ### FAQ
 #### Why can't I create a trigger on thumbnails for some text, like "person with a blue shirt" and have it trigger when a person with a blue shirt is detected?
 TL;DR: Text-to-image triggers aren’t supported because CLIP can confuse similar images and give inconsistent scores, making automation unreliable.
 Text-to-image triggers are not supported due to fundamental limitations of CLIP-based similarity search. While CLIP works well for exploratory, manual queries, it is unreliable for automated triggers based on a threshold. Issues include embedding drift (the same text–image pair can yield different cosine distances over time), lack of true semantic grounding (visually similar but incorrect matches), and unstable thresholding (distance distributions are dataset-dependent and often too tightly clustered to separate relevant from irrelevant results). Instead, it is recommended to set up a workflow with thumbnail triggers: first use text search to manually select 3–5 representative reference tracked objects, then configure thumbnail triggers based on that visual similarity. This provides robust automation without the semantic ambiguity of text to image matching.