blakeblackshear.frigate/frigate/events/cleanup.py

368 lines
14 KiB
Python
Raw Normal View History

"""Cleanup events based on configured retention."""
import datetime
import logging
import os
import threading
from multiprocessing.synchronize import Event as MpEvent
from pathlib import Path
from frigate.config import FrigateConfig
from frigate.const import CLIPS_DIR
from frigate.db.sqlitevecq import SqliteVecQueueDatabase
from frigate.models import Event, Timeline
logger = logging.getLogger(__name__)
CHUNK_SIZE = 50
class EventCleanup(threading.Thread):
Use sqlite-vec extension instead of chromadb for embeddings (#14163) * swap sqlite_vec for chroma in requirements * load sqlite_vec in embeddings manager * remove chroma and revamp Embeddings class for sqlite_vec * manual minilm onnx inference * remove chroma in clip model * migrate api from chroma to sqlite_vec * migrate event cleanup from chroma to sqlite_vec * migrate embedding maintainer from chroma to sqlite_vec * genai description for sqlite_vec * load sqlite_vec in main thread db * extend the SqliteQueueDatabase class and use peewee db.execute_sql * search with Event type for similarity * fix similarity search * install and add comment about transformers * fix normalization * add id filter * clean up * clean up * fully remove chroma and add transformers env var * readd uvicorn for fastapi * readd tokenizer parallelism env var * remove chroma from docs * remove chroma from UI * try removing custom pysqlite3 build * hard code limit * optimize queries * revert explore query * fix query * keep building pysqlite3 * single pass fetch and process * remove unnecessary re-embed * update deps * move SqliteVecQueueDatabase to db directory * make search thumbnail take up full size of results box * improve typing * improve model downloading and add status screen * daemon downloading thread * catch case when semantic search is disabled * fix typing * build sqlite_vec from source * resolve conflict * file permissions * try build deps * remove sources * sources * fix thread start * include git in build * reorder embeddings after detectors are started * build with sqlite amalgamation * non-platform specific * use wget instead of curl * remove unzip -d * remove sqlite_vec from requirements and load the compiled version * fix build * avoid race in db connection * add scale_factor and bias to description zscore normalization
2024-10-07 22:30:45 +02:00
def __init__(
self, config: FrigateConfig, stop_event: MpEvent, db: SqliteVecQueueDatabase
Use sqlite-vec extension instead of chromadb for embeddings (#14163) * swap sqlite_vec for chroma in requirements * load sqlite_vec in embeddings manager * remove chroma and revamp Embeddings class for sqlite_vec * manual minilm onnx inference * remove chroma in clip model * migrate api from chroma to sqlite_vec * migrate event cleanup from chroma to sqlite_vec * migrate embedding maintainer from chroma to sqlite_vec * genai description for sqlite_vec * load sqlite_vec in main thread db * extend the SqliteQueueDatabase class and use peewee db.execute_sql * search with Event type for similarity * fix similarity search * install and add comment about transformers * fix normalization * add id filter * clean up * clean up * fully remove chroma and add transformers env var * readd uvicorn for fastapi * readd tokenizer parallelism env var * remove chroma from docs * remove chroma from UI * try removing custom pysqlite3 build * hard code limit * optimize queries * revert explore query * fix query * keep building pysqlite3 * single pass fetch and process * remove unnecessary re-embed * update deps * move SqliteVecQueueDatabase to db directory * make search thumbnail take up full size of results box * improve typing * improve model downloading and add status screen * daemon downloading thread * catch case when semantic search is disabled * fix typing * build sqlite_vec from source * resolve conflict * file permissions * try build deps * remove sources * sources * fix thread start * include git in build * reorder embeddings after detectors are started * build with sqlite amalgamation * non-platform specific * use wget instead of curl * remove unzip -d * remove sqlite_vec from requirements and load the compiled version * fix build * avoid race in db connection * add scale_factor and bias to description zscore normalization
2024-10-07 22:30:45 +02:00
):
super().__init__(name="event_cleanup")
self.config = config
self.stop_event = stop_event
Use sqlite-vec extension instead of chromadb for embeddings (#14163) * swap sqlite_vec for chroma in requirements * load sqlite_vec in embeddings manager * remove chroma and revamp Embeddings class for sqlite_vec * manual minilm onnx inference * remove chroma in clip model * migrate api from chroma to sqlite_vec * migrate event cleanup from chroma to sqlite_vec * migrate embedding maintainer from chroma to sqlite_vec * genai description for sqlite_vec * load sqlite_vec in main thread db * extend the SqliteQueueDatabase class and use peewee db.execute_sql * search with Event type for similarity * fix similarity search * install and add comment about transformers * fix normalization * add id filter * clean up * clean up * fully remove chroma and add transformers env var * readd uvicorn for fastapi * readd tokenizer parallelism env var * remove chroma from docs * remove chroma from UI * try removing custom pysqlite3 build * hard code limit * optimize queries * revert explore query * fix query * keep building pysqlite3 * single pass fetch and process * remove unnecessary re-embed * update deps * move SqliteVecQueueDatabase to db directory * make search thumbnail take up full size of results box * improve typing * improve model downloading and add status screen * daemon downloading thread * catch case when semantic search is disabled * fix typing * build sqlite_vec from source * resolve conflict * file permissions * try build deps * remove sources * sources * fix thread start * include git in build * reorder embeddings after detectors are started * build with sqlite amalgamation * non-platform specific * use wget instead of curl * remove unzip -d * remove sqlite_vec from requirements and load the compiled version * fix build * avoid race in db connection * add scale_factor and bias to description zscore normalization
2024-10-07 22:30:45 +02:00
self.db = db
self.camera_keys = list(self.config.cameras.keys())
self.removed_camera_labels: list[str] = None
self.camera_labels: dict[str, dict[str, any]] = {}
def get_removed_camera_labels(self) -> list[Event]:
"""Get a list of distinct labels for removed cameras."""
if self.removed_camera_labels is None:
self.removed_camera_labels = list(
Event.select(Event.label)
.where(Event.camera.not_in(self.camera_keys))
.distinct()
.execute()
)
return self.removed_camera_labels
def get_camera_labels(self, camera: str) -> list[Event]:
"""Get a list of distinct labels for each camera, updating once a day."""
if (
self.camera_labels.get(camera) is None
or self.camera_labels[camera]["last_update"]
< (datetime.datetime.now() - datetime.timedelta(days=1)).timestamp()
):
self.camera_labels[camera] = {
"last_update": datetime.datetime.now().timestamp(),
"labels": list(
Event.select(Event.label)
.where(Event.camera == camera)
.distinct()
.execute()
),
}
return self.camera_labels[camera]["labels"]
def expire_snapshots(self) -> list[str]:
## Expire events from unlisted cameras based on the global config
retain_config = self.config.snapshots.retain
file_extension = "jpg"
update_params = {"has_snapshot": False}
distinct_labels = self.get_removed_camera_labels()
## Expire events from cameras no longer in the config
# loop over object types in db
for event in distinct_labels:
# get expiration time for this label
expire_days = retain_config.objects.get(event.label, retain_config.default)
expire_after = (
datetime.datetime.now() - datetime.timedelta(days=expire_days)
).timestamp()
# grab all events after specific time
expired_events: list[Event] = (
Event.select(
Event.id,
Event.camera,
)
.where(
Event.camera.not_in(self.camera_keys),
Event.start_time < expire_after,
Event.label == event.label,
Event.retain_indefinitely == False,
)
.namedtuples()
.iterator()
)
logger.debug(f"{len(list(expired_events))} events can be expired")
# delete the media from disk
for expired in expired_events:
media_name = f"{expired.camera}-{expired.id}"
media_path = Path(
f"{os.path.join(CLIPS_DIR, media_name)}.{file_extension}"
)
try:
media_path.unlink(missing_ok=True)
if file_extension == "jpg":
media_path = Path(
f"{os.path.join(CLIPS_DIR, media_name)}-clean.png"
)
media_path.unlink(missing_ok=True)
except OSError as e:
logger.warning(f"Unable to delete event images: {e}")
# update the clips attribute for the db entry
query = Event.select(Event.id).where(
Event.camera.not_in(self.camera_keys),
Event.start_time < expire_after,
Event.label == event.label,
Event.retain_indefinitely == False,
)
events_to_update = []
for batch in query.iterator():
events_to_update.extend([event.id for event in batch])
if len(events_to_update) >= CHUNK_SIZE:
logger.debug(
f"Updating {update_params} for {len(events_to_update)} events"
)
Event.update(update_params).where(
Event.id << events_to_update
).execute()
events_to_update = []
# Update any remaining events
if events_to_update:
logger.debug(
f"Updating clips/snapshots attribute for {len(events_to_update)} events"
)
Event.update(update_params).where(
Event.id << events_to_update
).execute()
events_to_update = []
## Expire events from cameras based on the camera config
for name, camera in self.config.cameras.items():
retain_config = camera.snapshots.retain
# get distinct objects in database for this camera
distinct_labels = self.get_camera_labels(name)
# loop over object types in db
for event in distinct_labels:
# get expiration time for this label
expire_days = retain_config.objects.get(
event.label, retain_config.default
)
expire_after = (
datetime.datetime.now() - datetime.timedelta(days=expire_days)
).timestamp()
# grab all events after specific time
expired_events = (
Event.select(
Event.id,
Event.camera,
)
.where(
Event.camera == name,
Event.start_time < expire_after,
Event.label == event.label,
Event.retain_indefinitely == False,
)
.namedtuples()
.iterator()
)
# delete the grabbed clips from disk
# only snapshots are stored in /clips
# so no need to delete mp4 files
for event in expired_events:
events_to_update.append(event.id)
try:
media_name = f"{event.camera}-{event.id}"
media_path = Path(
f"{os.path.join(CLIPS_DIR, media_name)}.{file_extension}"
)
media_path.unlink(missing_ok=True)
media_path = Path(
f"{os.path.join(CLIPS_DIR, media_name)}-clean.png"
)
media_path.unlink(missing_ok=True)
except OSError as e:
logger.warning(f"Unable to delete event images: {e}")
# update the clips attribute for the db entry
for i in range(0, len(events_to_update), CHUNK_SIZE):
batch = events_to_update[i : i + CHUNK_SIZE]
logger.debug(f"Updating {update_params} for {len(batch)} events")
Event.update(update_params).where(Event.id << batch).execute()
return events_to_update
def expire_clips(self) -> list[str]:
## Expire events from unlisted cameras based on the global config
expire_days = max(
self.config.record.alerts.retain.days,
self.config.record.detections.retain.days,
)
file_extension = None # mp4 clips are no longer stored in /clips
update_params = {"has_clip": False}
# get expiration time for this label
expire_after = (
datetime.datetime.now() - datetime.timedelta(days=expire_days)
).timestamp()
# grab all events after specific time
expired_events: list[Event] = (
Event.select(
Event.id,
Event.camera,
)
.where(
Event.camera.not_in(self.camera_keys),
Event.start_time < expire_after,
Event.retain_indefinitely == False,
)
.namedtuples()
.iterator()
)
logger.debug(f"{len(list(expired_events))} events can be expired")
# delete the media from disk
for expired in expired_events:
media_name = f"{expired.camera}-{expired.id}"
media_path = Path(f"{os.path.join(CLIPS_DIR, media_name)}.{file_extension}")
try:
media_path.unlink(missing_ok=True)
if file_extension == "jpg":
media_path = Path(
f"{os.path.join(CLIPS_DIR, media_name)}-clean.png"
)
media_path.unlink(missing_ok=True)
except OSError as e:
logger.warning(f"Unable to delete event images: {e}")
# update the clips attribute for the db entry
query = Event.select(Event.id).where(
Event.camera.not_in(self.camera_keys),
Event.start_time < expire_after,
Event.retain_indefinitely == False,
)
events_to_update = []
for batch in query.iterator():
events_to_update.extend([event.id for event in batch])
if len(events_to_update) >= CHUNK_SIZE:
logger.debug(
f"Updating {update_params} for {len(events_to_update)} events"
)
Event.update(update_params).where(
Event.id << events_to_update
).execute()
events_to_update = []
# Update any remaining events
if events_to_update:
logger.debug(
f"Updating clips/snapshots attribute for {len(events_to_update)} events"
)
Event.update(update_params).where(Event.id << events_to_update).execute()
events_to_update = []
now = datetime.datetime.now()
## Expire events from cameras based on the camera config
for name, camera in self.config.cameras.items():
expire_days = max(
camera.record.alerts.retain.days,
camera.record.detections.retain.days,
)
alert_expire_date = (
now - datetime.timedelta(days=camera.record.alerts.retain.days)
).timestamp()
detection_expire_date = (
now - datetime.timedelta(days=camera.record.detections.retain.days)
).timestamp()
# grab all events after specific time
expired_events = (
Event.select(
Event.id,
Event.camera,
)
.where(
Event.camera == name,
Event.retain_indefinitely == False,
(
(
(Event.data["max_severity"] != "detection")
| (Event.data["max_severity"].is_null())
)
& (Event.end_time < alert_expire_date)
)
| (
(Event.data["max_severity"] == "detection")
& (Event.end_time < detection_expire_date)
),
)
.namedtuples()
.iterator()
)
# delete the grabbed clips from disk
# only snapshots are stored in /clips
# so no need to delete mp4 files
for event in expired_events:
events_to_update.append(event.id)
# update the clips attribute for the db entry
for i in range(0, len(events_to_update), CHUNK_SIZE):
batch = events_to_update[i : i + CHUNK_SIZE]
logger.debug(f"Updating {update_params} for {len(batch)} events")
Event.update(update_params).where(Event.id << batch).execute()
return events_to_update
def run(self) -> None:
# only expire events every 5 minutes
while not self.stop_event.wait(1):
events_with_expired_clips = self.expire_clips()
return
# delete timeline entries for events that have expired recordings
2024-08-09 23:42:51 +02:00
# delete up to 100,000 at a time
max_deletes = 100000
deleted_events_list = list(events_with_expired_clips)
for i in range(0, len(deleted_events_list), max_deletes):
Timeline.delete().where(
Timeline.source_id << deleted_events_list[i : i + max_deletes]
).execute()
self.expire_snapshots()
# drop events from db where has_clip and has_snapshot are false
events = (
Event.select()
.where(Event.has_clip == False, Event.has_snapshot == False)
.iterator()
)
events_to_delete = [e.id for e in events]
logger.debug(f"Found {len(events_to_delete)} events that can be expired")
if len(events_to_delete) > 0:
for i in range(0, len(events_to_delete), CHUNK_SIZE):
chunk = events_to_delete[i : i + CHUNK_SIZE]
logger.debug(f"Deleting {len(chunk)} events from the database")
Event.delete().where(Event.id << chunk).execute()
if self.config.semantic_search.enabled:
2024-10-11 01:53:11 +02:00
self.db.delete_embeddings_description(event_ids=chunk)
self.db.delete_embeddings_thumbnail(event_ids=chunk)
Use sqlite-vec extension instead of chromadb for embeddings (#14163) * swap sqlite_vec for chroma in requirements * load sqlite_vec in embeddings manager * remove chroma and revamp Embeddings class for sqlite_vec * manual minilm onnx inference * remove chroma in clip model * migrate api from chroma to sqlite_vec * migrate event cleanup from chroma to sqlite_vec * migrate embedding maintainer from chroma to sqlite_vec * genai description for sqlite_vec * load sqlite_vec in main thread db * extend the SqliteQueueDatabase class and use peewee db.execute_sql * search with Event type for similarity * fix similarity search * install and add comment about transformers * fix normalization * add id filter * clean up * clean up * fully remove chroma and add transformers env var * readd uvicorn for fastapi * readd tokenizer parallelism env var * remove chroma from docs * remove chroma from UI * try removing custom pysqlite3 build * hard code limit * optimize queries * revert explore query * fix query * keep building pysqlite3 * single pass fetch and process * remove unnecessary re-embed * update deps * move SqliteVecQueueDatabase to db directory * make search thumbnail take up full size of results box * improve typing * improve model downloading and add status screen * daemon downloading thread * catch case when semantic search is disabled * fix typing * build sqlite_vec from source * resolve conflict * file permissions * try build deps * remove sources * sources * fix thread start * include git in build * reorder embeddings after detectors are started * build with sqlite amalgamation * non-platform specific * use wget instead of curl * remove unzip -d * remove sqlite_vec from requirements and load the compiled version * fix build * avoid race in db connection * add scale_factor and bias to description zscore normalization
2024-10-07 22:30:45 +02:00
logger.debug(f"Deleted {len(events_to_delete)} embeddings")
logger.info("Exiting event cleanup...")