Commit Graph

5 Commits

Author SHA1 Message Date
Josh Hawkins
8a8a0c7dec
Embeddings normalization fixes (#14284)
* Use cosine distance metric for vec tables

* Only apply normalization to multi modal searches

* Catch possible edge case in stddev calc

* Use sigmoid function for normalization for multi modal searches only

* Ensure we get model state on initial page load

* Only save stats for multi modal searches and only use cosine similarity for image -> image search
2024-10-11 13:11:11 -05:00
Nicolas Mowen
dd6276e706
Embeddings fixes (#14269)
* Add debugging logs for more info

* Improve timeout handling

* Fix event cleanup

* Handle zmq error and empty data

* Don't run download

* Remove unneeded embeddings creations

* Update timouts

* Init models immediately

* Fix order of init

* Cleanup
2024-10-10 16:37:43 -05:00
Nicolas Mowen
8ade85edec
Restructure embeddings (#14266)
* Restructure embeddings

* Use ZMQ to proxy embeddings requests

* Handle serialization

* Formatting

* Remove unused
2024-10-10 09:42:24 -06:00
Josh Hawkins
6ebad84160
initialize path before calling super() (#14203) 2024-10-07 16:17:57 -05:00
Josh Hawkins
24ac9f3e5a
Use sqlite-vec extension instead of chromadb for embeddings (#14163)
* swap sqlite_vec for chroma in requirements

* load sqlite_vec in embeddings manager

* remove chroma and revamp Embeddings class for sqlite_vec

* manual minilm onnx inference

* remove chroma in clip model

* migrate api from chroma to sqlite_vec

* migrate event cleanup from chroma to sqlite_vec

* migrate embedding maintainer from chroma to sqlite_vec

* genai description for sqlite_vec

* load sqlite_vec in main thread db

* extend the SqliteQueueDatabase class and use peewee db.execute_sql

* search with Event type for similarity

* fix similarity search

* install and add comment about transformers

* fix normalization

* add id filter

* clean up

* clean up

* fully remove chroma and add transformers env var

* readd uvicorn for fastapi

* readd tokenizer parallelism env var

* remove chroma from docs

* remove chroma from UI

* try removing custom pysqlite3 build

* hard code limit

* optimize queries

* revert explore query

* fix query

* keep building pysqlite3

* single pass fetch and process

* remove unnecessary re-embed

* update deps

* move SqliteVecQueueDatabase to db directory

* make search thumbnail take up full size of results box

* improve typing

* improve model downloading and add status screen

* daemon downloading thread

* catch case when semantic search is disabled

* fix typing

* build sqlite_vec from source

* resolve conflict

* file permissions

* try build deps

* remove sources

* sources

* fix thread start

* include git in build

* reorder embeddings after detectors are started

* build with sqlite amalgamation

* non-platform specific

* use wget instead of curl

* remove unzip -d

* remove sqlite_vec from requirements and load the compiled version

* fix build

* avoid race in db connection

* add scale_factor and bias to description zscore normalization
2024-10-07 14:30:45 -06:00