Embeddings normalization fixes (#14284)

* Use cosine distance metric for vec tables

* Only apply normalization to multi modal searches

* Catch possible edge case in stddev calc

* Use sigmoid function for normalization for multi modal searches only

* Ensure we get model state on initial page load

* Only save stats for multi modal searches and only use cosine similarity for image -> image search
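
The stddev guard, the save_stats flag, and the sigmoid step named above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the project's code: the class name RunningZScore, the Welford-style _update, the population variance, the final list comprehension, and the sigmoid helper are all hypothetical, since the diff shows only part of ZScoreNormalization.

```python
import math


class RunningZScore:
    """Hypothetical stand-in for the normalizer touched by this commit."""

    def __init__(self) -> None:
        self.n = 0       # number of values seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Population variance (an assumption; the real property is not shown).
        return self.m2 / self.n if self.n > 0 else 0.0

    @property
    def stddev(self) -> float:
        # Guard from the diff: never take the square root of a
        # non-positive variance.
        return math.sqrt(self.variance) if self.variance > 0 else 0.0

    def _update(self, values: list[float]) -> None:
        # Welford's online update (an assumption; the real update is not shown).
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def normalize(self, values: list[float], save_stats: bool) -> list[float]:
        # Only fold the new values into the running stats when asked to.
        if save_stats:
            self._update(values)
        # With no spread there is nothing to scale by; return the input as-is.
        if self.stddev == 0:
            return values
        return [(x - self.mean) / self.stddev for x in values]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function mapping a z-score into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Calling normalize(values, save_stats=False) leaves the running statistics untouched, which matches the commit's intent of only saving stats for multi modal searches.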
Josh Hawkins
2024-10-11 13:11:11 -05:00
committed by GitHub
parent d4b9b5a7dd
commit 8a8a0c7dec
5 changed files with 41 additions and 26 deletions


@@ -20,10 +20,11 @@ class ZScoreNormalization:
     @property
     def stddev(self):
-        return math.sqrt(self.variance)
+        return math.sqrt(self.variance) if self.variance > 0 else 0.0

-    def normalize(self, distances: list[float]):
-        self._update(distances)
+    def normalize(self, distances: list[float], save_stats: bool):
+        if save_stats:
+            self._update(distances)
         if self.stddev == 0:
             return distances
         return [