
Commit 0904f06

PI-2526 Switch to cosine similarity + increase chunk size (#4552)
The Data Science team have been using cosine similarity in their testing, whereas OpenSearch defaults to L2 (Euclidean distance). The chunk size is also increased, as a workaround for the mxbai model's batch size limit of 32. With 64-token chunks and 12.5% overlap, each chunk advances by 56 tokens, so that limit caps the length of a note we can process in one go at 32 × 56 = 1792 tokens (now 32 × 112 = 3584).
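The 1792 and 3584 figures can be reproduced with a little arithmetic. This minimal sketch assumes, as an interpretation rather than something the commit spells out, that each chunk after the first advances by `token_limit * (1 - overlap_rate)` tokens and that a note must fit in a single batch of 32 chunks. `max_tokens_per_note` is a hypothetical helper, not part of OpenSearch.

```python
def max_tokens_per_note(batch_size: int, token_limit: int, overlap_rate: float) -> float:
    """Hypothetical helper: approximate token capacity of one embedding batch.

    Assumes each chunk advances by token_limit * (1 - overlap_rate) tokens
    (the stride), and uses batch_size * stride as the commit message does.
    """
    stride = token_limit * (1 - overlap_rate)
    return batch_size * stride


before = max_tokens_per_note(32, 64, 0.125)   # old token_limit
after = max_tokens_per_note(32, 128, 0.125)   # new token_limit
print(before, after)  # 1792.0 3584.0
```

Doubling `token_limit` doubles the stride, so the capacity doubles as well. Strictly, the final chunk is a full window, so the exact bound is 31 × stride + token_limit (1800 tokens before the change, 3600 after); the commit's figures use the simpler 32 × stride.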
1 parent ed3cf81 commit 0904f06


2 files changed: +2 -1 lines changed


projects/person-search-index-from-delius/container/pipelines/contact/index/index-template-semantic.yml

+1
@@ -799,6 +799,7 @@ template:
 knn:
   type: knn_vector
   dimension: 1024
+  space_type: cosinesimil
 notes:
   copy_to: text
   type: text
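To illustrate why the choice of `space_type` matters, this sketch compares a query against two made-up 2-D vectors (toy data, not real mxbai embeddings, which have 1024 dimensions). L2 prefers the vector that is nearer in space; cosine similarity prefers the one pointing in the same direction, regardless of its length. The helper names are illustrative only.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means more similar.

    Only direction matters; vector length cancels out.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 1.0]
docs = {
    "same_direction_long": [5.0, 5.0],        # same direction, much longer
    "different_direction_short": [1.0, 0.0],  # nearer in space, different direction
}

# Nearest neighbour under each metric.
by_l2 = min(docs, key=lambda k: l2_distance(query, docs[k]))
by_cosine = max(docs, key=lambda k: cosine_similarity(query, docs[k]))
print(by_l2, by_cosine)  # different_direction_short same_direction_long
```

If the embeddings were already unit-length, the two metrics would rank results identically, because for unit vectors ||a - b||² = 2 - 2·cos(a, b). The commit does not say whether the mxbai vectors stored here are normalised, so switching to `cosinesimil` keeps the index consistent with the Data Science team's testing either way.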

projects/person-search-index-from-delius/container/pipelines/contact/index/ingest-pipeline.tpl.json

+1 -1
@@ -25,7 +25,7 @@
 "text_chunking": {
   "algorithm": {
     "fixed_token_length": {
-      "token_limit": 64,
+      "token_limit": 128,
       "overlap_rate": 0.125,
       "tokenizer": "standard"
     }
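A simplified, hypothetical model of the `fixed_token_length` settings above can help check how many chunks a note produces. Assumptions: whitespace splitting stands in for the `standard` tokenizer, the overlap is `floor(token_limit * overlap_rate)` tokens, and the real OpenSearch processor may treat the final chunk differently. `fixed_token_length_chunks` is an illustrative name, not an OpenSearch API.

```python
import math


def fixed_token_length_chunks(tokens, token_limit, overlap_rate):
    """Split tokens into windows of at most token_limit tokens.

    Consecutive windows share floor(token_limit * overlap_rate) tokens.
    Simplified stand-in for OpenSearch's fixed_token_length algorithm.
    """
    overlap = math.floor(token_limit * overlap_rate)
    stride = token_limit - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + token_limit])
        if start + token_limit >= len(tokens):
            break  # this window already reaches the end of the note
        start += stride
    return chunks


# A hypothetical 3000-word note, tokenised naively on whitespace.
note = " ".join(f"w{i}" for i in range(3000)).split()
for limit in (64, 128):
    chunks = fixed_token_length_chunks(note, limit, 0.125)
    # Does the note fit in one batch of 32 chunks?
    print(limit, len(chunks), len(chunks) <= 32)
# 64 54 False
# 128 27 True
```

Under these assumptions, a 3000-token note needs 54 chunks at the old limit, more than the batch of 32, but only 27 at the new one.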
