
@joelamming
Contributor

Hugely appreciative of the Dgraph team’s work. Native vector search integrated directly into a graph database is something of a no-brainer today. We've deployed Dgraph (both vanilla and customised) in systems with 1M+ vectors guiding deep traversal queries across 10M+ nodes; tight coupling of vector search with graph traversal at massive scale gets us closer to something that could represent the fuzzy nuances of everything in an enterprise. It's certainly not the biggest deployment your team will have seen, but this PR fixes an under‑recall edge case in HNSW and introduces opt‑in, per‑query controls that let users dial recall vs latency safely and predictably. I’ve had this running in production for a while and thought it worth proposing to main.

  • Summary

    • Fix incorrect early termination in the HNSW bottom layer that could stop before collecting k neighbours.
    • Extend similar_to with optional per‑query ef and distance_threshold (string or JSON‑like fourth argument).
    • Backwards compatible: default 3‑arg behaviour of similar_to is unchanged.
  • Motivation

    • In narrow probes, the bottom‑layer search could exit at a local minimum before collecting k, hurting recall.
    • No per‑query ef meant recall vs latency trade‑offs required global tuning or inflating k (and downstream work).
    • This PR corrects the termination logic and adds opt‑in knobs so users can increase exploration only when needed.
  • Changes (key files)

    • tok/hnsw/persistent_hnsw.go: fix early termination, add SearchWithOptions/SearchWithUidAndOptions, apply ef override at upper layers and max(k, ef) at bottom layer, apply distance_threshold in the metric domain (Euclidean squared internally, cosine as 1 − sim).
    • tok/index/index.go: add VectorIndexOptions and OptionalSearchOptions (non‑breaking).
    • worker/task.go: parse optional fourth argument to similar_to (ef, distance_threshold), thread options, route to optional methods when provided, guard zero/negative k.
    • tok/index/search_path.go: add SearchPathResult helper.
    • Tests: tok/hnsw/ef_recall_test.go adds
      • TestHNSWSearchEfOverrideImprovesRecall
      • TestHNSWDistanceThreshold_Euclidean
      • TestHNSWDistanceThreshold_Cosine
    • CHANGELOG.md: Unreleased entry for HNSW fix and per‑query options.
  • Backwards compatibility

    • No default behaviour changes. The three‑argument similar_to(attr, k, vector_or_uid) is unchanged.
    • ef and distance_threshold are optional; unsupported metrics safely ignore the threshold.
  • Performance

    • No overhead without options.
    • With ef, the bottom‑layer candidate size becomes max(k, ef) (as in standard HNSW), and cost scales accordingly.
    • Threshold filtering is a cheap pass over the candidates; squaring Euclidean thresholds avoids extra square roots.
  • Rationale and alignment

    • Matches HNSW semantics: ef_search controls exploration/recall, k controls output size.
    • Aligns with Typesense’s per‑query ef and distance_threshold semantics for familiarity.
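
As a point of reference for the ef/k split described above, here is a minimal Go sketch of a textbook HNSW single-layer beam search in which the beam holds max(k, ef) results and k only trims the output. It is not Dgraph's persistent_hnsw.go code: it assumes an in-memory adjacency map, a 1-D toy dataset, and hypothetical names throughout.

```go
package main

import (
	"container/heap"
	"fmt"
	"math"
	"sort"
)

// item pairs a node id with its distance to the query (lower is better).
type item struct {
	id   int
	dist float64
}

// itemHeap is a binary heap of items; farthestFirst flips it into a max-heap.
type itemHeap struct {
	items         []item
	farthestFirst bool
}

func (h *itemHeap) Len() int      { return len(h.items) }
func (h *itemHeap) Swap(i, j int) { h.items[i], h.items[j] = h.items[j], h.items[i] }
func (h *itemHeap) Less(i, j int) bool {
	if h.farthestFirst {
		return h.items[i].dist > h.items[j].dist
	}
	return h.items[i].dist < h.items[j].dist
}
func (h *itemHeap) Push(x any) { h.items = append(h.items, x.(item)) }
func (h *itemHeap) Pop() any {
	last := h.items[len(h.items)-1]
	h.items = h.items[:len(h.items)-1]
	return last
}

// searchLayer is a textbook single-layer beam search (hypothetical helper).
// The beam keeps up to max(k, ef) results; k only controls how many are returned.
func searchLayer(adj map[int][]int, dist func(int) float64, entry, k, ef int) []item {
	width := ef
	if k > width {
		width = k
	}
	visited := map[int]bool{entry: true}
	cand := &itemHeap{}                   // closest unexplored node on top
	res := &itemHeap{farthestFirst: true} // farthest kept result on top
	heap.Push(cand, item{entry, dist(entry)})
	heap.Push(res, item{entry, dist(entry)})
	for cand.Len() > 0 {
		c := heap.Pop(cand).(item)
		// Stop once the beam is full and the best remaining candidate cannot improve it.
		if res.Len() >= width && c.dist > res.items[0].dist {
			break
		}
		for _, n := range adj[c.id] {
			if visited[n] {
				continue
			}
			visited[n] = true
			if d := dist(n); res.Len() < width || d < res.items[0].dist {
				heap.Push(cand, item{n, d})
				heap.Push(res, item{n, d})
				if res.Len() > width {
					heap.Pop(res) // evict the farthest result
				}
			}
		}
	}
	out := append([]item(nil), res.items...)
	sort.Slice(out, func(i, j int) bool { return out[i].dist < out[j].dist })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

// Hypothetical toy graph: 1-D points, query at 0, entry node 0.
// Node 1 is a local minimum; the true nearest (node 3) is only reachable via node 2.
var (
	examplePos = map[int]float64{0: 3, 1: 2, 2: 4, 3: 0.1}
	exampleAdj = map[int][]int{0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
)

func exampleDist(id int) float64 { return math.Abs(examplePos[id]) }

func main() {
	fmt.Println(searchLayer(exampleAdj, exampleDist, 0, 1, 1)) // [{1 2}]
	fmt.Println(searchLayer(exampleAdj, exampleDist, 0, 1, 4)) // [{3 0.1}]
}
```

With ef=1 the beam stalls at node 1, a local minimum; ef=4 keeps node 2 in the beam long enough to reach node 3, which mirrors the idea behind TestHNSWSearchEfOverrideImprovesRecall.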

Checklist

  • Code compiles correctly and linting passes locally
  • For all code changes, an entry added to the CHANGELOG.md describing this PR
  • Tests added for new functionality / regression tests for the bug fix
  • For public APIs/new features, docs PR will be prepared and linked here after initial review

@joelamming joelamming requested a review from a team October 19, 2025 20:30
@darkcoderrises

Hey, could you also send a copy of the PR to https://github.com/predictable-labs/dgraph/pulls? I am the author of the bugs that you have fixed. I have made a fork of the repo so I can make changes myself, and I have fixed some bugs in the vector index that make it much faster.

@joelamming
Contributor Author


Cheers for the quick look. I’ve opened a mirror PR against your fork here: predictable-labs#17

Your branch is ahead of upstream, so the mirror PR shows conflicts in tok/hnsw/persistent_hnsw.go and worker/task.go. You're welcome to cherry-pick from my branch into your tree. Otherwise, I could rebase onto predictable-labs/dgraph:main and resolve the conflicts on my side (that will be tomorrow).

Tests for both parts are in tok/hnsw/ef_recall_test.go, and CHANGELOG.md has an Unreleased entry. Let me know which route you prefer and I'll see about implementing tomorrow.

@darkcoderrises

@joelamming Thanks a lot for sending your changes here as well. I can take a look and port your changes. Basically, we have introduced a new Partitioned HNSW, which is supposed to be much faster. We also found a race condition in the HNSW tree that reduced recall. We were able to improve accuracy significantly in the new HNSW even without your changes.

@matthewmcneely
Contributor

@joelamming Thanks for this contribution, truly amazing. Could I ask that you create an integration test in /query/vector/vector_test.go that illustrates the new arguments/functionality? This will help with documentation and general understanding of this new functionality.

Also, a few nits:

  • Could you use "neighbor" instead of "neighbour"? I know we do it wrong on this side of the pond, but other places in the codebase use the North American spelling.
  • Can you install/run trunk fmt against your changes? That's the only (required) failing check

Thanks!

@joelamming
Contributor Author

Thanks for the nudge -- I’ve pushed a few follow-up commits:

  • Added TestSimilarToOptionsIntegration in query/vector/vector_test.go. It tries both the "ef=…" string syntax and the JSON-style {distance_threshold: …} options and passes against a fresh compose cluster (go test ./query/vector -run TestSimilarToOptionsIntegration -tags=integration -count=1)
  • Normalised the spelling to “neighbor” in the new vector/HNSW paths (tok/hnsw/persistent_hnsw.go, tok/index/search_path.go)
  • Ran trunk fmt over the tree; checks are clean now

While tracking down the integration failure, I also found that the Euclidean distance_threshold was being compared to the squared distance. It never surfaced in production because we've rarely used it, but it can be an important part of full-featured vector search. I fixed the filtering to use the raw metric distance and refreshed the regression test (TestHNSWDistanceThreshold_Euclidean). Everything, including the integration test, now lines up with the expected semantics.

@joelamming
Contributor Author

joelamming commented Oct 26, 2025

@matthewmcneely I see the CI failures from the GitHub Actions runs. I've identified and fixed the parser regression introduced by the similar_to JSON argument support.

The lexer changes to support similar_to(pred, k, vec, {ef: 64, distance_threshold: 0.45}) altered error handling for a specific class of malformed queries, causing three parser tests to fail. This never surfaced for us in production, but it makes sense to fix it before this lands in Dgraph main.

Yesterday:

  • Fixed the lexer/parser interaction to properly validate braces
  • Updated affected tests with documentation explaining the (minor) error message trade-off
  • Added comprehensive test coverage for edge cases
  • All local tests passing: go test ./dql

Today I'm running the full CI-equivalent suite on an ubuntu-noble-24.04-amd64 EC2 runner (same config as the Actions hosts). For each harness run I'm doing the standard ./t -r setup/teardown and recopying dgraph into $(go env GOPATH)/bin. As we speak:

  • go clean -testcache
  • ./t -r
  • cp dgraph/dgraph "$(go env GOPATH)/bin/dgraph"
  • ./t --suite=core
  • ./t --suite=vector
  • ./t -r
  • go clean -testcache
  • go test -v -timeout=90m -failfast -tags=integration2 ./...

Please hold off on reviewing until I've confirmed everything is clean on the EC2 instance and I've finished pushing the fixes. Should have results later today.

I'll provide detailed docs on the architectural trade-off in the next push; happy to discuss the approach once you have a chance to review.

Thanks for your patience!

@joelamming
Contributor Author

@matthewmcneely CI tests all pass on my end after bumping to the 32GB instance

  • Guarded similar_to's JSON literal parsing in the parser so only the 4th argument can hold
    {...} while the lexer stays strict elsewhere
  • Restored legacy “Unrecognized character…” errors for non-similar_to stray braces
  • Updated parser tests to reflect the restored messaging and added coverage for the
    supported/unsupported call shapes
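
A rough Go sketch of the two behaviors described above: parsing the JSON-like fourth argument and accepting a {...} literal only in that position. This is a standalone illustration with hypothetical names (searchOptions, parseOptionsLiteral, checkBraceArgs), not Dgraph's lexer or parser, which operate on tokens rather than raw strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// searchOptions holds the two optional knobs; EF == 0 means "not set".
type searchOptions struct {
	EF                int
	DistanceThreshold float64
	HasThreshold      bool
}

// parseOptionsLiteral parses a JSON-like literal such as
// "{ef: 64, distance_threshold: 0.45}". Keys are unquoted, so encoding/json
// cannot parse it directly.
func parseOptionsLiteral(s string) (searchOptions, error) {
	var opts searchOptions
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "{") || !strings.HasSuffix(s, "}") {
		return opts, fmt.Errorf("options must be wrapped in braces: %q", s)
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	if body == "" {
		return opts, nil
	}
	for _, pair := range strings.Split(body, ",") {
		key, val, ok := strings.Cut(pair, ":")
		if !ok {
			return opts, fmt.Errorf("expected key: value, got %q", pair)
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "ef":
			n, err := strconv.Atoi(val)
			if err != nil || n <= 0 {
				return opts, fmt.Errorf("ef must be a positive integer, got %q", val)
			}
			opts.EF = n
		case "distance_threshold":
			f, err := strconv.ParseFloat(val, 64)
			if err != nil || f < 0 {
				return opts, fmt.Errorf("distance_threshold must be non-negative, got %q", val)
			}
			opts.DistanceThreshold, opts.HasThreshold = f, true
		default:
			return opts, fmt.Errorf("unknown option %q", key)
		}
	}
	return opts, nil
}

// checkBraceArgs mimics the positional guard: a {...} literal is accepted only
// as the 4th argument of similar_to and rejected everywhere else.
func checkBraceArgs(fn string, args []string) error {
	for i, a := range args {
		if strings.HasPrefix(strings.TrimSpace(a), "{") && !(fn == "similar_to" && i == 3) {
			return fmt.Errorf("%s: unexpected '{' in argument %d", fn, i+1)
		}
	}
	return nil
}

func main() {
	opts, err := parseOptionsLiteral("{ef: 64, distance_threshold: 0.45}")
	fmt.Println(opts.EF, opts.DistanceThreshold, err) // 64 0.45 <nil>
	fmt.Println(checkBraceArgs("eq", []string{"name", "{x}"}))
}
```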

Ran tests:

  • go test ./dql
  • ./t -r && cp dgraph/dgraph "$(go env GOPATH)/bin/dgraph"
  • ./t --suite=core
  • ./t --suite=vector
  • ./t --suite=load
  • ./t --suite=systest
  • ./t --suite=ldbc
  • go test -v -timeout=90m -failfast -tags=integration2 ./...
  • go test -v -timeout=120m -failfast -tags=upgrade
    github.com/hypermodeinc/dgraph/v25/{acl,worker,query}
  • go clean -testcache between suites and recopying dgraph each time per EC2 workflow

Given the runner matches Actions (Ubuntu 24.04, 16 vCPU/32 GiB) I don’t anticipate surprises on CI

Many thanks!

@joelamming
Contributor Author

Hi @matthewmcneely

Just to follow up on this -- the new commits are pushed and it looks like the CI workflows are just waiting for approval to run

Thanks for your time!

@matthewmcneely
Contributor


We have some issues with our benchmarks repo following our switch of accounts. Waiting for that to be sorted out.

@matthewmcneely
Contributor

@joelamming Could you merge our main branch? It has fixes for paths in two CI actions that need updates. Thanks.

@joelamming joelamming requested a review from a team as a code owner December 17, 2025 00:40
@matthewmcneely
Contributor

Hey @joelamming, Finally able to get back on this. I've been wrestling with the syntax for those optional parameters to the similar_to function. I'm sure you gave it quite a lot of thought too. Unfortunately, DQL has evolved with differing concepts for positional and optional arguments.

Would you be opposed to keeping the syntax more in-line with how our directives take optional arguments? Take @recurse(depth: 3, loop: true) for example, both arguments are optional. For similar_to, it might look like this:

similar_to(vpred, 3, $vec) # the current function syntax
similar_to(vpred, 3, $vec, ef: 2)
similar_to(vpred, 4, $vec, distance_threshold: 1.5)
similar_to(vpred, 4, $vec, ef: 64, distance_threshold: 0.5)
similar_to(vpred, 3, $vec, ef: $effort) #effort is a query variable
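
Under this directive-style proposal, the trailing name: value pairs would need to be validated and have query variables substituted. A minimal Go sketch, assuming the parser has already split them into name/value pairs and that the variable map is keyed with the leading $; all names here (namedArg, similarToOptions, resolveOptions) are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// namedArg is a hypothetical parsed `name: value` pair following the three
// positional similar_to arguments.
type namedArg struct{ Name, Value string }

// similarToOptions holds the resolved knobs; EF == 0 means "not set".
type similarToOptions struct {
	EF                int
	DistanceThreshold float64
	HasThreshold      bool
}

// resolveOptions rejects duplicates and unknown names, substitutes $variables,
// and converts each value to its typed form.
func resolveOptions(args []namedArg, vars map[string]string) (similarToOptions, error) {
	var opts similarToOptions
	seen := map[string]bool{}
	for _, a := range args {
		if seen[a.Name] {
			return opts, fmt.Errorf("duplicate argument %q", a.Name)
		}
		seen[a.Name] = true
		val := a.Value
		if strings.HasPrefix(val, "$") {
			v, ok := vars[val]
			if !ok {
				return opts, fmt.Errorf("undefined variable %s", val)
			}
			val = v
		}
		switch a.Name {
		case "ef":
			n, err := strconv.Atoi(val)
			if err != nil || n <= 0 {
				return opts, fmt.Errorf("ef must be a positive integer, got %q", val)
			}
			opts.EF = n
		case "distance_threshold":
			f, err := strconv.ParseFloat(val, 64)
			if err != nil || f < 0 {
				return opts, fmt.Errorf("distance_threshold must be non-negative, got %q", val)
			}
			opts.DistanceThreshold, opts.HasThreshold = f, true
		default:
			return opts, fmt.Errorf("unknown similar_to argument %q", a.Name)
		}
	}
	return opts, nil
}

func main() {
	// Roughly: similar_to(vpred, 3, $vec, ef: $effort, distance_threshold: 0.5)
	args := []namedArg{{"ef", "$effort"}, {"distance_threshold", "0.5"}}
	opts, err := resolveOptions(args, map[string]string{"$effort": "64"})
	fmt.Println(opts.EF, opts.DistanceThreshold, err) // 64 0.5 <nil>
}
```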

Pros:

  • This minimizes new concepts in the DQL syntax
  • This would allow the support of DQL query/block variables for those params (note the last example)

Cons:

  • You need to rewrite the parser

Let me know your thoughts. If you're too busy let me know and I'll have a stab at it.

@joelamming
Contributor Author

@matthewmcneely, working on this now. I did indeed wrestle with the syntax and chose to stay consistent with the original similar_to, but that didn't seem quite in line with the rest of the codebase. I'm currently testing your suggestion and will get back with some results/commits shortly.

@matthewmcneely
Contributor

@joelamming Great, thanks for taking it on. BTW, are you OK with my adding your synthetic test to my BEIR eval benchmark in the dgraph-benchmarks repo?

@joelamming
Contributor Author

joelamming commented Dec 24, 2025


Of course -- please feel free to include.

I've pushed a commit that implements your suggestions. It passes my local tests and the synthetic recall test. Should be good for CI. Thanks!

@matthewmcneely matthewmcneely merged commit 2c84a01 into dgraph-io:main Dec 29, 2025
24 of 25 checks passed
matthewmcneely added a commit that referenced this pull request Jan 15, 2026
…uct metrics (#9559)

### Summary

This PR fixes a critical bug in HNSW vector search where **cosine
similarity and dot product metrics returned incorrect results**. The
search algorithm was treating all metrics as distance metrics (lower is
better), causing similarity metrics (higher is better) to return the
*worst* matches instead of the best.

### Problem

The HNSW implementation had two issues with similarity-based metrics:

1. **Search phase**: The candidate heap in
persistent_hnsw.go::searchPersistentLayer always used a min-heap, which
pops the lowest value first. For similarity metrics where higher values
are better, this caused the algorithm to explore the worst candidates
first and terminate prematurely.

2. **Edge pruning phase**: The helper.go::addNeighbors function used a
fixed comparison (`>`) when pruning edges, which is correct for distance
metrics but inverted for similarity metrics. This resulted in keeping
the worst edges instead of the best.

### Root Cause

The original code assumed all metrics behave like distance metrics:

```go
// Always used min-heap (pops lowest first)
candidateHeap := *buildPersistentHeapByInit(elements)

// Edge pruning always used > comparison
compare: func(i, j uint64) bool {
    return ph.distance_betw(..., i, ...) > ph.distance_betw(..., j, ...)
}
```

For **Euclidean distance**, lower values = better matches → min-heap is
correct.
For **Cosine/DotProduct similarity**, higher values = better matches →
need max-heap.

### Solution

#### 1. Added candidateHeap interface with metric-aware heap selection

```go
type candidateHeap[T c.Float] interface {
    Len() int
    Pop() minPersistentHeapElement[T]
    Push(minPersistentHeapElement[T])
    PopLast() minPersistentHeapElement[T]
}

func buildCandidateHeap[T c.Float](array []minPersistentHeapElement[T], isSimilarityMetric bool) candidateHeap[T] {
    if isSimilarityMetric {
        return &maxHeapWrapper[T]{...}  // Pops highest first
    }
    return &minHeapWrapper[T]{...}      // Pops lowest first
}
```
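
The heap selection above can be illustrated with Go's container/heap. This sketch uses float64 scores and a hypothetical scored type in place of minPersistentHeapElement, and omits PopLast; it shows only how the metric decides which end of the heap pops first.

```go
package main

import (
	"container/heap"
	"fmt"
)

// scored is a hypothetical stand-in for minPersistentHeapElement: a node and its score.
type scored struct {
	id    uint64
	score float64
}

// scoredHeap pops the best-scoring element first. For distances (Euclidean)
// best means lowest; for similarities (cosine, dot product) best means highest.
type scoredHeap struct {
	items          []scored
	higherIsBetter bool
}

func (h *scoredHeap) Len() int      { return len(h.items) }
func (h *scoredHeap) Swap(i, j int) { h.items[i], h.items[j] = h.items[j], h.items[i] }
func (h *scoredHeap) Less(i, j int) bool {
	if h.higherIsBetter {
		return h.items[i].score > h.items[j].score
	}
	return h.items[i].score < h.items[j].score
}
func (h *scoredHeap) Push(x any) { h.items = append(h.items, x.(scored)) }
func (h *scoredHeap) Pop() any {
	last := h.items[len(h.items)-1]
	h.items = h.items[:len(h.items)-1]
	return last
}

// buildCandidateHeap copies the input and picks the heap direction from the metric.
func buildCandidateHeap(elems []scored, isSimilarityMetric bool) *scoredHeap {
	h := &scoredHeap{items: append([]scored(nil), elems...), higherIsBetter: isSimilarityMetric}
	heap.Init(h)
	return h
}

// popOrder drains the heap and returns ids in pop order.
func popOrder(h *scoredHeap) []uint64 {
	var ids []uint64
	for h.Len() > 0 {
		ids = append(ids, heap.Pop(h).(scored).id)
	}
	return ids
}

func main() {
	elems := []scored{{1, 0.2}, {2, 0.9}, {3, 0.5}}
	fmt.Println(popOrder(buildCandidateHeap(elems, false))) // [1 3 2] (distance: lowest first)
	fmt.Println(popOrder(buildCandidateHeap(elems, true)))  // [2 3 1] (similarity: highest first)
}
```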

#### 2. Added isSimilarityMetric flag to SimilarityType

```go
type SimilarityType[T c.Float] struct {
    // ... existing fields
    isSimilarityMetric bool  // true for cosine, dotproduct; false for euclidean
}
```

#### 3. Fixed edge pruning comparison in addNeighbors

```go
compare: func(i, j uint64) bool {
    distI := ph.distance_betw(ctx, tc, uuid, i, &inVec, &outVec)
    distJ := ph.distance_betw(ctx, tc, uuid, j, &inVec, &outVec)
    return !ph.simType.isBetterScore(distI, distJ)
}
```
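
The helper isBetterScore is referenced above but not shown. A plausible minimal version, together with edge pruning that keeps the best maxEdges neighbors, might look like this. It is a hypothetical sketch: it sorts instead of using the heap that addNeighbors builds, and simType stands in for SimilarityType.

```go
package main

import (
	"fmt"
	"sort"
)

// simType is a hypothetical, trimmed-down stand-in for SimilarityType.
type simType struct{ isSimilarityMetric bool }

// isBetterScore reports whether score a beats score b under this metric.
func (s simType) isBetterScore(a, b float64) bool {
	if s.isSimilarityMetric {
		return a > b // cosine / dot product: higher is better
	}
	return a < b // Euclidean: lower is better
}

// pruneEdges keeps the maxEdges best neighbors of a node, best first.
func pruneEdges(s simType, neighbors []uint64, score map[uint64]float64, maxEdges int) []uint64 {
	out := append([]uint64(nil), neighbors...)
	sort.SliceStable(out, func(i, j int) bool {
		return s.isBetterScore(score[out[i]], score[out[j]])
	})
	if len(out) > maxEdges {
		out = out[:maxEdges]
	}
	return out
}

func main() {
	score := map[uint64]float64{1: 0.1, 2: 0.9, 3: 0.5}
	fmt.Println(pruneEdges(simType{false}, []uint64{1, 2, 3}, score, 2)) // [1 3]
	fmt.Println(pruneEdges(simType{true}, []uint64{1, 2, 3}, score, 2))  // [2 3]
}
```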

### Files Changed

| File | Changes |
|------|---------|
| tok/hnsw/heap.go | Added candidateHeap interface, minHeapWrapper, maxHeapWrapper, and buildCandidateHeap factory |
| tok/hnsw/helper.go | Added isSimilarityMetric field to SimilarityType; fixed edge pruning comparison |
| tok/hnsw/persistent_hnsw.go | Updated searchPersistentLayer to use metric-aware candidate heap |
| tok/hnsw/persistent_hnsw_test.go | Added comprehensive unit tests for heap behavior and search correctness |

### Testing

Added new tests covering:

- TestCandidateHeapMinHeap: Verifies min-heap pops in ascending order
- TestCandidateHeapMaxHeap: Verifies max-heap pops in descending order
- TestCandidateHeapPushPop: Tests Push/Pop operations for both heap
types
- TestCandidateHeapPopLast: Tests PopLast for both types
- TestSimilarityTypeIsSimilarityMetric: Verifies flag is set correctly
for each metric
- TestSearchReturnsCorrectOrderForAllMetrics: End-to-end test for
Euclidean, Cosine, and DotProduct
- TestEdgePruningKeepsBestEdges: Verifies edge pruning keeps best edges
for each metric

### Performance Note

This fix builds on PR #9514 which corrected the early termination
condition. Together, these changes ensure HNSW search explores the
correct number of candidates and returns properly ordered results.

Users experiencing slower insert/search times compared to v25.1.0 can
tune performance by lowering the efConstruction and efSearch parameters when
creating their vector indexes.

Lower values trade recall for speed. The default values
(efConstruction=128, efSearch=64) prioritize recall.

### GenAI Notice

Parts of this implementation and all of the tests were generated using
Claude Opus 4.5 (thinking).

### Checklist

- [x] The PR title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/#summary) syntax, leading with `fix:`, `feat:`, `chore:`, `ci:`, etc.
- [x] Code compiles correctly and linting (via trunk) passes locally
- [x] Tests added for new functionality, or regression tests for bug
fixes added as applicable

Fixes #9558

### Benchmarks

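Our BEIR SciFact information retrieval benchmarks now show scores close to
or exceeding the BEIR "Acceptable" baseline for all three metrics (euclidean,
cosine, dotproduct), and approaching the "Excellent" baseline on some
measures such as Recall@100.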

```
============================================================================================================================================
NDCG@k Comparison
============================================================================================================================================

NDCG@1:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████░░░░░░░░░░░░ 0.6200
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████░░░░░░░░░░░░░░░ 0.5200
  Dgraph v25.1.0 (euclidean)                           ███████████████░░░░░░░░░░░░░░░ 0.5000
  Dgraph v25.1.0 (cosine)                              ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2767
  Dgraph v25.1.0 (dotproduct)                          ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2867
  Dgraph staged-fix (euclidean)                        ███████████████░░░░░░░░░░░░░░░ 0.5233
  Dgraph staged-fix (cosine)                           ███████████████░░░░░░░░░░░░░░░ 0.5300
  Dgraph staged-fix (dotproduct)                       ███████████████░░░░░░░░░░░░░░░ 0.5167

NDCG@3:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████░░░░░░░░░░ 0.6700
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████░░░░░░░░░░░░ 0.6000
  Dgraph v25.1.0 (euclidean)                           ████████████████░░░░░░░░░░░░░░ 0.5588
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3043
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3164
  Dgraph staged-fix (euclidean)                        █████████████████░░░░░░░░░░░░░ 0.5918
  Dgraph staged-fix (cosine)                           █████████████████░░░░░░░░░░░░░ 0.5957
  Dgraph staged-fix (dotproduct)                       █████████████████░░░░░░░░░░░░░ 0.5830

NDCG@5:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████░░░░░░░░░░ 0.6900
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████░░░░░░░░░░░░ 0.6300
  Dgraph v25.1.0 (euclidean)                           █████████████████░░░░░░░░░░░░░ 0.5858
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3197
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3290
  Dgraph staged-fix (euclidean)                        ██████████████████░░░░░░░░░░░░ 0.6168
  Dgraph staged-fix (cosine)                           ██████████████████░░░░░░░░░░░░ 0.6240
  Dgraph staged-fix (dotproduct)                       ██████████████████░░░░░░░░░░░░ 0.6081

NDCG@10:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) █████████████████████░░░░░░░░░ 0.7000
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████████░░░░░░░░░░░ 0.6500
  Dgraph v25.1.0 (euclidean)                           ██████████████████░░░░░░░░░░░░ 0.6118
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3305
  Dgraph v25.1.0 (dotproduct)                          ██████████░░░░░░░░░░░░░░░░░░░░ 0.3423
  Dgraph staged-fix (euclidean)                        ███████████████████░░░░░░░░░░░ 0.6461
  Dgraph staged-fix (cosine)                           ███████████████████░░░░░░░░░░░ 0.6505
  Dgraph staged-fix (dotproduct)                       ███████████████████░░░░░░░░░░░ 0.6369

NDCG@100:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) █████████████████████░░░░░░░░░ 0.7200
  BEIR Acceptable (Acceptable baseline (384-dim))      ████████████████████░░░░░░░░░░ 0.6800
  Dgraph v25.1.0 (euclidean)                           ███████████████████░░░░░░░░░░░ 0.6418
  Dgraph v25.1.0 (cosine)                              ██████████░░░░░░░░░░░░░░░░░░░░ 0.3445
  Dgraph v25.1.0 (dotproduct)                          ██████████░░░░░░░░░░░░░░░░░░░░ 0.3555
  Dgraph staged-fix (euclidean)                        ████████████████████░░░░░░░░░░ 0.6794
  Dgraph staged-fix (cosine)                           ████████████████████░░░░░░░░░░ 0.6849
  Dgraph staged-fix (dotproduct)                       ████████████████████░░░░░░░░░░ 0.6707

============================================================================================================================================
MAP@k Comparison
============================================================================================================================================

MAP@1:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████░░░░░░░░░░░░ 0.6000
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████░░░░░░░░░░░░░░░ 0.5000
  Dgraph v25.1.0 (euclidean)                           ██████████████░░░░░░░░░░░░░░░░ 0.4812
  Dgraph v25.1.0 (cosine)                              ███████░░░░░░░░░░░░░░░░░░░░░░░ 0.2586
  Dgraph v25.1.0 (dotproduct)                          ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2747
  Dgraph staged-fix (euclidean)                        ███████████████░░░░░░░░░░░░░░░ 0.5046
  Dgraph staged-fix (cosine)                           ███████████████░░░░░░░░░░░░░░░ 0.5112
  Dgraph staged-fix (dotproduct)                       ██████████████░░░░░░░░░░░░░░░░ 0.4979

MAP@3:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ███████████████████░░░░░░░░░░░ 0.6400
  BEIR Acceptable (Acceptable baseline (384-dim))      █████████████████░░░░░░░░░░░░░ 0.5700
  Dgraph v25.1.0 (euclidean)                           ████████████████░░░░░░░░░░░░░░ 0.5357
  Dgraph v25.1.0 (cosine)                              ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2883
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3022
  Dgraph staged-fix (euclidean)                        ████████████████░░░░░░░░░░░░░░ 0.5663
  Dgraph staged-fix (cosine)                           █████████████████░░░░░░░░░░░░░ 0.5707
  Dgraph staged-fix (dotproduct)                       ████████████████░░░░░░░░░░░░░░ 0.5579

MAP@5:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ███████████████████░░░░░░░░░░░ 0.6600
  BEIR Acceptable (Acceptable baseline (384-dim))      █████████████████░░░░░░░░░░░░░ 0.5900
  Dgraph v25.1.0 (euclidean)                           ████████████████░░░░░░░░░░░░░░ 0.5544
  Dgraph v25.1.0 (cosine)                              ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2993
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3113
  Dgraph staged-fix (euclidean)                        █████████████████░░░░░░░░░░░░░ 0.5838
  Dgraph staged-fix (cosine)                           █████████████████░░░░░░░░░░░░░ 0.5902
  Dgraph staged-fix (dotproduct)                       █████████████████░░░░░░░░░░░░░ 0.5755

MAP@10:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████░░░░░░░░░░ 0.6700
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████░░░░░░░░░░░░ 0.6000
  Dgraph v25.1.0 (euclidean)                           █████████████████░░░░░░░░░░░░░ 0.5676
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3045
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3175
  Dgraph staged-fix (euclidean)                        █████████████████░░░░░░░░░░░░░ 0.5987
  Dgraph staged-fix (cosine)                           ██████████████████░░░░░░░░░░░░ 0.6035
  Dgraph staged-fix (dotproduct)                       █████████████████░░░░░░░░░░░░░ 0.5900

MAP@100:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████░░░░░░░░░░ 0.6800
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████░░░░░░░░░░░░ 0.6100
  Dgraph v25.1.0 (euclidean)                           █████████████████░░░░░░░░░░░░░ 0.5746
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3074
  Dgraph v25.1.0 (dotproduct)                          █████████░░░░░░░░░░░░░░░░░░░░░ 0.3203
  Dgraph staged-fix (euclidean)                        ██████████████████░░░░░░░░░░░░ 0.6060
  Dgraph staged-fix (cosine)                           ██████████████████░░░░░░░░░░░░ 0.6113
  Dgraph staged-fix (dotproduct)                       █████████████████░░░░░░░░░░░░░ 0.5977

============================================================================================================================================
RECALL@k Comparison
============================================================================================================================================

Recall@1:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████░░░░░░░░░░░░ 0.6000
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████░░░░░░░░░░░░░░░ 0.5000
  Dgraph v25.1.0 (euclidean)                           ██████████████░░░░░░░░░░░░░░░░ 0.4812
  Dgraph v25.1.0 (cosine)                              ███████░░░░░░░░░░░░░░░░░░░░░░░ 0.2586
  Dgraph v25.1.0 (dotproduct)                          ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2747
  Dgraph staged-fix (euclidean)                        ███████████████░░░░░░░░░░░░░░░ 0.5046
  Dgraph staged-fix (cosine)                           ███████████████░░░░░░░░░░░░░░░ 0.5112
  Dgraph staged-fix (dotproduct)                       ██████████████░░░░░░░░░░░░░░░░ 0.4979

Recall@3:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) █████████████████████░░░░░░░░░ 0.7300
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████████░░░░░░░░░░░ 0.6500
  Dgraph v25.1.0 (euclidean)                           █████████████████░░░░░░░░░░░░░ 0.5984
  Dgraph v25.1.0 (cosine)                              █████████░░░░░░░░░░░░░░░░░░░░░ 0.3248
  Dgraph v25.1.0 (dotproduct)                          ██████████░░░░░░░░░░░░░░░░░░░░ 0.3377
  Dgraph staged-fix (euclidean)                        ███████████████████░░░░░░░░░░░ 0.6384
  Dgraph staged-fix (cosine)                           ███████████████████░░░░░░░░░░░ 0.6401
  Dgraph staged-fix (dotproduct)                       ██████████████████░░░░░░░░░░░░ 0.6284

Recall@5:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ███████████████████████░░░░░░░ 0.7900
  BEIR Acceptable (Acceptable baseline (384-dim))      █████████████████████░░░░░░░░░ 0.7200
  Dgraph v25.1.0 (euclidean)                           ███████████████████░░░░░░░░░░░ 0.6638
  Dgraph v25.1.0 (cosine)                              ██████████░░░░░░░░░░░░░░░░░░░░ 0.3632
  Dgraph v25.1.0 (dotproduct)                          ███████████░░░░░░░░░░░░░░░░░░░ 0.3697
  Dgraph staged-fix (euclidean)                        ████████████████████░░░░░░░░░░ 0.6988
  Dgraph staged-fix (cosine)                           █████████████████████░░░░░░░░░ 0.7088
  Dgraph staged-fix (dotproduct)                       ████████████████████░░░░░░░░░░ 0.6888

Recall@10:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) █████████████████████████░░░░░ 0.8400
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████████████░░░░░░░ 0.7800
  Dgraph v25.1.0 (euclidean)                           ██████████████████████░░░░░░░░ 0.7368
  Dgraph v25.1.0 (cosine)                              ███████████░░░░░░░░░░░░░░░░░░░ 0.3950
  Dgraph v25.1.0 (dotproduct)                          ████████████░░░░░░░░░░░░░░░░░░ 0.4074
  Dgraph staged-fix (euclidean)                        ███████████████████████░░░░░░░ 0.7808
  Dgraph staged-fix (cosine)                           ███████████████████████░░░░░░░ 0.7834
  Dgraph staged-fix (dotproduct)                       ███████████████████████░░░░░░░ 0.7701

Recall@100:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████████████░░ 0.9500
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████████████████░░░ 0.9000
  Dgraph v25.1.0 (euclidean)                           ██████████████████████████░░░░ 0.8717
  Dgraph v25.1.0 (cosine)                              █████████████░░░░░░░░░░░░░░░░░ 0.4589
  Dgraph v25.1.0 (dotproduct)                          █████████████░░░░░░░░░░░░░░░░░ 0.4658
  Dgraph staged-fix (euclidean)                        ████████████████████████████░░ 0.9350
  Dgraph staged-fix (cosine)                           ████████████████████████████░░ 0.9417
  Dgraph staged-fix (dotproduct)                       ███████████████████████████░░░ 0.9250

============================================================================================================================================
PRECISION@k Comparison
============================================================================================================================================

Precision@1:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████░░░░░░░░░░░░ 0.6200
  BEIR Acceptable (Acceptable baseline (384-dim))      ███████████████░░░░░░░░░░░░░░░ 0.5200
  Dgraph v25.1.0 (euclidean)                           ███████████████░░░░░░░░░░░░░░░ 0.5000
  Dgraph v25.1.0 (cosine)                              ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2767
  Dgraph v25.1.0 (dotproduct)                          ████████░░░░░░░░░░░░░░░░░░░░░░ 0.2867
  Dgraph staged-fix (euclidean)                        ███████████████░░░░░░░░░░░░░░░ 0.5233
  Dgraph staged-fix (cosine)                           ███████████████░░░░░░░░░░░░░░░ 0.5300
  Dgraph staged-fix (dotproduct)                       ███████████████░░░░░░░░░░░░░░░ 0.5167

Precision@3:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████████████████ 0.2700
  BEIR Acceptable (Acceptable baseline (384-dim))      █████████████████████████░░░░░ 0.2300
  Dgraph v25.1.0 (euclidean)                           ████████████████████████░░░░░░ 0.2178
  Dgraph v25.1.0 (cosine)                              █████████████░░░░░░░░░░░░░░░░░ 0.1211
  Dgraph v25.1.0 (dotproduct)                          █████████████░░░░░░░░░░░░░░░░░ 0.1211
  Dgraph staged-fix (euclidean)                        █████████████████████████░░░░░ 0.2311
  Dgraph staged-fix (cosine)                           █████████████████████████░░░░░ 0.2322
  Dgraph staged-fix (dotproduct)                       █████████████████████████░░░░░ 0.2278

Precision@5:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████████████████ 0.1800
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████████████░░░░ 0.1600
  Dgraph v25.1.0 (euclidean)                           ████████████████████████░░░░░░ 0.1480
  Dgraph v25.1.0 (cosine)                              █████████████░░░░░░░░░░░░░░░░░ 0.0827
  Dgraph v25.1.0 (dotproduct)                          █████████████░░░░░░░░░░░░░░░░░ 0.0807
  Dgraph staged-fix (euclidean)                        █████████████████████████░░░░░ 0.1553
  Dgraph staged-fix (cosine)                           ██████████████████████████░░░░ 0.1573
  Dgraph staged-fix (dotproduct)                       █████████████████████████░░░░░ 0.1533

Precision@10:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ██████████████████████████████ 0.1000
  BEIR Acceptable (Acceptable baseline (384-dim))      ██████████████████████████░░░░ 0.0900
  Dgraph v25.1.0 (euclidean)                           █████████████████████████░░░░░ 0.0837
  Dgraph v25.1.0 (cosine)                              █████████████░░░░░░░░░░░░░░░░░ 0.0453
  Dgraph v25.1.0 (dotproduct)                          █████████████░░░░░░░░░░░░░░░░░ 0.0447
  Dgraph staged-fix (euclidean)                        ██████████████████████████░░░░ 0.0887
  Dgraph staged-fix (cosine)                           ██████████████████████████░░░░ 0.0887
  Dgraph staged-fix (dotproduct)                       ██████████████████████████░░░░ 0.0873

Precision@100:
--------------------------------------------------------------------------------------------------------------------------------------------
  BEIR Excellent (State-of-the-art baseline (768-dim)) ████████████████████████████░░ 0.0100
  BEIR Acceptable (Acceptable baseline (384-dim))      ████████████████████████████░░ 0.0100
  Dgraph v25.1.0 (euclidean)                           ███████████████████████████░░░ 0.0100
  Dgraph v25.1.0 (cosine)                              ██████████████░░░░░░░░░░░░░░░░ 0.0053
  Dgraph v25.1.0 (dotproduct)                          ██████████████░░░░░░░░░░░░░░░░ 0.0051
  Dgraph staged-fix (euclidean)                        █████████████████████████████░ 0.0106
  Dgraph staged-fix (cosine)                           ██████████████████████████████ 0.0107
  Dgraph staged-fix (dotproduct)                       █████████████████████████████░ 0.0105
```
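
To help read the charts above, here is a minimal Go sketch of Recall@k and Precision@k for a single query. It assumes the benchmark uses the conventional BEIR/TREC-style definitions, averaged over queries; the harness's actual code is not shown here. The function names `recallAtK`, `precisionAtK` and `hitsAtK`, and the document IDs, are hypothetical.

```go
package main

import "fmt"

// hitsAtK counts how many of the first k retrieved IDs are relevant.
// Assumes retrieved IDs are unique (a duplicate would be counted twice).
func hitsAtK(retrieved []string, relevant map[string]bool, k int) int {
	if k > len(retrieved) {
		k = len(retrieved)
	}
	hits := 0
	for _, id := range retrieved[:k] {
		if relevant[id] {
			hits++
		}
	}
	return hits
}

// recallAtK is the fraction of the query's relevant documents found in the top k.
func recallAtK(retrieved []string, relevant map[string]bool, k int) float64 {
	if len(relevant) == 0 || k <= 0 {
		return 0
	}
	return float64(hitsAtK(retrieved, relevant, k)) / float64(len(relevant))
}

// precisionAtK is the fraction of the k result slots holding a relevant document.
// It divides by k, not by the number of results actually returned.
func precisionAtK(retrieved []string, relevant map[string]bool, k int) float64 {
	if k <= 0 {
		return 0
	}
	return float64(hitsAtK(retrieved, relevant, k)) / float64(k)
}

func main() {
	retrieved := []string{"d7", "d2", "d9", "d4", "d1"} // ranked search results
	relevant := map[string]bool{"d2": true, "d5": true} // hypothetical ground truth
	fmt.Println(recallAtK(retrieved, relevant, 5))    // 0.5
	fmt.Println(precisionAtK(retrieved, relevant, 5)) // 0.2
}
```

Because Precision@k divides by k, a query with only one or two relevant documents cannot score much above 1/k. That is why every Precision@100 figure sits near 0.01. Under these definitions, a mean Precision@100 of 0.0107 corresponds to about 1.07 relevant hits in the top 100 per query.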

---------

Co-authored-by: Joe Lamming <191030909+joelamming@users.noreply.github.com>