@joelamming

Hugely appreciative of the Dgraph team’s work. Native vector search integrated directly into a graph database is a no‑brainer today. I have deployed Dgraph (both vanilla and customised) in systems with 1M+ vectors guiding deep traversal queries across 10M+ nodes. Tight coupling of vector search with graph traversal at that scale gets us closer to something that could represent the fuzzy nuances of everything in an enterprise. This is certainly not the biggest deployment your team will have seen, but this PR fixes an under‑recall edge case in HNSW and introduces opt‑in, per‑query controls that let users trade recall against latency safely and predictably. I’ve had this running in production for a while and thought it worth proposing to main.

  • Summary

    • Fix incorrect early termination in the HNSW bottom layer that could stop before collecting k neighbours.
    • Extend similar_to with optional per‑query ef and distance_threshold (string or JSON‑like fourth argument).
    • Backwards compatible: default 3‑arg behaviour of similar_to is unchanged.
  • Motivation

    • In narrow probes, the bottom‑layer search could exit at a local minimum before collecting k, hurting recall.
    • No per‑query ef meant recall vs latency trade‑offs required global tuning or inflating k (and downstream work).
    • This PR corrects the termination logic and adds opt‑in knobs so users can increase exploration only when needed.
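The termination rule described above can be sketched as follows. This is a minimal illustration and not the code in tok/hnsw/persistent_hnsw.go: the `node`, `scored`, and `searchBottomLayer` names are hypothetical, the graph is a plain in-memory slice, and squared Euclidean distance is assumed. The point it shows is that the loop only stops once the result set already holds max(k, ef) entries and the nearest remaining candidate cannot improve it.

```go
package main

import "sort"

// node is a hypothetical in-memory stand-in for one bottom-layer vertex.
type node struct {
	vec       []float64
	neighbors []int
}

// scored pairs a node id with its distance to the query.
type scored struct {
	id   int
	dist float64
}

// sqDist returns the squared Euclidean distance between equal-length vectors.
func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// searchBottomLayer runs a best-first search from entry and returns up to k
// node ids, nearest first. The working set is max(k, ef) wide (assumption:
// ef <= 0 means "no override").
func searchBottomLayer(graph []node, query []float64, entry, k, ef int) []int {
	if k <= 0 || len(graph) == 0 {
		return nil // guard zero/negative k
	}
	width := k
	if ef > width {
		width = ef
	}
	visited := map[int]bool{entry: true}
	start := scored{entry, sqDist(graph[entry].vec, query)}
	candidates := []scored{start} // frontier still to expand
	results := []scored{start}    // sorted ascending, at most width entries
	for len(candidates) > 0 {
		// Take the nearest unexpanded candidate.
		best := 0
		for i := range candidates {
			if candidates[i].dist < candidates[best].dist {
				best = i
			}
		}
		cur := candidates[best]
		candidates = append(candidates[:best], candidates[best+1:]...)
		// Stop only when the result set is full AND cur cannot improve it.
		if len(results) >= width && cur.dist > results[len(results)-1].dist {
			break
		}
		for _, nb := range graph[cur.id].neighbors {
			if visited[nb] {
				continue
			}
			visited[nb] = true
			s := scored{nb, sqDist(graph[nb].vec, query)}
			if len(results) < width || s.dist < results[len(results)-1].dist {
				candidates = append(candidates, s)
				results = append(results, s)
				sort.Slice(results, func(i, j int) bool {
					return results[i].dist < results[j].dist
				})
				if len(results) > width {
					results = results[:width]
				}
			}
		}
	}
	if len(results) > k {
		results = results[:k] // ef widens exploration, k bounds the output
	}
	ids := make([]int, len(results))
	for i, r := range results {
		ids[i] = r.id
	}
	return ids
}
```

A linear scan and a re-sorted slice stand in for the usual heaps to keep the sketch short; a real index would use priority queues.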
  • Changes (key files)

    • tok/hnsw/persistent_hnsw.go: fix early termination, add SearchWithOptions/SearchWithUidAndOptions, apply ef override at upper layers and max(k, ef) at bottom layer, apply distance_threshold in the metric domain (Euclidean squared internally, cosine as 1 − sim).
    • tok/index/index.go: add VectorIndexOptions and OptionalSearchOptions (non‑breaking).
    • worker/task.go: parse optional fourth argument to similar_to (ef, distance_threshold), thread options, route to optional methods when provided, guard zero/negative k.
    • tok/index/search_path.go: add SearchPathResult helper.
    • Tests: tok/hnsw/ef_recall_test.go adds
      • TestHNSWSearchEfOverrideImprovesRecall
      • TestHNSWDistanceThreshold_Euclidean
      • TestHNSWDistanceThreshold_Cosine
    • CHANGELOG.md: Unreleased entry for HNSW fix and per‑query options.
  • Backwards compatibility

    • No default behaviour changes. The three‑argument similar_to(attr, k, vector_or_uid) is unchanged.
    • ef and distance_threshold are optional; for unsupported metrics the threshold is safely ignored.
  • Performance

    • No overhead without options.
    • With ef set, the bottom‑layer candidate list size becomes max(k, ef), as in standard HNSW, and search cost scales accordingly.
    • Threshold filtering is a cheap pass over candidates; squaring Euclidean thresholds avoids extra square roots.
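As a rough illustration of the threshold pass, the sketch below converts a user‑supplied threshold into the internal distance domain once and then filters candidates with a single comparison each. The `hit`, `internalThreshold`, and `filterByThreshold` names and the metric strings are hypothetical, and the inclusive comparison is an assumption; only the squared‑Euclidean and 1 − sim conventions and the "ignore on unsupported metrics" behaviour come from the description above.

```go
package main

// hit is a hypothetical search candidate. Dist is in the internal domain:
// squared L2 for "euclidean", 1 - cosine similarity for "cosine".
type hit struct {
	UID  uint64
	Dist float64
}

// internalThreshold maps a user threshold into the internal distance domain.
// ok is false for metrics where the threshold is not supported.
func internalThreshold(metric string, t float64) (limit float64, ok bool) {
	switch metric {
	case "euclidean":
		if t < 0 {
			return -1, true // no squared distance can be <= -1
		}
		return t * t, true // square once instead of taking a root per hit
	case "cosine":
		return t, true // already expressed as 1 - sim
	default:
		return 0, false
	}
}

// filterByThreshold keeps hits whose internal distance is within the limit.
// For unsupported metrics the input is returned unchanged.
func filterByThreshold(hits []hit, metric string, threshold float64) []hit {
	limit, ok := internalThreshold(metric, threshold)
	if !ok {
		return hits
	}
	kept := make([]hit, 0, len(hits))
	for _, h := range hits {
		if h.Dist <= limit { // inclusive bound is an assumption
			kept = append(kept, h)
		}
	}
	return kept
}
```

With squared distances 4, 9, and 16 and a Euclidean threshold of 3, the limit becomes 9 and the first two hits are kept.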
  • Rationale and alignment

    • Matches HNSW semantics: ef_search controls exploration/recall, k controls output size.
    • Aligns with Typesense’s per‑query ef and distance_threshold semantics for familiarity.

Checklist

  • Code compiles correctly and linting passes locally
  • For all code changes, an entry added to the CHANGELOG.md describing this PR
  • Tests added for new functionality / regression tests for the bug fix
  • For public APIs/new features, docs PR will be prepared and linked here after initial review

@darkcoderrises
Collaborator

@joelamming Thanks a lot for sending your changes here as well. I can take a look and convert your changes. We have introduced a new Partitioned HNSW, which is supposed to be much faster. We also found a race condition in the HNSW tree that led to lower recall. In the new HNSW we were able to improve accuracy significantly even without your changes.

@darkcoderrises
Collaborator

#19
I have created a new PR here. Please feel free to contribute to it. I will try to review it in both places before merging.
