Add a new interface method SecondaryIndex::NewIterator to enable querying the index #13257

Closed

@ltamasi ltamasi commented Dec 28, 2024

Summary:
The patch adds a new API, NewIterator, to SecondaryIndex. It should return an iterator that applications can use to query the index. The method takes two arguments: a ReadOptions structure, which applications can use to pass (implementation-specific) query parameters to the index, and an underlying iterator, which should be an iterator over the index's secondary column family. The returned iterator is expected to use the underlying iterator to read the actual secondary index entries. (Providing the underlying iterator this way enables, for example, querying the index as of a specific point in time.)

Querying the index can be performed by calling the returned iterator's Seek API with a search target, and then using Next (and potentially Prev) to iterate through the matching index entries. SeekToFirst, SeekToLast, and SeekForPrev are not expected to be supported by the iterator. The iterator should expose primary keys, that is, the secondary key prefix should be stripped from the index entries.
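
The query pattern described above (Seek with a search target, then Next while entries match, exposing only the primary key) could look roughly like the following. This is a minimal, self-contained sketch and not the RocksDB implementation: `ToyColumnFamily`, `ToyUnderlyingIterator`, `ToyIndexIterator`, and `LookupPrimaryKeys` are hypothetical names, the secondary column family is modeled as a sorted `std::map`, and the `<secondary key>|<primary key>` entry layout is an assumption made only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the secondary column family: sorted index entries of
// the form "<secondary key prefix><primary key>". Not the real key encoding.
using ToyColumnFamily = std::map<std::string, std::string>;

// Hypothetical stand-in for the underlying iterator over the secondary
// column family. This is NOT the RocksDB Iterator interface.
class ToyUnderlyingIterator {
 public:
  explicit ToyUnderlyingIterator(const ToyColumnFamily* cf)
      : cf_(cf), pos_(cf->end()) {}

  void Seek(const std::string& target) { pos_ = cf_->lower_bound(target); }
  void Next() {
    assert(Valid());
    ++pos_;
  }
  bool Valid() const { return pos_ != cf_->end(); }
  const std::string& key() const {
    assert(Valid());
    return pos_->first;
  }

 private:
  const ToyColumnFamily* cf_;
  ToyColumnFamily::const_iterator pos_;
};

// Hypothetical query iterator. Seek() takes the secondary key prefix as the
// search target; the iterator stays valid only while entries carry that
// prefix, and key() returns the entry with the prefix stripped.
class ToyIndexIterator {
 public:
  // Non-owning: the caller must keep `underlying` alive while this is used.
  explicit ToyIndexIterator(ToyUnderlyingIterator* underlying)
      : underlying_(underlying) {}

  void Seek(const std::string& target) {
    prefix_ = target;
    underlying_->Seek(prefix_);
  }
  void Next() { underlying_->Next(); }
  bool Valid() const {
    return underlying_->Valid() &&
           underlying_->key().compare(0, prefix_.size(), prefix_) == 0;
  }
  // Primary key only: the secondary key prefix is stripped.
  std::string key() const { return underlying_->key().substr(prefix_.size()); }

 private:
  ToyUnderlyingIterator* underlying_;
  std::string prefix_;
};

// Collects every primary key indexed under the given secondary key prefix.
std::vector<std::string> LookupPrimaryKeys(const ToyColumnFamily& cf,
                                           const std::string& prefix) {
  ToyUnderlyingIterator underlying(&cf);
  ToyIndexIterator it(&underlying);
  std::vector<std::string> primary_keys;
  for (it.Seek(prefix); it.Valid(); it.Next()) {
    primary_keys.push_back(it.key());
  }
  return primary_keys;
}
```

In this toy layout the caller passes the delimiter as part of the search target (for example `"red|"`), so that a lookup for `red` does not also match entries for a longer value that merely starts with `red`. How the real index avoids such ambiguity is not covered in this description.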

The exact semantics of the returned iterator depend on the index and are implementation-specific. For simple indices, the search target might be a primary column value, and the iterator might return all primary keys that have the given column value. (This behavior can be achieved using the new class SecondaryIndexIterator.) However, other semantics are also possible: for vector indices, the search target might be a vector, and the iterator might return similar vectors from the index. (This will be implemented for FaissIVFIndex in a subsequent patch.)

Differential Revision: D67684777

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D67684777

@ltamasi ltamasi marked this pull request as draft December 28, 2024 01:13
@ltamasi ltamasi marked this pull request as ready for review December 30, 2024 18:45
@ltamasi ltamasi requested a review from jaykorean December 30, 2024 18:45
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Dec 31, 2024
Summary:
The patch is the read-side counterpart of facebook#13197. It adds support for K-nearest-neighbor vector similarity searches to `FaissIVFIndex`. There are two main pieces to this:

1) `KNNIterator` is an `Iterator` implementation that is returned by `FaissIVFIndex` upon a call to `NewIterator`. `KNNIterator` treats its `Seek` target as a vector embedding and passes it to FAISS along with the number of neighbors requested `k` as well as the number of probes to use (i.e. the number of inverted lists to check). Applications can then use `Next` (and `Prev`) to iterate over the vectors in the result set. `KNNIterator` exposes the primary keys associated with the result vectors (see below for how this is done), while `value` and `columns` are empty. The iterator also supports a property `rocksdb.faiss.ivf.index.distance` that can be used to retrieve the distance/similarity metric for the current result vector.
2) `IteratorAdapter` takes a RocksDB secondary index iterator (see facebook#13257) and adapts it to the interface required by FAISS (`faiss::InvertedListsIterator`), enabling FAISS to read the inverted lists stored in RocksDB. Since FAISS only supports numerical vector ids of type `faiss::idx_t`, `IteratorAdapter` uses `KNNIterator` to assign ephemeral (per-query) ids to the inverted list items read during iteration, which are later mapped back to the original primary keys by `KNNIterator`.
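
The ephemeral id scheme in item 2 can be illustrated with a small per-query table. This is only a sketch under assumptions: `EphemeralIdMap` and `ToyId` are hypothetical names, `ToyId` stands in for `faiss::idx_t` (assumed here to be a signed 64-bit integer), and the actual bookkeeping inside `KNNIterator` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for faiss::idx_t (assumed to be a signed 64-bit id).
using ToyId = std::int64_t;

// Hypothetical per-query table: hands out dense numeric ids for the primary
// keys seen while scanning inverted lists and maps ids back to those keys.
class EphemeralIdMap {
 public:
  // Assigns the next free id to `primary_key` and returns it.
  ToyId Add(std::string primary_key) {
    keys_.push_back(std::move(primary_key));
    return static_cast<ToyId>(keys_.size() - 1);
  }

  // True if `id` was assigned during this query.
  bool Contains(ToyId id) const {
    return id >= 0 && static_cast<std::size_t>(id) < keys_.size();
  }

  // Maps an id reported by the search back to the original primary key.
  const std::string& PrimaryKey(ToyId id) const {
    assert(Contains(id));
    return keys_[static_cast<std::size_t>(id)];
  }

 private:
  std::vector<std::string> keys_;  // index in this vector == ephemeral id
};
```

Because the ids are just positions in a vector that lives for one query, nothing has to be persisted, and ids from different queries are unrelated to each other.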


Differential Revision: D67684898
// For vector indices, the search target might be a vector, and the iterator
// might return similar vectors from the index.
virtual std::unique_ptr<Iterator> NewIterator(
    const ReadOptions& read_options, Iterator* underlying_it) const = 0;
@jaykorean jaykorean Jan 2, 2025

Just wondering if we are going to guarantee that the underlying iterator stays available. Or is it going to be the API caller's responsibility to keep the underlying iterator alive as long as this secondary index iterator is being used?

@ltamasi ltamasi Jan 2, 2025

Right now it's the latter but it might actually make sense for the secondary index iterator to take ownership (i.e. for NewIterator to take a unique_ptr<Iterator>&&). Let me change that.
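
The ownership transfer proposed in this reply can be sketched as follows. The types here are hypothetical toys (`ToyIterator`, `OwningIndexIterator`, `NewToyIndexIterator`) rather than the RocksDB classes; the only thing taken from the thread is the idea of accepting a `unique_ptr<Iterator>&&` so that the index iterator controls the lifetime of the underlying iterator.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical underlying iterator that records when it is destroyed, so the
// lifetime effect of the ownership transfer is observable.
class ToyIterator {
 public:
  explicit ToyIterator(bool* destroyed) : destroyed_(destroyed) {}
  ~ToyIterator() { *destroyed_ = true; }

 private:
  bool* destroyed_;
};

// Hypothetical index iterator that owns its underlying iterator.
class OwningIndexIterator {
 public:
  explicit OwningIndexIterator(std::unique_ptr<ToyIterator>&& underlying)
      : underlying_(std::move(underlying)) {
    assert(underlying_ != nullptr);
  }

  bool HasUnderlying() const { return underlying_ != nullptr; }

 private:
  std::unique_ptr<ToyIterator> underlying_;
};

// Hypothetical factory with the parameter shape suggested in the reply: the
// caller hands over the underlying iterator and cannot use it afterwards.
std::unique_ptr<OwningIndexIterator> NewToyIndexIterator(
    std::unique_ptr<ToyIterator>&& underlying) {
  return std::make_unique<OwningIndexIterator>(std::move(underlying));
}
```

With this shape, the caller no longer has to keep the underlying iterator alive separately: it is destroyed together with the index iterator that wraps it.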

// building block for more complex iterators.
class SecondaryIndexIterator : public Iterator {
 public:
  SecondaryIndexIterator(const SecondaryIndex* index,
Contributor

I see that ReadOptions are not being passed here yet. I assume that will come in the later PRs?

Contributor Author

Right, this implementation doesn't use any read options; however, for example, the index iterator for vector similarity search will.

@@ -96,6 +99,32 @@ class SecondaryIndex {
const Slice& primary_column_value, const Slice& previous_column_value,
std::optional<std::variant<Slice, std::string>>* secondary_value)
const = 0;

// Create an iterator that can be used by applications to query the index.
Contributor

I think we'd want to add an EXPERIMENTAL tag.

Contributor Author

The entire secondary indexing functionality (including this class) is currently marked as being "under construction" but hopefully will graduate to "experimental" soon ;)

@jaykorean jaykorean left a comment

Thank you!

@ltamasi ltamasi commented Jan 2, 2025

Thanks for the review!

@facebook-github-bot

This pull request has been merged in 3579d32.
