Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new interface method SecondaryIndex::NewIterator to enable querying the index #13257

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions include/rocksdb/utilities/secondary_index.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,13 @@

#pragma once

#include <memory>
#include <optional>
#include <string>
#include <variant>

#include "rocksdb/iterator.h"
#include "rocksdb/options.h"
#include "rocksdb/rocksdb_namespace.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"
Expand Down Expand Up @@ -96,6 +99,32 @@ class SecondaryIndex {
const Slice& primary_column_value, const Slice& previous_column_value,
std::optional<std::variant<Slice, std::string>>* secondary_value)
const = 0;

// Create an iterator that can be used by applications to query the index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd want to add EXPERIMENTAL tag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire secondary indexing functionality (including this class) is currently marked as being "under construction" but hopefully will graduate to "experimental" soon ;)

// This method takes a ReadOptions structure, which can be used by
// applications to provide (implementation-specific) query parameters to the
// index as well as an underlying iterator over the index's secondary column
// family, which the returned iterator is expected to take ownership of and
// use to read the actual secondary index entries. (Providing the underlying
// iterator this way enables querying the index as of a specific point in time
// for example.)
//
// Querying the index can be performed by calling the returned iterator's
// Seek API with a search target, and then using Next (and potentially
// Prev) to iterate through the matching index entries. SeekToFirst,
// SeekToLast, and SeekForPrev are not expected to be supported by the
// iterator. The iterator should expose primary keys, that is, the secondary
// key prefix should be stripped from the index entries.
//
// The exact semantics of the returned iterator depend on the index and are
// implementation-specific. For simple indices, the search target might be a
// primary column value, and the iterator might return all primary keys that
// have the given column value; however, other semantics are also possible.
// For vector indices, the search target might be a vector, and the iterator
// might return similar vectors from the index.
virtual std::unique_ptr<Iterator> NewIterator(
const ReadOptions& read_options,
std::unique_ptr<Iterator>&& underlying_it) const = 0;
};

} // namespace ROCKSDB_NAMESPACE
7 changes: 7 additions & 0 deletions utilities/secondary_index/faiss_ivf_index.cc
Original file line number Diff line number Diff line change
Expand Up @@ -210,4 +210,11 @@ Status FaissIVFIndex::GetSecondaryValue(
return Status::OK();
}

std::unique_ptr<Iterator> FaissIVFIndex::NewIterator(
const ReadOptions& /* read_options */,
std::unique_ptr<Iterator>&& /* underlying_it */) const {
// TODO: implement this
return std::unique_ptr<Iterator>(NewErrorIterator(Status::NotSupported()));
}

} // namespace ROCKSDB_NAMESPACE
4 changes: 4 additions & 0 deletions utilities/secondary_index/faiss_ivf_index.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ class FaissIVFIndex : public SecondaryIndex {
std::optional<std::variant<Slice, std::string>>*
secondary_value) const override;

std::unique_ptr<Iterator> NewIterator(
const ReadOptions& read_options,
std::unique_ptr<Iterator>&& underlying_it) const override;

private:
class Adapter;

Expand Down
137 changes: 137 additions & 0 deletions utilities/secondary_index/secondary_index_iterator.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,137 @@
// Copyright (c) Meta Platforms, Inc. and affiliates.
//
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#pragma once

#include <cassert>
#include <string>

#include "rocksdb/iterator.h"
#include "rocksdb/status.h"
#include "rocksdb/utilities/secondary_index.h"
#include "util/overload.h"

namespace ROCKSDB_NAMESPACE {

// A simple iterator that can be used to query a secondary index (that is, find
// the primary keys for a given search target). Can be used as-is or as a
// building block for more complex iterators.
class SecondaryIndexIterator : public Iterator {
public:
SecondaryIndexIterator(const SecondaryIndex* index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that ReadOptions are not being passed here yet. I assume that will come in the later PRs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this implementation doesn't use any read options; however, for example, the index iterator for vector similarity search will

std::unique_ptr<Iterator>&& underlying_it)
: index_(index), underlying_it_(std::move(underlying_it)) {
assert(index_);
assert(underlying_it_);
}

bool Valid() const override {
return status_.ok() && underlying_it_->Valid() &&
underlying_it_->key().starts_with(prefix_);
}

void SeekToFirst() override {
status_ = Status::NotSupported(
"SeekToFirst is not supported for secondary index iterators");
}

void SeekToLast() override {
status_ = Status::NotSupported(
"SeekToLast is not supported for secondary index iterators");
}

void Seek(const Slice& target) override {
status_ = Status::OK();

std::variant<Slice, std::string> prefix;

const Status s = index_->GetSecondaryKeyPrefix(target, &prefix);
if (!s.ok()) {
status_ = s;
return;
}

prefix_ = std::visit(
overload{
[](const Slice& value) -> std::string { return value.ToString(); },
[](const std::string& value) -> std::string { return value; }},
prefix);

// FIXME: this works for BytewiseComparator but not for all comparators in
// general
underlying_it_->Seek(prefix_);
}

void SeekForPrev(const Slice& /* target */) override {
status_ = Status::NotSupported(
"SeekForPrev is not supported for secondary index iterators");
}

void Next() override {
assert(Valid());

underlying_it_->Next();
}

void Prev() override {
assert(Valid());

underlying_it_->Prev();
}

bool PrepareValue() override {
assert(Valid());

return underlying_it_->PrepareValue();
}

Status status() const override {
if (!status_.ok()) {
return status_;
}

return underlying_it_->status();
}

Slice key() const override {
assert(Valid());

Slice key = underlying_it_->key();
key.remove_prefix(prefix_.size());

return key;
}

Slice value() const override {
assert(Valid());

return underlying_it_->value();
}

const WideColumns& columns() const override {
assert(Valid());

return underlying_it_->columns();
}

Slice timestamp() const override {
assert(Valid());

return underlying_it_->timestamp();
}

Status GetProperty(std::string prop_name, std::string* prop) override {
return underlying_it_->GetProperty(std::move(prop_name), prop);
}

private:
const SecondaryIndex* index_;
std::unique_ptr<Iterator> underlying_it_;
Status status_;
std::string prefix_;
};

} // namespace ROCKSDB_NAMESPACE
103 changes: 102 additions & 1 deletion utilities/transactions/transaction_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@
#include "util/random.h"
#include "util/string_util.h"
#include "utilities/merge_operators.h"
#include "utilities/merge_operators/string_append/stringappend.h"
#include "utilities/secondary_index/secondary_index_iterator.h"
#include "utilities/transactions/pessimistic_transaction_db.h"

namespace ROCKSDB_NAMESPACE {
Expand Down Expand Up @@ -8083,6 +8083,13 @@ TEST_P(TransactionTest, SecondaryIndex) {
return Status::OK();
}

std::unique_ptr<Iterator> NewIterator(
const ReadOptions& /* read_options */,
std::unique_ptr<Iterator>&& underlying_it) const override {
return std::make_unique<SecondaryIndexIterator>(this,
std::move(underlying_it));
}

private:
ColumnFamilyHandle* primary_cfh_{};
ColumnFamilyHandle* secondary_cfh_{};
Expand Down Expand Up @@ -8181,6 +8188,7 @@ TEST_P(TransactionTest, SecondaryIndex) {
}

{
// Read the raw secondary index entries from CF2
std::unique_ptr<Iterator> it(db->NewIterator(ReadOptions(), cfh2));

it->SeekToFirst();
Expand All @@ -8198,6 +8206,58 @@ TEST_P(TransactionTest, SecondaryIndex) {
ASSERT_OK(it->status());
}

{
// Query the secondary index
std::unique_ptr<Iterator> underlying_it(
db->NewIterator(ReadOptions(), cfh2));
std::unique_ptr<Iterator> it(
index->NewIterator(ReadOptions(), std::move(underlying_it)));

it->SeekToFirst();
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->SeekToLast();
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->SeekForPrev("box");
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->Seek("box"); // last character used for indexing: x
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key3");
ASSERT_EQ(it->value(), "zab");

it->Next();
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key4");
ASSERT_EQ(it->value(), "xuuq");

it->Prev();
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key3");
ASSERT_EQ(it->value(), "zab");

it->Next();
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key4");
ASSERT_EQ(it->value(), "xuuq");

it->Next();
ASSERT_FALSE(it->Valid());
ASSERT_OK(it->status());

it->Seek("toy"); // last character used for indexing: y
ASSERT_FALSE(it->Valid());
ASSERT_OK(it->status());
}

// Make some updates to the key-values indexed above through the database
// interface (i.e. using implicit transactions)

Expand Down Expand Up @@ -8256,6 +8316,7 @@ TEST_P(TransactionTest, SecondaryIndex) {
}

{
// Read the raw secondary index entries from CF2
std::unique_ptr<Iterator> it(db->NewIterator(ReadOptions(), cfh2));

it->SeekToFirst();
Expand All @@ -8272,6 +8333,46 @@ TEST_P(TransactionTest, SecondaryIndex) {
ASSERT_FALSE(it->Valid());
ASSERT_OK(it->status());
}

{
// Query the secondary index
std::unique_ptr<Iterator> underlying_it(
db->NewIterator(ReadOptions(), cfh2));
std::unique_ptr<Iterator> it(
index->NewIterator(ReadOptions(), std::move(underlying_it)));

it->SeekToFirst();
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->SeekToLast();
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->SeekForPrev("bot");
ASSERT_FALSE(it->Valid());
ASSERT_TRUE(it->status().IsNotSupported());

it->Seek("bot"); // last character used for indexing: t
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key1");
ASSERT_EQ(it->value(), "tluarg");

it->Next();
ASSERT_FALSE(it->Valid());
ASSERT_OK(it->status());

it->Seek("toy"); // last character used for indexing: y
ASSERT_TRUE(it->Valid());
ASSERT_OK(it->status());
ASSERT_EQ(it->key(), "key3");
ASSERT_EQ(it->value(), "ylprag");

it->Next();
ASSERT_FALSE(it->Valid());
ASSERT_OK(it->status());
}
}

TEST_F(TransactionDBTest, CollapseKey) {
Expand Down
Loading