Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rocksandra] Cassandra compaction filter for purge expired columns and rows #2588

Closed
wants to merge 1 commit into from

Conversation

wpc
Copy link
Contributor

@wpc wpc commented Jul 15, 2017

Major changes in this PR:

  • Implement CassandraCompactionFilter to remove expired columns and rows (if all column expired)
  • Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
  • Switch to use shared_ptr<> from uniqu_ptr for Column membership management in RowValue. Since columns do have multiple owners in Merge and GC process, use shared_ptr helps make RowValue immutable.
  • Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.

@facebook-github-bot
Copy link
Contributor

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -82,8 +82,8 @@ class ColumnBase {
virtual int8_t Index() const;
virtual std::size_t Size() const;
virtual void Serialize(std::string* dest) const;

static std::unique_ptr<ColumnBase> Deserialize(const char* src,
virtual bool Collectable() const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GarbageCollectable for clarification maybe?

@@ -112,7 +116,11 @@ void ExpiringColumn::Serialize(std::string* dest) const {
rocksdb::cassandra::Serialize<int32_t>(ttl_, dest);
}

std::unique_ptr<ExpiringColumn> ExpiringColumn::Deserialize(
bool ExpiringColumn::Collectable() const {
return Timestamp() + ttl_ * 1000000 < time(NULL) * 1000000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. not sure if we should use c++ 11 time style here: http://en.cppreference.com/w/cpp/chrono/time_point
  2. I believe we should turn it into a tombstone first, then compaction filter kicks in according to gc_grace_period.

// * and there are no live data for the key in higher level.
// We cannot get info for the second condidtion yet, so we
// will just keep tombstone around for now.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one possible solution is that we could make the gc_grace_seconds as part of compaction filter's constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. But we still do not have information on whether there are live data on the higher level sstable files yet. It feels weird passing in gc_grace_seconds but has no place using it.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@wpc
Copy link
Contributor Author

wpc commented Jul 19, 2017

@scv119 I updated the PR:

  • use std::chrono lib for current time in microseconds
  • make the behavior purging or converting to tombstone configurable. And in the java code will set default behavior to converting expired column to tombstone

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// This source code is also licensed under the GPLv2 license found in the
// COPYING file in the root directory of this source tree.

I believe we also need add this two lines to all source code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently updated the license to Apache+GPLv2, and updated the copyright headers in all the files in RocksDB. So please use those headers here and in all the new files.

}
return RowValue(std::move(new_columns), last_modified_time_);

return RowValue(columns_, last_modified_time_);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicated returns

namespace rocksdb {
namespace cassandra {

class CassandraCompactionFilter : public CompactionFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might worth add some comments on what's the expected behavior of this filter, and more importantly, the option purge_ttl_on_expiration

@@ -159,6 +166,9 @@ class RowValue {
// otherwise it returns the max timestamp of containing columns.
int64_t LastModifiedTime() const;
void Serialize(std::string* dest) const;
RowValue PurgeTTL(bool* changed) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ttl or TTL ? not sure the convention in Rocksdb but I see both TTL and Ttl is used in this PR.

}
return RowValue(std::move(new_columns), last_modified_time_);

return RowValue(columns_, last_modified_time_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line?

for (auto& column : columns_) {
if(column->Expired()) {
if(shared_ptr<Tombstone> tombstone = column->ToTombstone()) {
new_columns.push_back(tombstone);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use move instead of copy here should give us some performance gain.

}

bool ExpiringColumn::Expired() const {
return Timestamp() + ttl_ * 1000000 < NowMicros();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://en.cppreference.com/w/cpp/chrono/duration there are a bunch of utilities could help the conversion.

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

Copy link
Contributor

@scv119 scv119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logics looks good to me, few nits:

  1. GPLv2 licence in the comment header?
  2. consider using std::chrono:: to do the cast between second and microsecond.

I'll leave this to @sagar0 to approve.

@facebook-github-bot
Copy link
Contributor

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@sagar0
Copy link
Contributor

sagar0 commented Jul 20, 2017

Thanks @wpc for contributing!

  1. cassandra_functional_test.cc appears as a new file instead of as a "moved from cassandra_merge_test.cc". Not that it matters a whole lot since there was only 1 commit (of merge operator) prior to this PR, but we will lose that history. Do you think its possible to correct it?
  2. Could you run 'make format' once so that some aspects of coding-style are taken care of.
  3. Lets fix the copyright/license headers in cassandra_functional_test.cc, cassandra_compaction_filter.cc, CassandraCompactionFilter.java and cassandra_compactionfilterjni.cc.

@@ -301,6 +302,51 @@ TEST(RowValueTest, RowWithColumns) {
std::memcmp(dest.c_str(), dest.c_str() + r.Size(), r.Size()) == 0);
}

TEST(RowValueTest, PurgeTTLShouldRemvoeAllColumnExpired) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/RemvoeAllColumn/RemoveAllColumns/

EXPECT_FALSE(changed);
}

TEST(RowValueTest, ExpiresTTLShouldConvertExpiredColumnsToTombstone) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Expires/Expired/

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@wpc updated the pull request - view changes - changes since last import

@wpc
Copy link
Contributor Author

wpc commented Jul 21, 2017

@sagar0 and @scv119 I updated the PR base on the review feedback:

@facebook-github-bot
Copy link
Contributor

@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants