Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purging (erasing) random records #80

Open
rayrapetyan opened this issue May 19, 2017 · 4 comments
Open

Purging (erasing) random records #80

rayrapetyan opened this issue May 19, 2017 · 4 comments

Comments

@rayrapetyan
Copy link

What's the most efficient way of purging some random values from libcuckoo?

E.g. we have implemented a map where each value has a TTL field assigned. From time to time (e.g. after every N-th insert) we need to randomly pick and purge some expired records. Here are the two ways I can think so far:

  1. Using locked_table. We can use iterator obtained after inserting into locked_table, it will be random by nature, so we can iterate forward and try to purge some records. This method is super slow because lock_table() seems creates a full copy of underlying table. The speed of pure lock\unlock is just few hundred operations per second.
  2. Using a [lock-free] list of all keys. Having a list of all existing keys, we can pick few of them and try to randomly erase expired records from the map. Here we need to store a key on every insert (in case of a string-type key it seems much worse than using a classic mutex lock).

Any thoughts?

The second question is: can we safely call functions like insert\update\erase from within update_fn\erase_fn\insert_fn lambdas code?

Thanks!

@rayrapetyan
Copy link
Author

rayrapetyan commented May 20, 2017

I've ended up implementing:

template<typename F>
    uint erase_random_fn(uint num_elements, F fn) {
      static std::random_device rnd_dev;
      static std::mt19937_64 rnd_gen(rnd_dev());

      size_type buckets_size = buckets_.size();
      if (capacity() < LIBCUCKOO_DEFAULT_SIZE)
        return 0;

      std::uniform_int_distribution<uint64_t> dist_buckets(0, buckets_size - 1);
      const size_type start_bucket = dist_buckets(rnd_gen);

      const size_type max_buckets = 2;
      uint del_elements = 0;
      const size_type max_bucket_num = std::min(buckets_size, start_bucket + max_buckets);
      for (size_type cur_bucket = start_bucket; del_elements < num_elements && cur_bucket < max_bucket_num; ++cur_bucket) {
        const auto ob = snapshot_and_lock_one<normal_mode>(cur_bucket);
        bucket &b = buckets_[cur_bucket];
        for (size_type slot = 0; slot < slot_per_bucket() && del_elements < num_elements; ++slot) {
          if (b.occupied(slot) && fn(buckets_[cur_bucket].mapped(slot))) {
            del_from_bucket(cur_bucket, slot);
            ++del_elements;
          }
        }
      }
      return del_elements;
    }

inside cuckoohash_map.hh. As it resolves a very specific type of issue, I would not commit anything.

@rayrapetyan
Copy link
Author

Would you accept this function into master branch?
Or maybe even general core elements TTL support?
Thanks.

@Ywinh
Copy link

Ywinh commented Jul 7, 2024

Is this plan still feasible today, as there is no 'snapshot_and_lock_one' now, only 'snapshot_and_lock_two'
I'm in the need of picking random key, but i don't understand the source code, i really doubt it

@manugoyal
Copy link
Contributor

Hi there. We do not plan on supporting a "pick random key" sort-of functionality as part of the table. For cache-eviction like problems, I have often seen people track additional metadata outside of the main data structure. For example, very primitively:

class MyCache {
 public:
  const Data* find(int key);
  const Data* insert(int key, std::unique_ptr<Data> val);
  void erase(int key);
 private:
  void evict();

  libcuckoo::cuckoohash_map<int, std::unique_ptr<Data>> tbl_;
  // Manipulating `tbl_items_` would require a coarse lock on the whole object,
  // but it's possible to use a data structure with finer-grained
  // concurrency if desired.
  std::vector<std::pair<int, const Data*>> tbl_items_;

While this approach does incur the overhead of maintaining tbl_items_ and synchronizing its state with tbl_, the native cuckoohash_map implementation would have to incur similar overhead to support such functionality anyways, and it's unlikely to be doable in a way that's optimal for different people's use-cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants