Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cuckoohashmap taking high memory #136

Open
naidud2020 opened this issue Aug 12, 2021 · 4 comments
Open

cuckoohashmap taking high memory #136

naidud2020 opened this issue Aug 12, 2021 · 4 comments

Comments

@naidud2020
Copy link

naidud2020 commented Aug 12, 2021

Tried with cuckoohashmap and unordered_map.
for same data, cuckoohashmap taken morethan 64GB on RAM and crashed. unordered_map finished with 1.5GB on RAM.
I would like to give preference to libcuckoo hash map(concurrent). Attaching sample c++ code. Do let me know whether libcuckoo supports with such a small memory footprint like unordered_map.

nested_table.cc.txt

code:

/* We demonstrate how to nest hash tables within one another, to store

  • unstructured data, kind of like JSON. There's still the limitation that it's
  • statically typed. */

#include
#include
#include
#include
#include <unordered_map>

#include <libcuckoo/cuckoohash_map.hh>

void test_with_cuckuoo_hash()
{
typedef libcuckoo::cuckoohash_map<std::string, std::string> InnerTable;
typedef libcuckoo::cuckoohash_map<std::string, std::unique_ptr> OuterTable;

OuterTable tbl;
tbl.reserve(5000000); //reserved

for(int i = 1000000; i < 6000000; ++ i) {
std::string outer_key{std::to_string(i)};
std::string inner_key{std::to_string(i)};
std::string inner_val{std::to_string(i)};

  tbl.insert(outer_key, std::unique_ptr<InnerTable>(new InnerTable));
  tbl.update_fn(outer_key, [&](std::unique_ptr<InnerTable> &innerTbl) {
      innerTbl->reserve(4); //reserved
      innerTbl->insert(inner_key, inner_val);
  });

}

std::cout << "count: " << tbl.size() << std::endl;

}

void test_with_unordered_map()
{
typedef std::unordered_map<std::string, std::string> InnerTable;
typedef std::unordered_map<std::string, std::unique_ptr> OuterTable;

OuterTable tbl;

for(int i = 1000000; i < 6000000; ++ i) {
std::string outer_key{std::to_string(i)};
std::string inner_key{std::to_string(i)};
std::string inner_val{std::to_string(i)};

  std::unique_ptr<InnerTable> inner_tbl = std::make_unique<InnerTable>();
  inner_tbl->insert({inner_key, inner_val});
  tbl.insert({outer_key, std::move(inner_tbl)});

}

std::cout << "count: " << tbl.size() << std::endl;

}

int main() {
//test_with_cuckuoo_hash(); //CONSUMING MORE THAN 64 GB, STILL COULDNT COMPLETE
test_with_unordered_map(); //CONSUMING 1.5 GB
return 0;
}

@manugoyal
Copy link
Contributor

Hi @naidud2020, it seems the problem here is that the default constructor for a cuckoohash table will allocate buckets for 2^18 elements . If I change your code to construct the InnerTable as new InnerTable(1), the memory usage on my machine for test_with_cuckuoo_hash goes down to about 5.5G.

The thing with the cuckoohash table is that very small tables are not particularly memory efficient, because there are several auxiliary data structures such as a mutex array which must be allocated for each table. So if you don't need thread safety for the inner table, it might be a good idea to use cuckoohash_map for the OuterTable and unordered_map for the inner table. On my machine, this results in ~900MB memory usage for your example.

Hope this helps!

@naidud2020
Copy link
Author

Hi @manugoyal Thanks for your response. need thread safe for outer and inner table.

During my experiments, I found that there is a good performance with integer as a key for outer and inner tables.

Is it recommended to use libcuckoo hash map to store "20 Million Keys in outer table and each outer table has ~5 Inner keys". Will it support?

@manugoyal
Copy link
Contributor

Yes I expect that would work fine. Another option for thread-safety on the inner table is to store something like a pair<std::mutex, std::unordered_map>. So a coarse-grained level of thread-safety for the inner table, and a fine-grained level for the outer table. Also if you have a really small number of keys in the inner table, it might be faster to use a simple data structure like std::vector<std::pair<key value>>, and just linearly search. But probably worth trying out several options.

@naidud2020
Copy link
Author

naidud2020 commented Aug 15, 2021

I would like to understand the usecases for libcuckoo hash map. can't have many mutexes as per our requirement. have to store huge number of <key, value>s. tried below approaches to solve this:

  1. <string, <string, value> > : libcuckoo+libcuckoo, libcuckoo+unordered_map -> smaller inner tables, may not be efficient. more mutexes.
  2. mapping keys to unique numbers, <int, <int, value> > : same, smaller inner tables here. more mutexes.

I will try to eliminate innertable if needed, based on libcuckoo usecases.

please share the usecases for "high performance" and "low memory footprint". will change according to libcuckoo usecases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants