-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add fixed length str2int / asin str2int memory-mappable hashmap #295
Conversation
39135ac
to
cc1fd3f
Compare
from pecos.utils.mmap_hashmap_util import MmapHashmap, MmapHashmapBatchGetter | ||
|
||
map_dir = tmpdir.join("fixed_len_10_str2int").realpath().strpath | ||
kv_dict = {"aaaaaaaaaa".encode("utf-8"): 2, "bbbbbbbbbb".encode("utf-8"): 3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create the strings like "a" * 10 so that it is more clear that these keys are length-10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will update for clarity.
max_batch_size - len(kv_dict) | ||
) # Non-exist key | ||
vs = list(kv_dict.values()) + [10] * (max_batch_size - len(kv_dict)) | ||
assert r_map_batch_getter.get(ks, 10).tolist() == vs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you testing increasing max_batch_size
inside class? Then you should check it indeed got increased.
pecos/core/utils/mmap_hashmap.hpp
Outdated
|
||
static constexpr std::size_t fixed_str_len = 10; | ||
|
||
struct StrView { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should rename this struct into something like FixLen10Str
. It's not a "view" anymore, but actually holds data.
pecos/core/utils/mmap_hashmap.hpp
Outdated
throw std::runtime_error("ASIN string length is not 10."); | ||
} | ||
|
||
char key_arr[fixed_str_len]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary to create this array, passing the pointer is sufficient.
pecos/core/utils/mmap_hashmap.hpp
Outdated
} | ||
|
||
char key_arr[fixed_str_len]; | ||
std::memcpy(key_arr, key_string.data(), key_string.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also remove this line.
pecos/core/utils/mmap_hashmap.hpp
Outdated
// Emplace back std::pair<StrView, uint64_t> | ||
auto eb_val = store_.emplace_back( | ||
std::piecewise_construct, | ||
std::forward_as_tuple(key_arr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this line into:
std::forward_as_tuple(key_string.data()),
StrView(const char* input_str = nullptr) { | ||
if (input_str) { | ||
std::memcpy(str, input_str, fixed_str_len); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add an else
here to guard against illegal init which we definitely should realize for a bug in the code:
else {
throw std::runtime_error("Illegal initialization of FixLen10Str with nullptr.");
}
Ideally, should limit scope of this constructor to only take input param std::string_view
. However our template of mmap vector limits data to be plain and string_view
as param doesn't satisfy. Using raw ptr here as a compromise.
256eaa6
to
1aedb47
Compare
Issue #, if available:
N/A
Description of changes:
Added fixed length str2int and fixed length 10 str2int memory-mapped utility for ankerl hashmap.
Usage (Python-binding): Please refer to the added unittests.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.