-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add memory-mapped utilility module #189
Add memory-mapped utilility module #189
Conversation
82bd839
to
54ae1b6
Compare
54ae1b6
to
2dcefbe
Compare
pecos/core/utils/mmap_util.hpp
Outdated
* n_bytes_to_pad (uint64_t): Number of bytes to pad before saving the new block. | ||
*/ | ||
uint64_t aligned_append(const uint64_t size) { | ||
const uint64_t align_bytes = 16; // All blocks are aligned to multiple of the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should align_bytes
be declared here or in _Signature
pecos/core/utils/mmap_util.hpp
Outdated
* n_elements (uint64_t): Number of elements to load for the array. | ||
*/ | ||
template<class T, class TT=T, if_simple_serializable<TT> = true> | ||
T * fget_multiple(uint64_t n_elements) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have fget_next
when user don't know n_elements
beforehand?
pecos/core/utils/mmap_util.hpp
Outdated
_mode = _Mode::WRITEONLY; | ||
} | ||
else { | ||
throw std::runtime_error("Unrecogonized mode. Should be either 'r' or 'w'."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed rp
pecos/core/utils/mmap_util.hpp
Outdated
/* Load one element */ | ||
template<class T> | ||
T fget_one() { // Call fget_multiple | ||
return * fget_multiple<T>(1u); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no implementation of fget_multiple() when T is not simple_serializable.
pecos/core/utils/mmap_util.hpp
Outdated
auto n_bytes_to_pad = _metadata.aligned_append(sizeof(T) * n_elements); | ||
|
||
// Pad previous block end with dummy bytes | ||
const char dummy = '\0'; | ||
for(auto i = 0u; i < n_bytes_to_pad; i++){ | ||
file_util::fput_one(dummy, _fp); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this logic be implemented into alighed_append?
pecos/core/utils/mmap_util.hpp
Outdated
/* Functions to match std::vector interface */ | ||
uint64_t size() const { return _size; } | ||
|
||
const T* data() const { return _data; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need both
const T* data() const { return _data; }
T* data() { return data; }
?
pecos/core/utils/mmap_util.hpp
Outdated
mmap_f.fput_multiple<T>(_data, _size); | ||
} | ||
|
||
void load_mmap(MmapFile & mmap_f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar name as _load_mmap
but different concept, consider change to from_mmap
? Similarly change save_mmap
to to_mmap
?
pecos/core/utils/mmap_util.hpp
Outdated
} | ||
|
||
void load_mmap(MmapFile & mmap_f) { | ||
clear(); // Clean any previous storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to always clear or throw an error if non empty?
Addressed all comments in newest commit. |
pecos/core/utils/mmap_util.hpp
Outdated
|
||
bool empty() const { return static_cast<bool> (size_ == 0); } | ||
|
||
void resize(uint64_t size, const T& value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a default value in the signature so it can handle both cases.
void resize(uint64_t size, const T& value=T() )
pecos/core/utils/mmap_util.hpp
Outdated
template<class T, class TT = T, details_::if_simple_serializable<TT> = true> | ||
class MmapableVector { | ||
public: | ||
MmapableVector() {} // Dummy constructor for empty case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this class will be used as a drop-in replacement of std::vector with limited operations. Do we want to consider to follow the rule of five pattern for this class? https://en.cppreference.com/w/cpp/language/rule_of_three
pecos/core/utils/mmap_util.hpp
Outdated
/* | ||
* Module Private Class to store signature and metadata of all data blocks in mainbody for a memory-mapped file. | ||
*/ | ||
class MmapMetadata_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to have a trailing _ for private members (variables, or methods) but not for the class definition given that all the classes are already wrapped in a privately hinted details_ namespace.
pecos/core/utils/mmap_util.hpp
Outdated
* Return: | ||
* n_bytes_to_pad (uint64_t): Number of bytes to pad before saving the new block. | ||
*/ | ||
uint64_t get_n_bytes_to_pad(const uint64_t size) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not dependency on the input argument "size" in the entire implementation. Do we need this argument at all? if not, we can drop it and potentially rename the method to something like get_num_of_padding_to_align(const uint64_t aligned_bytes=N_ALIGN_BYTES_)
/* Save metadata to end of file after all data blocks have been saveed. | ||
* Also, file pointer is closed at the end. | ||
*/ | ||
void finalize_() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't plan to reveal this method and only use it once in the destructor, I would suggest to move the content into the destructor itself.
} | ||
|
||
/* Free memory-mapped region */ | ||
void free_mmap_() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't plan to reveal this method and only use it once in the destructor, I would suggest to move the content into the destructor itself.
pecos/core/utils/mmap_util.hpp
Outdated
} | ||
else if (mode_ == Mode_::READONLY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's merge these two lines into one } else if {
, which is used as a convection in other C++ codes in the library.
ab170b6
to
d4de89d
Compare
Addressed all comments in newest commit. |
pecos/core/utils/mmap_util.hpp
Outdated
uint8_t loaded_magic[sizeof(MAGIC)]; | ||
file_util::fget_multiple<uint8_t>(&loaded_magic[0], sizeof(MAGIC), fp); | ||
if(std::memcmp(&MAGIC[0], &loaded_magic[0], sizeof(MAGIC)) != 0) { | ||
throw std::runtime_error("File is not a valid PECOS MMAP file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing period of the sentence, it's super minor though.
* Once loaded as mmap view, it cannot go back to std::vector case unless clear() is called. | ||
*/ | ||
template<class T, class TT = T, details_::if_simple_serializable<TT> = true> | ||
class MmapableVector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to also implement push_back
and emplace_back
at the moment. The vectorizer and sparse matrix currently have struct methods with vectors using them.
d4de89d
to
b617af7
Compare
Addressed all comments in newest update. |
Issue #, if available:
N/A
Description of changes:
Add memory-mapped utilility module.
User could test with below code: Copy it into a file
test_mmap_util.cpp
placed atpecos/core/util/
, and run:Output:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.