Investigate marian-decoder memory usage #957
This is a "tiny" model: https://share.firefox.dev/3Bqa1T0

This was generated by running:

```sh
cd 3rd_party/browsermt-marian-dev/build
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make

model_dir=../../../data/remote-settings/models/decoder-tiny-en-lt-WhOkOOIoQlmo636ixpQ3kw

echo "Translate to Lithuanian" |
  valgrind --tool=dhat ./marian-decoder \
    --models $model_dir/model.bin \
    --vocabs $model_dir/vocab.spm $model_dir/vocab.spm \
    --beam-size 1 \
    --workspace 16 \
    --cpu-threads 1
```
Model size on disk: 17 MB. Here is a summary of the "top offenders" for memory.

It seems that the serialized model is retained in memory.
Here is the "base" model: https://share.firefox.dev/4gAIE7B

Here is a better profile with threads disabled: https://share.firefox.dev/3DlI92Z

Another one where I do a quick exit right before inference. "Bytes at End" has the relevant graph.
For instance, in a "base" model, the Wemb are stored as:
This is backed by bindings to gemmology. It looks like this one won't matter on the Wasm side, as it appears to be these calls to genericMalloc which are the culprits.
I hacked in a shared ptr for the backing bytes. With a quick exit right before the inference step: 230 MB vs 823 MB. Here is the full execution: https://share.firefox.dev/400QbqX
The results on the Wasm side are quite different, as it is hitting separate code paths. There are many copies of the model sitting around in memory. The first trick will be to instrument where `Item` copies happen:

```diff
diff --git a/src/common/io_item.h b/src/common/io_item.h
index 24968ec4..e06272c3 100755
--- a/src/common/io_item.h
+++ b/src/common/io_item.h
@@ -3,6 +3,7 @@
 #include "common/shape.h"
 #include "common/types.h"
+#include <emscripten.h>
 #include <string>

 namespace marian {
@@ -27,7 +28,25 @@ struct Item {
       mapped(other.mapped),
       name(other.name),
       shape(other.shape),
-      type(other.type) {}
+      type(other.type) {
+    printf("!!! Item copy constructor %s (count %ld) %p\n",
+           name.c_str(),
+           other.bytes.use_count(),
+           other.bytes.get());
+
+    // clang-format off
+    EM_ASM({
+      const name = UTF8ToString($0);
+      const size = $1;
+      const pointer = $2;
+
+      ChromeUtils.addProfilerMarker(
+        `Item() "${name}" ${size} 0x${pointer.toString(16)}`,
+        { captureStack: true }
+      );
+    }, name.c_str(), bytes->size(), reinterpret_cast<size_t>(bytes.get()));
+    // clang-format on
+  }

   // Copy assignment operator
   Item& operator=(const Item& other) {
@@ -38,6 +57,7 @@ struct Item {
       name = other.name;
       shape = other.shape;
       type = other.type;
+      printf("!!! copy assignment operator %s\n", name.c_str());
     }
     return *this;
   }
@@ -52,6 +72,7 @@ struct Item {
       type(other.type) {
     other.ptr = nullptr;
     other.mapped = false;
+    printf("!!! move constructor %s\n", name.c_str());
   }

   // Move assignment operator
@@ -66,6 +87,8 @@ struct Item {
     other.ptr = nullptr;
     other.mapped = false;
+
+    printf("!!! Move assignment operator %s\n", name.c_str());
   }
   return *this;
 }
```

A quicker, more immediate solve is to at least release the backing data. There are now two copies of the model: the first is the model that is passed into the engine from JavaScript; the second is the items owned by `ScorerWrapper`. So if we add:

```cpp
class ScorerWrapper {
  void clearItems() override { items_.clear(); }
};
```

And then clear both copies:
While this could be a bit unsafe outside of Wasm, since it clears memory out from under the engine, it works for how we use our current implementation. Perhaps we could add some more runtime checks for accessing the memory. With this change I've gotten the "tiny" model's memory from 300 MB to 250 MB, and the "base" model from 500 MB to 371 MB.
We're hitting some pretty large memory usage in Firefox when translating: ~300 MB RSS for "tiny" models, and ~500 MB RSS for "base" models. We should do some memory analysis here on the native build.