Adding option to pass user data to allocator functions #1765

nchamzn · 2022-11-28T21:43:40Z

Adds an option to pass user data to the allocator functions. This is a common pattern: when embedding WAMR in an application, I can pass a struct as user data and access it from the allocator, which lets me do things such as track allocation statistics within the allocator.

Notes

I added this as a new option so I didn't break the existing API for users.
I moved the definitions of MemAllocOption and mem_alloc_type_t to a new header; I don't know if there was a reason they were duplicated.

Example usage:

Engine::Engine() : _total_allocations(0)
{
    RuntimeInitArgs args;
    memset(&args, 0, sizeof(RuntimeInitArgs));

    // Engine::malloc, Engine::realloc and Engine::free are static member functions
    args.mem_alloc_type = Alloc_With_Allocator_With_Data;
    args.mem_alloc_option.allocator_with_data.data = reinterpret_cast<void*>(this);
    args.mem_alloc_option.allocator_with_data.malloc_func = reinterpret_cast<void*>(&Engine::malloc);
    args.mem_alloc_option.allocator_with_data.realloc_func = reinterpret_cast<void*>(&Engine::realloc);
    args.mem_alloc_option.allocator_with_data.free_func = reinterpret_cast<void*>(&Engine::free);

    // ...
    auto engine = wasm_engine_new_with_args(args.mem_alloc_type, &args.mem_alloc_option);

    // ...
}

void* Engine::malloc(void* data, unsigned int size)
{
    auto that = reinterpret_cast<Engine*>(data);
    that->_total_allocations++;
    // Now allocate some memory
    return ::malloc(size);
}

void Engine::free(void* data, void* ptr)
{
    auto that = reinterpret_cast<Engine*>(data);
    that->_total_allocations--;
    // Now free some memory
    ::free(ptr);
}

loganek · 2022-11-28T21:48:39Z

core/iwasm/include/mem_alloc_export.h

+    Alloc_With_Allocator,
+    /* user allocator mode, allocate memory from user defined
+       malloc function with user data */
+    Alloc_With_Allocator_With_Data,


Should that be moved to the bottom as a last option? Otherwise this change breaks ABI backward compatibility.

Good point, updated
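The ABI concern above can be illustrated with a small sketch. It assumes the enum has at least one enumerator after `Alloc_With_Allocator`; the names `Alloc_With_Pool` and `Alloc_With_System_Allocator` and the three namespaces are assumptions made for illustration, not taken from the diff.

```cpp
// Values as an already-compiled host application would have seen them.
namespace before_change {
enum mem_alloc_type_t {
    Alloc_With_Pool = 0,
    Alloc_With_Allocator,
    Alloc_With_System_Allocator      // 2
};
}

// New enumerator inserted in the middle: later values shift by one.
namespace inserted_in_middle {
enum mem_alloc_type_t {
    Alloc_With_Pool = 0,
    Alloc_With_Allocator,
    Alloc_With_Allocator_With_Data,  // takes over the value 2
    Alloc_With_System_Allocator      // silently becomes 3
};
}

// New enumerator appended at the end: existing values are unchanged.
namespace appended_at_end {
enum mem_alloc_type_t {
    Alloc_With_Pool = 0,
    Alloc_With_Allocator,
    Alloc_With_System_Allocator,     // still 2
    Alloc_With_Allocator_With_Data   // gets the next free value, 3
};
}

static_assert(static_cast<int>(before_change::Alloc_With_System_Allocator)
                  == static_cast<int>(appended_at_end::Alloc_With_System_Allocator),
              "appending keeps old binaries compatible");
static_assert(static_cast<int>(before_change::Alloc_With_System_Allocator)
                  != static_cast<int>(inserted_in_middle::Alloc_With_System_Allocator),
              "inserting in the middle changes an existing value");
```

A host binary built against the old header passes the integer 2 for the system allocator; with the middle insertion, a newer library would read that same 2 as the new mode.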

loganek · 2022-11-28T21:54:49Z

core/iwasm/include/mem_alloc_export.h

+        void *realloc_func;
+        void *free_func;
+        void *data;
+    } allocator_with_data;


I'm curious whether you also explored an alternative approach where we have a single struct (allocator) with void *data inside an #ifdef. It might make the code less readable, but it would possibly simplify the implementation. That's just an idea though; not sure if you already thought about it.

When you say inside ifdef, do you mean adding a build option?

I did consider just putting data in the allocator struct and then only using it in the Alloc_With_Allocator_With_Data case, but thought it might be clearer and more typesafe to do it like this. However, I don't mind changing it.

Yes, I meant the build option, something like:

typedef union MemAllocOption {
    struct {
        void *heap_buf;
        uint32_t heap_size;
    } pool;
    struct {
        void *malloc_func;
        void *realloc_func;
        void *free_func;
#ifdef MEM_ALLOC_WITH_USER_DATA
        void *user_data;
#endif
    } allocator;
} MemAllocOption;

I had a look at this. WAMR is not using configure_file anywhere for definitions, so adding an #ifdef in a header like this is a bit cumbersome. I can do it, but the host will also need to add the definition when including the header.

Good point, didn't realize that.
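The problem with guarding a field in an exported header can be sketched as follows. The two struct names are hypothetical; they stand for the same `allocator` struct as seen by a library built with the macro and by a host that includes the header without defining it.

```cpp
#include <cstddef>

// What the runtime library sees when built with the option enabled.
struct AllocatorOptionInLibrary {
    void *malloc_func;
    void *realloc_func;
    void *free_func;
    void *user_data;  // only present when the macro is defined
};

// What a host sees if it includes the same header without the macro.
struct AllocatorOptionInHost {
    void *malloc_func;
    void *realloc_func;
    void *free_func;
};

// The two sides disagree about the size of the struct, so the library
// would read a user_data field that the host never allocated or set.
constexpr bool layouts_differ()
{
    return sizeof(AllocatorOptionInLibrary) != sizeof(AllocatorOptionInHost);
}

constexpr std::size_t extra_bytes_expected_by_library()
{
    return sizeof(AllocatorOptionInLibrary) - sizeof(AllocatorOptionInHost);
}
```

Nothing in the compiler or linker flags this mismatch, which is why the host would have to remember to pass the same definition.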

nchamzn · 2022-11-28T22:05:54Z

core/iwasm/common/wasm_memory.c

 static void (*free_func)(void *ptr) = NULL;

+static void *allocator_data = NULL;
+static void *(*malloc_func_with_data)(void *data, unsigned int size) = NULL;


I also considered reusing the existing variables here, erasing the type and casting when the functions are called, but went this way. I'm also happy to change that.

wenyongh · 2022-11-29T02:38:22Z

@nchamzn, it seems that in your host application you pass the malloc_func/free_func callbacks (mode Alloc_With_Allocator) to runtime init and want to track the allocation statistics, right? Why not wrap malloc_func/free_func, do the statistics in the wrapper APIs, and pass the wrappers to the runtime? For example:

static AllocStats *data; /* AllocStats: placeholder for the host's statistics struct */
void *my_malloc(uint32 size) {
   data->alloc_count++;
   return engine_malloc(size);
}

void my_free(void *ptr) {
    data->free_count++;
    engine_free(ptr);
}

    RuntimeInitArgs args;
    memset(&args, 0, sizeof(RuntimeInitArgs));

    args.mem_alloc_type = Alloc_With_Allocator;
    args.mem_alloc_option.allocator.malloc_func = reinterpret_cast<void*>(my_malloc);
    args.mem_alloc_option.allocator.realloc_func = reinterpret_cast<void*>(my_realloc);
    args.mem_alloc_option.allocator.free_func = reinterpret_cast<void*>(my_free);

    // ...
    auto engine = wasm_engine_new_with_args(args.mem_alloc_type, &args.mem_alloc_option);

It would be better not to extend the runtime options if it is unnecessary.

loganek · 2022-11-29T07:26:04Z

@wenyongh I agree it's better not to modify the API. In the current design, where the engine itself and the allocation functions are singletons, there's not much benefit in having user data, because it can be a global variable instead (as you suggested in your comment).

Having said that, I wonder if we should start working towards a concept of contexts instead of singletons/global objects, so that it's possible to embed multiple VMs in a single process. For example:

CPython follows a similar approach and only allows a single interpreter instance per process (although there are a lot of differences between WAMR and CPython)
Lua has lua_State, which holds all the information about the execution environment (including allocation methods), so multiple scripts can run independently in a single process

I presume moving away from global objects would require a much larger refactoring and probably needs more thorough analysis. I'm also aware that I don't have all the data behind decisions that were made in WAMR a long time ago, and there might be good reasons to keep things as they are. I've added this topic to the next TSC meeting agenda to discuss it with the others and gather their feedback.

nchamzn · 2022-11-29T09:32:53Z

The suggestion is how we have implemented this at the moment, but I see it as unnecessarily complex, and it means we have to compromise on the design/encapsulation of the embedding application: not passing user data forces the embedding application to use a singleton/static even though it shouldn't be required.

I already have a singleton Engine class, and the allocator should have the same lifetime, as it does in WAMR (see wasm_runtime_memory_init/wasm_runtime_memory_destroy). But to ensure my own allocator has the same lifetime, I need to add functions to mutate this singleton from my Engine constructor/destructor, whereas in an ideal world it would just be a member of the Engine class:

class Engine
{
public:
  static Engine& get_instance();

  AllocStats get_alloc_stats();
private:
  Allocator _allocator;
};

For what it's worth, this pattern is used by all other VMs I've embedded before:

wenyongh · 2022-11-29T10:19:46Z

@loganek As I understand it, your main idea is to allow a runtime/VM context to be created multiple times, eliminating the singletons/global objects in wasm_memory.c and some other places, so that each wasm_runtime_init/full_init returns a runtime context instead, right? That way we can customize the behavior of each runtime context individually. My concern is that this would greatly change the runtime APIs and cause incompatibility with old versions; for example, wasm_runtime_load/wasm_runtime_instantiate and some other APIs would need a runtime context as a new argument. I'm also not sure whether it is necessary, since we already support creating multiple wasm modules/module_insts in a thread.

Or do we just want to support using a different memory allocator for a module/module_inst? That seems easier but would also require lots of changes.

wenyongh · 2022-11-29T10:34:21Z

core/iwasm/include/mem_alloc_export.h

+#ifdef MEM_ALLOC_WITH_USER_DATA
+        void *user_data;
+#endif


We should not add macro control in the exported header files, since the host embedder may use the header file directly and link with the built VM library without adding the macro when building the host application. Normally we use CMake variables or C macros to build the library, but we don't require extra macros in the host application for the wasm runtime's exported header files.

I prefer not to add the macro control: just add the user_data field with comments, and have wasm_runtime_init use it if the runtime is built with user data enabled and ignore it otherwise.

Refer to: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/include/wasm_export.h#L140-L148
We don't add macros to control these fields either.

Yeah, I mentioned this here. If you are happy with this being always defined then that is much easier. Thanks for the link, that helps

Have you considered using CMake's configure_file, by the way? Then the header could include config.h, which CMake would generate with the correct definitions for the build configuration.

wenyongh · 2022-11-29T10:38:27Z

core/iwasm/common/wasm_c_api.c

            opts->allocator.free_func;
        init_args.mem_alloc_option.allocator.realloc_func =
            opts->allocator.realloc_func;
+#ifdef MEM_ALLOC_WITH_USER_DATA


Normally we use #if MEM_ALLOC_WITH_USER_DATA != 0 and #if MEM_ALLOC_WITH_USER_DATA == 0 instead of #ifdef MEM_ALLOC_WITH_USER_DATA and #ifndef MEM_ALLOC_WITH_USER_DATA.

And add a macro definition in core/config.h:

#ifndef MEM_ALLOC_WITH_USER_DATA
#define MEM_ALLOC_WITH_USER_DATA 0
#endif

Just out of curiosity, is there any reason (other than historical) for that pattern? I've seen the ifndef/ifdef pattern more commonly, where the config file has all possible options listed, commented out if unused.

Yes, the ifndef/ifdef pattern is also very common, but it may not be very convenient with a config file. If no config file is created and all the configurations are controlled by the makefile, it may not be easy to learn all the configs from the makefile. And if a config header file is used: (1) if it is auto-generated, e.g. autoconf.h, it may be under the output build folder and not easy to find; (2) if it is a file in the project and all the configs are listed, then, as you mentioned, to enable/disable a config we need to uncomment/comment the macro manually or via the makefile, which is not as convenient as the #if xxx != 0/#if xxx == 0 pattern, with which we normally don't need to modify the config file.
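The `#if xxx != 0` convention described above pairs a default in one header with value-based checks. Below is a minimal sketch; `EXAMPLE_FEATURE_OFF` and the helper functions are made-up names, and the difference it shows between `#ifdef` and `#if` for a macro defined as 0 is a general preprocessor fact rather than something stated in this thread.

```cpp
// Default supplied in one place (core/config.h in the discussion above).
// A build system can override it, e.g. with -DMEM_ALLOC_WITH_USER_DATA=1.
#ifndef MEM_ALLOC_WITH_USER_DATA
#define MEM_ALLOC_WITH_USER_DATA 0
#endif

// Because the macro always has a value, code tests the value, not existence.
constexpr bool user_data_enabled()
{
#if MEM_ALLOC_WITH_USER_DATA != 0
    return true;
#else
    return false;
#endif
}

// Hypothetical option that was explicitly switched off by defining it as 0.
#define EXAMPLE_FEATURE_OFF 0

// #ifdef only asks "is it defined?", so a value of 0 still counts as on.
constexpr bool seen_by_ifdef()
{
#ifdef EXAMPLE_FEATURE_OFF
    return true;
#else
    return false;
#endif
}

// #if ... != 0 looks at the value, so 0 means off.
constexpr bool seen_by_if()
{
#if EXAMPLE_FEATURE_OFF != 0
    return true;
#else
    return false;
#endif
}
```

With this convention, disabling an option means passing a 0 on the command line or leaving the default; the config header itself does not need to be edited.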

loganek · 2022-11-29T10:42:31Z

> @loganek As I understand it, your main idea is to allow a runtime/VM context to be created multiple times, eliminating the singletons/global objects in wasm_memory.c and some other places, so that each wasm_runtime_init/full_init returns a runtime context instead, right? That way we can customize the behavior of each runtime context individually. My concern is that this would greatly change the runtime APIs and cause incompatibility with old versions; for example, wasm_runtime_load/wasm_runtime_instantiate and some other APIs would need a runtime context as a new argument. I'm also not sure whether it is necessary, since we already support creating multiple wasm modules/module_insts in a thread.

Yes, I thought of having a context per runtime. And yes, I agree that will be a lot of changes; however, none of that should affect wasm_c_api.h, although changes would be needed in wasm_export.h. I think we don't have to modify the existing API; instead, we can extend it by adding a new function (e.g. with an _ex or _with_ctx suffix) for every function that assumes the singleton.

> Or do we just want to support using a different memory allocator for a module/module_inst? That seems easier but would also require lots of changes.

No, I didn't think of having a per-module allocator, although if there's a need for that, and it doesn't cause any risk, we might consider that in the design as well.

As I said, I agree this is a complex change, and we might eventually decide not to do it if there's too much risk; I suggest we discuss it first before taking any further steps.
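The idea of extending the API with `_with_ctx` variants while keeping the existing functions could look roughly like the sketch below. Everything in it is hypothetical: `RuntimeContext`, `runtime_malloc_with_ctx`, `runtime_malloc` and the other names are invented for illustration and are not part of WAMR.

```cpp
#include <cstdlib>

// Hypothetical per-runtime context holding what is global state today.
struct RuntimeContext {
    void *(*malloc_func)(void *user_data, unsigned int size);
    void (*free_func)(void *user_data, void *ptr);
    void *user_data;
};

// New entry points take the context explicitly.
void *runtime_malloc_with_ctx(RuntimeContext *ctx, unsigned int size)
{
    return ctx->malloc_func(ctx->user_data, size);
}

void runtime_free_with_ctx(RuntimeContext *ctx, void *ptr)
{
    ctx->free_func(ctx->user_data, ptr);
}

// Default context backing the existing singleton-style functions.
static void *default_malloc(void *, unsigned int size)
{
    return std::malloc(size);
}

static void default_free(void *, void *ptr)
{
    std::free(ptr);
}

static RuntimeContext g_default_ctx = { default_malloc, default_free, nullptr };

// Existing functions keep their signatures and forward to the default
// context, so current embedders would not need to change their code.
void *runtime_malloc(unsigned int size)
{
    return runtime_malloc_with_ctx(&g_default_ctx, size);
}

void runtime_free(void *ptr)
{
    runtime_free_with_ctx(&g_default_ctx, ptr);
}
```

In this shape two contexts with different allocators can coexist in one process, while callers of the old functions only ever touch the default one.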

nchamzn · 2022-11-29T11:53:01Z

I had to make some more changes: when the build definition is set, the allocator function types are different, which causes issues if the user then chooses the system allocator. I resolved this by calling the system allocator functions (os_malloc etc.) directly when the system allocator option is used.
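A rough sketch of that dispatch, under stated assumptions: `MemoryMode`, `platform_malloc`, `use_allocator_with_data` and `dispatch_malloc` are invented names, and `platform_malloc` merely stands in for a platform function such as os_malloc. This is not the code from wasm_memory.c.

```cpp
#include <cstdlib>

// Hypothetical memory modes for this sketch.
enum class MemoryMode { SystemAllocator, AllocatorWithData };

static MemoryMode g_mode = MemoryMode::SystemAllocator;
static void *g_allocator_data = nullptr;
static void *(*g_malloc_with_data)(void *data, unsigned int size) = nullptr;

// Stand-in for the platform allocation function; note it has no data
// parameter, so it cannot be stored in g_malloc_with_data without a cast.
static void *platform_malloc(unsigned int size)
{
    return std::malloc(size);
}

// Register a user allocator that expects the extra data argument.
void use_allocator_with_data(void *(*fn)(void *, unsigned int), void *data)
{
    g_mode = MemoryMode::AllocatorWithData;
    g_malloc_with_data = fn;
    g_allocator_data = data;
}

void *dispatch_malloc(unsigned int size)
{
    if (g_mode == MemoryMode::SystemAllocator) {
        // Call the platform function directly instead of going through
        // the function pointer, whose type now includes the data argument.
        return platform_malloc(size);
    }
    return g_malloc_with_data(g_allocator_data, size);
}
```

Calling the platform function directly avoids forcing a two-argument pointer type onto a one-argument function.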

…nce#1765) Add an option to pass user data to the allocator functions. It is common to do this so that the host embedder can pass a struct as user data and access that struct from the allocator, which gives the host embedder the ability to do things such as track allocation statistics within the allocator.

Compile with `cmake -DWASM_MEM_ALLOC_WITH_USER_DATA=1` to enable the option, and the allocator functions provided by the host embedder should be like below (an extra argument `data` is added):

void *malloc(void *data, uint32 size) { .. }
void *realloc(void *data, void *ptr, uint32 size) { .. }
void free(void *data, void *ptr) { .. }

Signed-off-by: Andrew Chambers <[email protected]>
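As a minimal sketch of host-side callbacks in this style: `AllocStats` and the `tracked_*` functions are illustrative names, `unsigned int` stands in for `uint32`, and the `realloc` callback is assumed to take the old pointer in addition to `data`. Registration with the runtime is left out so that the snippet compiles without the WAMR headers.

```cpp
#include <cstdlib>

// Hypothetical statistics block owned by the embedding application.
struct AllocStats {
    long live_allocations = 0;   // blocks currently allocated
    long total_allocations = 0;  // blocks ever allocated
};

// Each callback receives the user data pointer as its first argument,
// so no global or singleton is needed to reach the statistics.
void *tracked_malloc(void *data, unsigned int size)
{
    auto *stats = static_cast<AllocStats *>(data);
    void *ptr = std::malloc(size);
    if (ptr != nullptr) {
        stats->live_allocations++;
        stats->total_allocations++;
    }
    return ptr;
}

void *tracked_realloc(void *data, void *ptr, unsigned int size)
{
    auto *stats = static_cast<AllocStats *>(data);
    void *new_ptr = std::realloc(ptr, size);
    // realloc(nullptr, n) behaves like malloc(n): count it as a new block.
    if (ptr == nullptr && new_ptr != nullptr) {
        stats->live_allocations++;
        stats->total_allocations++;
    }
    return new_ptr;
}

void tracked_free(void *data, void *ptr)
{
    auto *stats = static_cast<AllocStats *>(data);
    if (ptr != nullptr) {
        stats->live_allocations--;
    }
    std::free(ptr);
}
```

The host would pass a pointer to its own `AllocStats` (or to a larger object such as the Engine discussed earlier) as the user data, and read the counters back whenever it needs them.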

loganek reviewed Nov 28, 2022

nchamzn force-pushed the alloc-with-data branch from 1054b87 to 75fc972 (November 28, 2022 21:53)

loganek reviewed Nov 28, 2022

nchamzn commented Nov 28, 2022

nchamzn force-pushed the alloc-with-data branch 3 times, most recently from 1a7badf to 59a0bbc (November 29, 2022 10:18)

wenyongh reviewed Nov 29, 2022

nchamzn force-pushed the alloc-with-data branch from 59a0bbc to cdefe96 (November 29, 2022 11:51)

wenyongh reviewed Nov 29, 2022

nchamzn force-pushed the alloc-with-data branch from cdefe96 to 311181b (November 29, 2022 12:38)

wenyongh reviewed Nov 30, 2022

Adding option to pass user data to allocator functions (e4b9a8c)

nchamzn force-pushed the alloc-with-data branch from 311181b to e4b9a8c (November 30, 2022 07:18)

wenyongh merged commit 3e8927a into bytecodealliance:main Nov 30, 2022
