@nchamzn
Contributor

nchamzn commented Nov 28, 2022

Adds an option to pass user data to the allocator functions. This is a common pattern: when embedding WAMR into my application, I can pass a struct as user data and access that struct from the allocator, which lets me do things such as track allocation statistics within my allocator.
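A minimal C sketch of the mechanism being proposed, assuming the runtime keeps the `data` pointer next to the callback and passes it back on every call. The names `rt_set_allocator`, `rt_malloc`, `rt_free`, `AllocStats`, `stats_malloc` and `stats_free` are hypothetical, not WAMR APIs; the `(void *data, uint32_t size)` shape follows the signatures described in this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Callback types with a leading user-data argument (the shape proposed here). */
typedef void *(*malloc_with_data_fn)(void *data, uint32_t size);
typedef void (*free_with_data_fn)(void *data, void *ptr);

/* Hypothetical runtime-side state: the callbacks and the opaque data pointer
   are stored together, and the data pointer is handed back on every call. */
static malloc_with_data_fn rt_malloc_fn = NULL;
static free_with_data_fn rt_free_fn = NULL;
static void *rt_alloc_data = NULL;

static void rt_set_allocator(malloc_with_data_fn m, free_with_data_fn f, void *data)
{
    rt_malloc_fn = m;
    rt_free_fn = f;
    rt_alloc_data = data;
}

static void *rt_malloc(uint32_t size) { return rt_malloc_fn(rt_alloc_data, size); }
static void rt_free(void *ptr) { rt_free_fn(rt_alloc_data, ptr); }

/* Host side: the embedder's own state, reached through `data` instead of a global. */
typedef struct {
    size_t live_allocations;
    size_t total_allocations;
} AllocStats;

static void *stats_malloc(void *data, uint32_t size)
{
    AllocStats *stats = (AllocStats *)data;
    void *ptr = malloc(size);
    if (ptr) {
        stats->live_allocations++;
        stats->total_allocations++;
    }
    return ptr;
}

static void stats_free(void *data, void *ptr)
{
    AllocStats *stats = (AllocStats *)data;
    if (ptr) {
        stats->live_allocations--;
        free(ptr);
    }
}
```

Because the counters live in whatever object `data` points to, they can be a member of the embedder's own engine object rather than a file-scope global.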

Notes

  • I added this as a new option so I didn't break the existing API for users.
  • I moved the definitions of MemAllocOption and mem_alloc_type_t to a new header; I don't know if there was a reason they were duplicated.

Example usage:

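#include <cstdint>  // uint32_t
#include <cstdlib>  // std::malloc, std::realloc, std::free
#include <cstring>  // memset
#include "wasm_c_api.h"   // wasm_engine_new_with_args
#include "wasm_export.h"  // RuntimeInitArgs

// Engine::malloc, Engine::realloc and Engine::free are assumed to be declared
// `static` in the class, so they have no implicit `this` and can be used as C callbacks.
Engine::Engine() : _total_allocations(0)
{
    RuntimeInitArgs args;
    memset(&args, 0, sizeof(RuntimeInitArgs));

    // Pass this Engine as user data; the runtime hands it back to every callback.
    args.mem_alloc_type = Alloc_With_Allocator_With_Data;
    args.mem_alloc_option.allocator_with_data.data = static_cast<void*>(this);
    args.mem_alloc_option.allocator_with_data.malloc_func = reinterpret_cast<void*>(&Engine::malloc);
    args.mem_alloc_option.allocator_with_data.realloc_func = reinterpret_cast<void*>(&Engine::realloc);
    args.mem_alloc_option.allocator_with_data.free_func = reinterpret_cast<void*>(&Engine::free);

    // ...
    auto engine = wasm_engine_new_with_args(args.mem_alloc_type, &args.mem_alloc_option);

    // ...
}

void* Engine::malloc(void* data, uint32_t size)
{
    auto that = static_cast<Engine*>(data);
    void* ptr = std::malloc(size);
    if (ptr)
        that->_total_allocations++;
    return ptr;
}

void* Engine::realloc(void* data, void* ptr, uint32_t size)
{
    (void)data; // this example only counts malloc/free
    return std::realloc(ptr, size);
}

void Engine::free(void* data, void* ptr)
{
    auto that = static_cast<Engine*>(data);
    if (ptr)
        that->_total_allocations--;
    std::free(ptr);
}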

Alloc_With_Allocator,
/* user allocator mode, allocate memory from user defined
malloc function with user data */
Alloc_With_Allocator_With_Data,

Collaborator

@loganek Nov 28, 2022

Should that be moved to the bottom as a last option? Otherwise this change breaks ABI backward compatibility.
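To illustrate the ABI concern, a sketch assuming the enum previously listed `Alloc_With_Pool`, `Alloc_With_Allocator` and `Alloc_With_System_Allocator` in that order with implicit values (the prefixed names below are illustrative copies, not the real header). Inserting a member shifts every later value, so a host compiled against the old header that passes `2` for the system allocator would be misread by a library built from the new header; appending leaves the existing values untouched.

```c
/* Assumed previous order (values are implicit, starting from 0). */
typedef enum {
    Old_Alloc_With_Pool = 0,
    Old_Alloc_With_Allocator,
    Old_Alloc_With_System_Allocator,
} old_alloc_type_t;

/* New member inserted in the middle: every later value shifts by one. */
typedef enum {
    Ins_Alloc_With_Pool = 0,
    Ins_Alloc_With_Allocator,
    Ins_Alloc_With_Allocator_With_Data,
    Ins_Alloc_With_System_Allocator,
} inserted_alloc_type_t;

/* New member appended at the end: existing values are unchanged. */
typedef enum {
    App_Alloc_With_Pool = 0,
    App_Alloc_With_Allocator,
    App_Alloc_With_System_Allocator,
    App_Alloc_With_Allocator_With_Data,
} appended_alloc_type_t;
```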

Contributor Author

Good point, updated

void *realloc_func;
void *free_func;
void *data;
} allocator_with_data;

Collaborator

I'm curious whether you also explored an alternative approach where we have a single struct (allocator) with a void* data field inside an #ifdef? It might make the code less readable, but it could simplify the implementation. That's just an idea though; not sure if you already thought about it.

Contributor Author

When you say inside an #ifdef, do you mean adding a build option?

I did consider just putting data in the allocator struct and only using it in the Alloc_With_Allocator_With_Data case, but thought it might be clearer and more type-safe to do it like this. However, I don't mind changing it.

Collaborator

Yes, I meant the build option, something like:

typedef union MemAllocOption {
    struct {
        void *heap_buf;
        uint32_t heap_size;
    } pool;
    struct {
        void *malloc_func;
        void *realloc_func;
        void *free_func;
#ifdef MEM_ALLOC_WITH_USER_DATA
        void *user_data;
#endif
    } allocator;
} MemAllocOption;

Contributor Author

I had a look at this. WAMR doesn't use configure_file anywhere for definitions, so adding an #ifdef in a header like this is a bit cumbersome. I can do it, but the host will also need to add the definition when including the header.
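A sketch of why a macro-controlled field in an exported header is risky, assuming a host compiled without the macro and a library compiled with it. The two union types below are illustrative stand-ins for the two views of `MemAllocOption`, not the real header: if the host fills in the smaller object and the library reads `allocator.user_data`, the library reads past the end of the host's object.

```c
#include <stdint.h>

/* Layout seen by a host compiled WITHOUT the macro. */
typedef union {
    struct { void *heap_buf; uint32_t heap_size; } pool;
    struct { void *malloc_func; void *realloc_func; void *free_func; } allocator;
} MemAllocOptionHost;

/* Layout seen by a library compiled WITH the macro. */
typedef union {
    struct { void *heap_buf; uint32_t heap_size; } pool;
    struct {
        void *malloc_func;
        void *realloc_func;
        void *free_func;
        void *user_data; /* only present in this build */
    } allocator;
} MemAllocOptionLib;
```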

Collaborator

Good point, didn't realize that.

static void (*free_func)(void *ptr) = NULL;

static void *allocator_data = NULL;
static void *(*malloc_func_with_data)(void *data, unsigned int size) = NULL;

Contributor Author

I also considered reusing the existing variables here, erasing the type and casting when the functions are called, but went this way instead; happy to change that too.
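For comparison, a sketch of the type-erased alternative, assuming one storage slot is shared by both modes and a flag selects the real signature at call time. All names are hypothetical. ISO C permits converting between function-pointer types as long as the pointer is converted back to its original type before the call, which is why a generic function-pointer type is used here rather than `void *`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Generic function-pointer type used only for storage. */
typedef void (*generic_fn)(void);

typedef void *(*malloc_plain_fn)(uint32_t size);
typedef void *(*malloc_data_fn)(void *data, uint32_t size);

/* Hypothetical shared storage for both modes. */
static generic_fn stored_malloc = NULL;
static void *stored_data = NULL;
static int stored_with_data = 0;

/* Cast back to the real signature before calling. */
static void *erased_malloc(uint32_t size)
{
    if (stored_with_data)
        return ((malloc_data_fn)stored_malloc)(stored_data, size);
    return ((malloc_plain_fn)stored_malloc)(size);
}

/* Two sample allocators, one per signature. */
static int calls_with_data = 0;

static void *plain_alloc(uint32_t size) { return malloc(size); }

static void *data_alloc(void *data, uint32_t size)
{
    (*(int *)data)++; /* count calls through the user data */
    return malloc(size);
}
```

The cost is that the compiler can no longer check the call: if the flag and the stored function disagree, the call is undefined behavior. Separate, fully typed variables avoid that at the price of a few more globals.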

@wenyongh
Contributor

wenyongh commented Nov 29, 2022

@nchamzn, it seems that in your host application you pass the malloc_func/free_func callbacks (mode Alloc_With_Allocator) to the runtime init and want to track allocation statistics, right? Why not wrap malloc_func/free_func, collect the statistics in the wrappers, and pass the wrappers to the runtime? For example:

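// engine_malloc/engine_realloc/engine_free stand for the host's existing allocator functions.
struct AllocStats {
    uint32_t alloc_count;
    uint32_t free_count;
};
static AllocStats data;

static void *my_malloc(uint32_t size) {
    data.alloc_count++;
    return engine_malloc(size);
}

static void *my_realloc(void *ptr, uint32_t size) {
    return engine_realloc(ptr, size);
}

static void my_free(void *ptr) {
    data.free_count++;
    engine_free(ptr);
}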

    RuntimeInitArgs args;
    memset(&args, 0, sizeof(RuntimeInitArgs));

    args.mem_alloc_type = Alloc_With_Allocator;
    args.mem_alloc_option.allocator.malloc_func = reinterpret_cast<void*>(my_malloc);
    args.mem_alloc_option.allocator.realloc_func = reinterpret_cast<void*>(my_realloc);
    args.mem_alloc_option.allocator.free_func = reinterpret_cast<void*>(my_free);

    // ...
    auto engine = wasm_engine_new_with_args(args.mem_alloc_type, &args.mem_alloc_option);

It would be better not to extend the runtime options unless it is necessary.

@loganek
Collaborator

loganek commented Nov 29, 2022

@wenyongh I agree it's better not to modify the API. In its current design, where the engine itself and the allocation functions are singletons, there's not much benefit to having user data, because it can be a global variable instead (as you suggested in your comment).

Having said that, I wonder if we should start working towards a concept of contexts instead of singletons/global objects, so that it's possible to embed multiple VMs in a single process. For example:

  • CPython follows a similar approach and only allows a single interpreter instance per process (although there are a lot of differences between WAMR and CPython)
  • Lua has lua_State, which holds all the information about the execution environment (including allocation methods), so we can have multiple scripts running independently in a single process

I presume moving away from global objects would require a much larger refactoring and probably needs a more thorough analysis; also, I'm aware I don't have all the context behind decisions that were made in WAMR a long time ago, and there might be good reasons to keep it as it is. I've added this topic to the next TSC meeting agenda to discuss it with the others and gather their feedback.
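A rough sketch of the context idea, assuming a Lua-like design in which the allocator and its user data live in a per-VM object instead of in globals. `vm_context`, `vm_malloc`, `vm_free` and `counting_alloc` are hypothetical; the allocator signature is a simplified variant of Lua's `lua_Alloc` (which also receives the old block size).

```c
#include <stddef.h>
#include <stdlib.h>

/* Single allocator entry point: allocate, resize, or free (new_size == 0). */
typedef void *(*vm_alloc_fn)(void *user_data, void *ptr, size_t new_size);

/* Hypothetical per-VM context: state that is global today lives here. */
typedef struct {
    vm_alloc_fn alloc;
    void *user_data;
} vm_context;

static void vm_context_init(vm_context *ctx, vm_alloc_fn alloc, void *user_data)
{
    ctx->alloc = alloc;
    ctx->user_data = user_data;
}

/* Every allocation goes through the context it belongs to. */
static void *vm_malloc(vm_context *ctx, size_t size)
{
    return ctx->alloc(ctx->user_data, NULL, size);
}

static void vm_free(vm_context *ctx, void *ptr)
{
    ctx->alloc(ctx->user_data, ptr, 0);
}

/* Sample allocator that counts calls per context. */
static void *counting_alloc(void *user_data, void *ptr, size_t new_size)
{
    size_t *calls = (size_t *)user_data;
    (*calls)++;
    if (new_size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, new_size); /* realloc(NULL, n) behaves like malloc(n) */
}
```

Two contexts in one process keep independent counters, which is not possible while the allocator is stored in file-scope globals.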

@nchamzn
Contributor Author

nchamzn commented Nov 29, 2022

The suggestion is how we have implemented this at the moment, but I see it as unnecessarily complex, and it means we have to compromise on the design/encapsulation of the embedding application: not passing user data forces the embedding application to use a singleton/static even though one shouldn't be required.

I already have a singleton Engine class, and the allocator should have the same lifetime, as it does in WAMR (see wasm_runtime_memory_init/wasm_runtime_memory_destroy). But to ensure my own allocator has the same lifetime, I need to add functions to mutate this singleton from my Engine constructor/destructor, whereas ideally it would just be a member of the Engine class:

class Engine
{
public:
  static Engine& get_instance();

  AllocStats get_alloc_stats();
private:
  Allocator _allocator;
};

For what it's worth, this pattern is used by every other VM I've embedded before:

@nchamzn force-pushed the alloc-with-data branch 3 times, most recently from 1a7badf to 59a0bbc on November 29, 2022 10:18
@wenyongh
Contributor

wenyongh commented Nov 29, 2022

@loganek My understanding is that your main idea is to create a runtime/VM context multiple times to eliminate the singletons/global objects in wasm_memory.c and some other places, so that each wasm_runtime_init/full_init call returns a runtime context instead, right? That way we could customize the behavior of each runtime context individually. My concern is that this would greatly change the runtime APIs and break compatibility with older versions: for example, wasm_runtime_load/wasm_runtime_instantiate and some other APIs would need a runtime context as a new argument. I'm also not sure whether it is necessary, since we already support creating multiple wasm modules/module instances in a thread.

Or do we just want to support using a different memory allocator per module/module_inst? That seems easier, but it would also require lots of changes.

Comment on lines 28 to 30
#ifdef MEM_ALLOC_WITH_USER_DATA
void *user_data;
#endif

Contributor

We should not add macro control in the exported header files, since the host embedder may use the header file directly, link against the prebuilt VM library, and not define the macro when building the host application. Normally we use CMake variables or C macros to build the library, but we don't add extra macro control to the runtime's exported header files that the host application would have to match.

I'd prefer not to add the macro control: just add the user_data field with comments, use it in wasm_runtime_init if the runtime is built with user data enabled, and ignore it if the runtime is built without user data enabled.
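A sketch of the suggested approach, assuming the build flag is named `WASM_MEM_ALLOC_WITH_USER_DATA` (as in the merged commit message) and defaults to 0. `MemAllocOptionSketch`, `runtime_store_allocator` and `runtime_allocator_data` are illustrative names, not the real header or runtime code: the field is always declared, so host and library agree on the layout, and only the runtime's own build decides whether it is read.

```c
#include <stddef.h>
#include <stdint.h>

/* Default the feature to off if the build system did not set it. */
#ifndef WASM_MEM_ALLOC_WITH_USER_DATA
#define WASM_MEM_ALLOC_WITH_USER_DATA 0
#endif

/* Exported header: the field is always present, independent of the macro. */
typedef union {
    struct { void *heap_buf; uint32_t heap_size; } pool;
    struct {
        void *malloc_func;
        void *realloc_func;
        void *free_func;
        /* Used only when the runtime is built with user data enabled;
           ignored otherwise. */
        void *user_data;
    } allocator;
} MemAllocOptionSketch;

/* Runtime side: read the field only when the feature is compiled in. */
static void *runtime_allocator_data = NULL;

static void runtime_store_allocator(const MemAllocOptionSketch *opt)
{
#if WASM_MEM_ALLOC_WITH_USER_DATA != 0
    runtime_allocator_data = opt->allocator.user_data;
#else
    (void)opt; /* user_data is ignored in this build */
#endif
}
```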

Refer to: https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/include/wasm_export.h#L140-L148
For these fields, we don't add macro to control them either.

Contributor Author

Yeah, I mentioned this here. If you are happy with this being always defined then that is much easier. Thanks for the link, that helps

Contributor Author

@nchamzn Nov 29, 2022

Have you considered using CMake's configure_file, by the way? Then the header could include a config.h that CMake generates with the correct definitions for the build configuration.

opts->allocator.free_func;
init_args.mem_alloc_option.allocator.realloc_func =
opts->allocator.realloc_func;
#ifdef MEM_ALLOC_WITH_USER_DATA

Contributor

Normally we use #if MEM_ALLOC_WITH_USER_DATA != 0 and #if MEM_ALLOC_WITH_USER_DATA == 0 instead of #ifdef MEM_ALLOC_WITH_USER_DATA and #ifndef MEM_ALLOC_WITH_USER_DATA.

And add a macro definition in core/config.h:

#ifndef MEM_ALLOC_WITH_USER_DATA
#define MEM_ALLOC_WITH_USER_DATA 0
#endif
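A small sketch of the practical difference between the two patterns, using hypothetical macros `FEATURE_X` and `FEATURE_Y`. `#ifdef` only tests whether a macro exists, so an explicit `-DFEATURE_X=0` still enables the guarded code; `#if FEATURE_X != 0` tests the value, and the `#ifndef ... #define ... 0` default (as proposed for `core/config.h`) gives every option an explicit, documented value.

```c
/* Simulate a build that passes -DFEATURE_X=0 to turn the feature off. */
#define FEATURE_X 0

/* Default for FEATURE_Y when the build system says nothing. */
#ifndef FEATURE_Y
#define FEATURE_Y 0
#endif

/* #ifdef only asks whether the macro exists, so FEATURE_X=0 still counts as on. */
#ifdef FEATURE_X
static const int x_enabled_by_ifdef = 1;
#else
static const int x_enabled_by_ifdef = 0;
#endif

/* #if X != 0 looks at the value, so FEATURE_X=0 counts as off. */
#if FEATURE_X != 0
static const int x_enabled_by_if = 1;
#else
static const int x_enabled_by_if = 0;
#endif

/* With the default in place, an unset FEATURE_Y is simply off. */
#if FEATURE_Y != 0
static const int y_enabled = 1;
#else
static const int y_enabled = 0;
#endif
```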

Collaborator

Just out of curiosity, is there any reason (other than historical) for that pattern? I've seen the #ifdef/#ifndef pattern more often, with the config file listing all possible options and commenting out the unused ones.

Contributor

Yes, the #ifndef/#ifdef pattern is also very common, but it is not always convenient to use with a config file. If there is no config file and all the configurations are controlled by the makefile, it is hard to learn all the configs from the makefile. And if a config header file is used: (1) if it is auto-generated (e.g. autoconf.h), it may be in the output build folder and not easy to find; (2) if it is a file in the project that lists all the configs, then, as you mentioned, enabling/disabling a config means commenting/uncommenting the macro manually or from the makefile. That is less convenient than the #if xxx != 0/#if xxx == 0 pattern, with which we normally don't need to modify the config file at all.

@loganek
Collaborator

loganek commented Nov 29, 2022

@loganek Per my understanding, your mainly idea is to create runtime/vm context multiple times to eliminate the singletons/global objects in wasm_memory.c and some other places, and for each wasm_runtime_init/full_init, we return a runtime context instead, right? By this way we can custom the individual behavior for each runtime context. My concern is that this would greatly change the runtime APIs and cause incompatibility with old versions, for example, for wasm_runtime_load/wasm_runtime_instantiate and some other APIs, we should add runtime context as a new argument. And not sure whether it is necessary? Since we have already supported created multiple wasm module/module_inst in a thread.

Yes, I was thinking of having a context per runtime. And yes, I agree that would be a lot of changes; however, none of them should affect wasm_c_api.h, although wasm_export.h would need changes. I don't think we'd have to modify the existing API; instead, we could extend it by adding a new function (e.g. with an _ex or _with_ctx suffix) for every function that assumes the singleton.
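A sketch of that extension pattern, with hypothetical names (`rt_context`, `rt_load_module`, `rt_load_module_ex`) rather than real WAMR functions: the new `_ex` variant takes an explicit context, and the existing function keeps its signature by forwarding to a process-wide default context, so current callers compile and behave as before.

```c
#include <stddef.h>

/* Hypothetical runtime context; today this state would be a global. */
typedef struct {
    const char *name;
    unsigned modules_loaded;
} rt_context;

/* The implicit process-wide context that existing APIs keep using. */
static rt_context default_context = { "default", 0 };

/* New entry point: the caller chooses the context explicitly. */
static int rt_load_module_ex(rt_context *ctx, const void *buf, size_t size)
{
    if (ctx == NULL || buf == NULL || size == 0)
        return -1;
    ctx->modules_loaded++; /* real loading work would happen here */
    return 0;
}

/* Existing entry point: unchanged signature, forwards to the default context. */
static int rt_load_module(const void *buf, size_t size)
{
    return rt_load_module_ex(&default_context, buf, size);
}
```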

Or do we just want to support using different memory allocator for a module/module_inst? That seems easier but should also require lots of changes.

No, I wasn't thinking of a per-module allocator, although if there's a need for one and it doesn't introduce any risk, we might consider it in the design as well.

As I said, I agree this is a complex change and we might ultimately decide not to do it if it carries too much risk; I'd suggest we discuss it first before taking any further steps.

@nchamzn
Contributor Author

nchamzn commented Nov 29, 2022

I had to make some more changes: when the build definition is set, the allocator function types are different, which caused issues when the user chooses the system allocator. I resolved these by calling the system allocator functions (os_malloc etc.) directly when the system allocator option is used.
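A sketch of the fix described here, assuming that with the feature enabled the stored callback has the two-argument type. `os_malloc_stub` stands in for WAMR's platform `os_malloc` (plain `malloc` keeps the example self-contained), and the other names are hypothetical. The system allocator is called directly instead of being stored in the callback slot, so it is never invoked through a mismatched function-pointer type.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the platform allocator (os_malloc in WAMR). */
static void *os_malloc_stub(uint32_t size) { return malloc(size); }

typedef enum { MODE_SYSTEM, MODE_USER_ALLOCATOR } alloc_mode_t;

/* With user data enabled, the stored callback has the two-argument type. */
typedef void *(*malloc_with_data_fn)(void *data, uint32_t size);

static alloc_mode_t mode = MODE_SYSTEM;
static malloc_with_data_fn user_malloc = NULL;
static void *user_data = NULL;

static void *runtime_malloc(uint32_t size)
{
    /* The system allocator takes no data argument, so it is called directly
       rather than through user_malloc, whose type would not match. */
    if (mode == MODE_SYSTEM)
        return os_malloc_stub(size);
    return user_malloc(user_data, size);
}

/* Sample user allocator that counts calls through its data pointer. */
static int user_calls = 0;

static void *counting_malloc(void *data, uint32_t size)
{
    (*(int *)data)++;
    return malloc(size);
}
```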

@wenyongh merged commit 3e8927a into bytecodealliance:main Nov 30, 2022
NingW101 pushed a commit to NingW101/wasm-micro-runtime that referenced this pull request Dec 1, 2022
…nce#1765)

Add an option to pass user data to the allocator functions. It is common to
do this so that the host embedder can pass a struct as user data and access
that struct from the allocator, which gives the host embedder the ability to
do things such as track allocation statistics within the allocator.

Compile with `cmake -DWASM_MEM_ALLOC_WITH_USER_DATA=1` to enable
the option; the allocator functions provided by the host embedder should
then look like the following (an extra argument `data` is added):
void *malloc(void *data, uint32 size) { .. }
void *realloc(void *data, void *ptr, uint32 size) { .. }
void free(void *data, void *ptr) { .. }

Signed-off-by: Andrew Chambers <[email protected]>
wenyongh pushed a commit that referenced this pull request Dec 6, 2022
wenyongh pushed a commit that referenced this pull request Dec 19, 2022
vickiegpt pushed a commit to vickiegpt/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024
g0djan pushed a commit to g0djan/wasm-micro-runtime that referenced this pull request Sep 30, 2024
3 participants