Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in C API in some uses of wasm_name_t #2210

Merged
merged 5 commits into from
Mar 26, 2021

Conversation

MarkMcCaskey
Copy link
Contributor

@MarkMcCaskey MarkMcCaskey commented Mar 23, 2021

This PR fixes a bug relating to wasm_importtype_t leaking memory, particularly on the path where String is converted into wasm_name_t. It fixes this by introducing a new type, owned_wasm_name_t, which is identical to wasm_name_t but has RAII and calls the destructor if it goes out of scope.

I'm not confident that the use of owned_wasm_name_t on the FFI boundary is correct though, we'll have to thoroughly review that before shipping this PR.

Review

  • Add a short description of the the change to the CHANGELOG.md file

Converting from `String` means that the type should auto-destruct to make sure
we free that memory. The memory can still be freed manually though. We need
thorough review for the changes were the `owned_wasm_name_t` is coming from user
code. Is it guaranteed that `wasm_name_t` coming from the user always uses a
host allocation? I don't think so, in which case, we have to find some way to
handle transfer of ownership where it's unclear who owns it...
@MarkMcCaskey MarkMcCaskey requested a review from Hywan March 23, 2021 20:32
@MarkMcCaskey MarkMcCaskey requested a review from jubianchi as a code owner March 23, 2021 20:32
Copy link
Contributor

@Hywan Hywan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the Box<owned_wasm_name_t>. Previously we had Box<wasm_name_t> which means that we own wasm_name_t. If wasm_name_t isn't drop with the Box, it's a bug. I think it can be solved by implementing Drop on wasm_name_t, rather than introducing a new non-standard type. Thoughts?

@Hywan Hywan self-assigned this Mar 24, 2021
@Hywan Hywan added bug Something isn't working 📦 lib-c-api About wasmer-c-api labels Mar 24, 2021
@MarkMcCaskey
Copy link
Contributor Author

I'm confused by the Box<owned_wasm_name_t>. Previously we had Box<wasm_name_t> which means that we own wasm_name_t. If wasm_name_t isn't drop with the Box, it's a bug. I think it can be solved by implementing Drop on wasm_name_t, rather than introducing a new non-standard type. Thoughts?

Yeah, I should have written more for you to review this with, wasn't expecting a review so soon though! I'm a bit confused about what exactly ownership means right now but I felt like I fully understood it before, I'll look into more and see if I can figure it out again.

One distinction is that Box<wasm_name_t> could be a valid way to say "I own the metadata about a name but I don't own the name's data". owned_wasm_name_t makes it explicit that we're referring exclusively to the name data and not the metadata. I agree that perhaps this distinction may not make sense, we need to reevalutate what own means. This system is definitely made more complex by the fact that people don't just pass structs by value in C, even when they're small. wasm_byte_vec_t should honestly just be passed by value rather than needing to allocate and pass by pointer...

I agree that we probably need to reevaluate what we're doing though, so Clone of wasm_*_vec_t is currently a deep copy. But we don't implement Drop on it.

The most important idea in the PR right now is the idea that our conversion function from String into a wasm_name_t type should only create a type that is explicitly responsible for freeing its own host-allocated memory.

@MarkMcCaskey
Copy link
Contributor Author

MarkMcCaskey commented Mar 24, 2021

Alright, new hypothesis, Box<wasm_name_t> is always wrong on the FFI boundary.

Let's walk through how wasm_name_t is used in wasm.h:

static inline void wasm_name_new_from_string(
  own wasm_name_t* out, own const char* s
) {
  wasm_name_new(out, strlen(s), s);
}

is a common way to make one (note how the pointer is being passed in, the metadata about ptr, len exists on the C-side when we're dealing with owned wasm_name_ts).

wasm_name_new is just wasm_byte_vec_new:

#define wasm_name_new wasm_byte_vec_new

Which is created in this macro:

  WASM_API_EXTERN void wasm_##name##_vec_new( \
    own wasm_##name##_vec_t* out, \
    size_t, own wasm_##name##_t ptr_or_none const[]); \

And all other uses of wasm_name_t:

WASM_API_EXTERN own wasm_importtype_t* wasm_importtype_new(
  own wasm_name_t* module, own wasm_name_t* name, own wasm_externtype_t*);
WASM_API_EXTERN const wasm_name_t* wasm_importtype_module(const wasm_importtype_t*);
WASM_API_EXTERN const wasm_name_t* wasm_importtype_name(const wasm_importtype_t*);
WASM_API_EXTERN own wasm_exporttype_t* wasm_exporttype_new(
  own wasm_name_t*, own wasm_externtype_t*);
WASM_API_EXTERN const wasm_name_t* wasm_exporttype_name(const wasm_exporttype_t*);

We can see that in all cases with vecs (I also looked at other vecs besides wasm_name_t and as far as I can tell they follow the same pattern), they either exist in the caller's memory (on the C side) or they're not own and are being returned from the host. So all own wasm_xxx_vec_t are not Box.

We never return owned vecs meaning that there's no ambiguity here. own only applies to the data being pointed to by the vec itself, not the metadata (ptr, len) of the vec.


Also I never noticed this before, but there are comments in the official header file that also express surprise about how vectors work 😄 https://github.com/WebAssembly/wasm-c-api/blob/master/include/wasm.h#L46-L65

The comment seems incorrect though, as we can see with the code from above.

// - `own wasm_xxx_vec_t` owns the vector as well as its elements(!)

Is not correct unless

// - `own wasm_xxx_t*` owns the pointed-to data

takes precedence. But if that's the case, then the first rule never applies. A quick search with wasm_[^_]+_vec_t[^*;`] through wasm.h shows that wasm_xxx_vec_t is never passed by value, only by pointer.

Copy link
Contributor

@Hywan Hywan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despites I'm not super happy with the name owned_wasm_name_t (I would prefer something like CBox<wasm_name_t> or Owned<wasm_name_t>), I agree that this is required and that the fix is correct.

@MarkMcCaskey
Copy link
Contributor Author

Despites I'm not super happy with the name owned_wasm_name_t (I would prefer something like CBox<wasm_name_t> or Owned<wasm_name_t>), I agree that this is required and that the fix is correct.

I think something like CBox<> is a bit misleading because we want to indicate that the data and size of the vec should be copied (even though the value is passed by reference...) and that the copied data + size should be considered an owned type that we need to free. I'm actually not sure if there's a decent way to represent what's happening here in the Rust type system all that well. I'm sure we could with a custom repr(transparent) abstraction that uses unsafe internally though.

I don't think the current solution is optimal either but it's easy enough to change in the future if we find something we like better.

@MarkMcCaskey
Copy link
Contributor Author

bors r+

@bors bors bot merged commit 2948bc7 into master Mar 26, 2021
@bors bors bot deleted the fix/c-api-wasm-name-t-mem-leak branch March 26, 2021 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working 📦 lib-c-api About wasmer-c-api
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants