Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-API: usercode_error missing from realm_sync_error #7098

Closed
nirinchev opened this issue Oct 31, 2023 · 14 comments · Fixed by #7148
Closed

C-API: usercode_error missing from realm_sync_error #7098

nirinchev opened this issue Oct 31, 2023 · 14 comments · Fixed by #7148
Assignees

Comments

@nirinchev
Copy link
Member

nirinchev commented Oct 31, 2023

When a client reset handler fails because of an error in the callback, the SDK calls realm_register_user_code_callback_error to store it. However, the error that we in the callback provided to realm_sync_config_set_error_handler is realm_sync_error, which doesn't contain a pointer to the original error. So we get an error with a message A fatal error occurred during client reset: 'User-provided callback failed' and code 1028, but we don't have a way to get to the original error that caused the problem.

Note to self: to repro this in dart, throw a user exception in the client reset handler.

@nicola-cab
Copy link
Member

It seems something I forgot to add when I did it... I'll assign this to me and fix it

@jbreams
Copy link
Contributor

jbreams commented Nov 22, 2023

by "Note to self: to repro this in dart, throw a user exception in the client reset handler.", do you mean throw a C++ exception? How are we throwing C++ exceptions across a C API?

@nirinchev
Copy link
Member Author

nirinchev commented Nov 22, 2023

No, we're throwing a regular dart exception. This gets caught by the dart SDK, and stored as a opaque pointer via realm_register_user_code_callback_error. Then we return false from the client reset callback, which causes the C API to throw CallbackFailed():

throw CallbackFailed();

The "note to self section" was literally a reminder when I get back to eventually testing the fix, what needs to be done to reproduce the original problem and verify it is now fixed. It was mostly due to uncertainty when I'd be able to work on dart again or if someone else needed to pick up the verification on the dart side.

@jbreams
Copy link
Contributor

jbreams commented Nov 22, 2023

Got it, that makes much more sense, thank you for explaining.

@nirinchev
Copy link
Member Author

Unfortunately, I don't think the solution in #7148 works reliably. I haven't debugged it, but the observed results are that the value we get in user_code_error is allways nullptr. My uneducated guess is that this is due to the usage of thread local storage and some specifics of dart. Since user code is run - and the error generated - on the dart thread, but the client reset callback is run on the sync client thread, they don't see each other's thread local storage and thus the sync client will always see the user exception as null.

I don't have a great way to fix it other than naively changing the signatures of the client reset callbacks so that rather than return bool, their return void* where nullptr indicates success (i.e. true) and everything else indicates failure where the returned value is the pointer to the user-code error that should be propagated back to the caller.

@jbreams
Copy link
Contributor

jbreams commented Dec 8, 2023

Or we could add a void** out parameter to the callbacks. So when the sync client calls these callbacks, does that somehow schedule a callback on another thread that the sync client then waits for to complete? Can you point me to the code in the dart SDK where this happens?

@nirinchev
Copy link
Member Author

nirinchev commented Dec 8, 2023

Hey, so I think I was able to fix it on the dart side, so we can either change it in a similar way for the C API or if we think this is a dart peculiarity (which it may very well be), then we can re-close this issue.

So for a bit of context, since dart is single threaded, any callbacks from Core into the user app need to be scheduled to run on the dart thread (which happens async). This means we need to do a bit of locking to turn those async callbacks into synchronous ones that fit the signature coming from Core. The code that does that is not the most readable, but here it is:

https://github.com/realm/realm-dart/blob/main/src/realm_dart_sync.cpp#L184-L214

So for user callback for before_open we provide realm_dart_sync_before_reset_handler_callback, which constructs a function that invokes a wrapper around the user-defined callback on the dart scheduler. It then constructs the unlock function and passes it as an opaque pointer to the function invoked on the dart scheduler. Once the dart callback completes, we invoke the unlock function by calling

https://github.com/realm/realm-dart/blob/4fc8e47307973334b73446ec4d37ef3378268cfb/src/realm_dart.cpp#L113-L116

Previously, in our wrapper function, we would invoke the user callback, catch any exceptions and call Core's realm_register_user_code_callback_error to store them:

https://github.com/realm/realm-dart/blob/4fc8e47307973334b73446ec4d37ef3378268cfb/lib/src/native/realm_core.dart#L589C10-L599

Here's a PR with some changes to that flow that make it so my smoke test passes: realm/realm-dart#1447. Essentially, what changes there is that instead of registering the user error from dart and returning a bool for the callback, we return a pointer to the user error and if that's not nullptr, we call realm_register_user_code_callback_error from C++ after the condition.wait returns, which ensures we're back on the sync client thread.

Apologies for the long explanation and the back-and-forth on this, but if you agree this is a reasonable approach, I'd be happy to close the core issue and move forward with the dart-specific workaround in the dart repo.

@nicola-cab
Copy link
Member

nicola-cab commented Dec 8, 2023

So, if I am not mistaking, you are basically using some synchronization between Sync Thread and Main thread. But this approach does not seem right to me.
I wonder, is the same if you try to implement the same feature in dotnet? I know @clementetb tried the branch in Kotilin and worked, but maybe we need a second pass.

At the same time, I am not sure if I fully understand the issue.
So when the sync client callback runs, you are in the sync thread, if some user code fails there, you should be calling register_user_callback (which in my understanding should happen in the same thread in which the client reset is running), this thread writes to his thread local storage (in the sync thread).

Once some error arises in the sync code, then the sync thread calls the callback registered via realm_sync_config_set_error_handler ... but when it runs in which thread is this callback running?
Because if it happens inside the main thread, then the problem might very much be that, since we are accessing its local storage which could be empty.

I don't believe the code for fixing the issue is correct, since we are basically syncing 2 threads, which is the opposite of what we are trying to do, having the sync thread running in parallel.
At the same time, I don't think we should be writing any synchronization in the C-API.

@nirinchev
Copy link
Member Author

The problem is that dart code cannot execute on the sync thread, so even though the callback is invoked on the sync thread, we need to bounce back to the main (only) thread in dart and execute the code there, which is why we're synchronizing this. It wouldn't be a problem in C#/Kotlin where there are no restrictions regarding the thread user code executes on.

I'm fairly certain the synchronization is necessary because the sync thread needs to block until the user callback completes - it shouldn't be executing code in parallel as it wouldn't be safe to manipulate the Realm while the user might be trying to copy data from it. Once the callback completes, it can continue running in parallel with the main thread.

@nicola-cab
Copy link
Member

nicola-cab commented Dec 8, 2023

OK thanks for the explanation, now it makes sense (sorry I did not know about these details). I think this is the problem then, the thead local storage when it is read, is empty for the main thread, since we have written in the sync thread. However, I suspect that if we set the error in the sync thread before to invoke the callback, it will be fine.

However, this requires more code in the sync machinery, we need to catch the custom generated exception coming from the C-API in the sync machinery, read the error code there and send it up along with the sync error. @jbreams

I think the synchronization can be avoided, but if we decide to keep it, I don't think it should live in the C-API.

@nirinchev
Copy link
Member Author

Unless you see some issues with the dart changes in realm/realm-dart#1447, I think we can just keep the C API as is and only handle the threading peculiarities in the dart C code.

@nicola-cab
Copy link
Member

nicola-cab commented Dec 8, 2023

I cannot judge the dart code, but the cpp part of it, is basically a notification using a conditional variable. If you say that does not cause problems, I would say let's go with it. However, I think that setting the error_code in the sync_thread can solve this. Let me try...

@nirinchev
Copy link
Member Author

This is precisely the intention behind the changes I made to the dart C API - rather than registering the error from dart code (which happens on the main thread), we do it in native after the condition variable unblocks the sync thread. So I guess we can re-close this issue since things in Core work as intended and it's only due to dart being single-threaded that we need to be more careful about which thread we register the error from.

@nicola-cab
Copy link
Member

@nirinchev I think you need something like this: #7195 ... if it works, I can then try to work on a solution that is not that invasive for the sync machinery.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants