
Potential performance improvement. #1572

Closed
Szymon20000 opened this issue Nov 27, 2024 · 19 comments
Labels
enhancement New feature or request

Comments

@Szymon20000

Problem

I'm profiling an RN app and noticed huge performance problems when calling JS functions from the JSI level. I dug into it and it turned out that most of the time is spent in pthread_getattr_np (on Android). Here is a screenshot from the Android Studio sampling profiler: [screenshot]

Solution

Would it be possible to just cache it? ChatGPT suggests something like this:

std::pair<const void *, size_t> thread_stack_bounds(unsigned gap) {
    static thread_local std::pair<const void *, size_t> cachedBounds;
    static thread_local bool isCached = false;

    if (!isCached) {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        if (pthread_getattr_np(pthread_self(), &attr) != 0) {
            hermes_fatal("Unable to obtain native stack bounds");
        }

        void *origin;
        size_t size;
        if (pthread_attr_getstack(&attr, &origin, &size) != 0) {
            pthread_attr_destroy(&attr);
            hermes_fatal("Unable to obtain stack attributes");
        }

#ifdef __BIONIC__
        // Adjust for guard pages on Android/Bionic
        size_t guardSize;
        if (pthread_attr_getguardsize(&attr, &guardSize) != 0) {
            guardSize = 0; // Default to 0 on error
        }
        if (guardSize > size) {
            guardSize = size;
        }
        origin = static_cast<char *>(origin) + guardSize;
        size -= guardSize;
#endif

        pthread_attr_destroy(&attr);

        // Cache the result
        cachedBounds = {static_cast<char *>(origin) + size, size};
        isCached = true;
    }

    // Adjust for the requested gap
    const auto &[stackEnd, stackSize] = cachedBounds;
    size_t adjustedSize = gap < stackSize ? stackSize - gap : 0;
    return {stackEnd, adjustedSize};
}

Additional Context

Szymon20000 added the enhancement (New feature or request) label Nov 27, 2024
@tmikov
Contributor

tmikov commented Nov 27, 2024

Hi, the stack bounds are already cached in StackOverflowGuard. oscompat::thread_stack_bounds() is called only from StackOverflowGuard::isStackOverflowingSlowPath(), which itself is invoked only if the thread using the instance of vm::Runtime has changed.

Can you verify whether that is actually happening? Are you calling the runtime from a different thread every time?
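
Roughly, the shape of that logic is something like the sketch below (the member names are made up, not the actual StackOverflowGuard fields; this is only an illustration of the caching behavior):

#include <pthread.h> // pthread_getattr_np is a GNU/Bionic extension; this thread is about Android
#include <cstddef>

struct StackOverflowGuardSketch {
    pthread_t cachedThread{};        // thread the cached bounds belong to
    const void *stackLow = nullptr;  // lower bound of that thread's stack
    size_t stackSize = 0;

    bool isOverflowing(const void *sp) {
        if (stackLow && pthread_equal(cachedThread, pthread_self())) {
            // Fast path: same thread as last time, reuse the cached bounds.
            return sp < stackLow;
        }
        // Slow path: the thread changed, so the bounds are recomputed; this
        // is where thread_stack_bounds() / pthread_getattr_np shows up in
        // the profile.
        return isStackOverflowingSlowPath(sp);
    }

    bool isStackOverflowingSlowPath(const void *sp) {
        pthread_attr_t attr;
        if (pthread_getattr_np(pthread_self(), &attr) == 0) {
            void *origin = nullptr;
            size_t size = 0;
            if (pthread_attr_getstack(&attr, &origin, &size) == 0) {
                // Re-cache the bounds for the thread now using the Runtime.
                cachedThread = pthread_self();
                stackLow = origin;
                stackSize = size;
            }
            pthread_attr_destroy(&attr);
        }
        return stackLow && sp < stackLow;
    }
};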

@Szymon20000
Author

[screenshot: flame graph] I see it's called multiple times, and I'm checking the main thread only (used by reanimated).

@Szymon20000
Author

[screenshot: flame graph]

@tmikov
Contributor

tmikov commented Nov 27, 2024

Sorry, I don't understand what these screenshots are supposed to tell me.

@Szymon20000
Author

That is a flame graph from a single thread. In those screenshots you can see that thread_stack_bounds is called multiple times; if it were cached in this flow, we would see it only once. workouts::SharebleWorklet::toJSValue calls another jsi::Function, which constructs a ScopedNativeCallFrame, which finally calls thread_stack_bounds. There are 2 such calls in the first screenshot and 3 in the second one. The last screenshot is just to show that it all comes from the same thread :).

@bartlomiejbloniarz
Contributor

Hi! @Szymon20000 I was recently working on this problem in reanimated. This is something that happens when there are calls from different threads to the reanimated worklet runtime. I think you should also see some calls to the reanimated runtime on the JS thread happening at the same time in your trace.

The problem is that Hermes goes through a slow-path overflow check when it's called from a different thread. So if there are any interleaving JS-thread and UI-thread calls, you will see a performance degradation. This issue has been present since:

  • this check was added to Hermes (I think around RN 0.73)
  • we added synchronous calls to the worklet runtime from the JS thread here

We are not yet sure how we want to address it in reanimated, since it's unclear whether the problem is on our side or on the application side.

Could you share with us the code that's causing this to happen, as this would help us decide our next steps?

@tmikov
Contributor

tmikov commented Nov 27, 2024

I am not opposed to adding per-thread caching, but before we do it, I would like to understand the use case.

So, is it the case that we are getting calls to the same runtime from the same two (or more) threads, just switching between them?

Also, when this happens, is the JS stack empty?

@Szymon20000
Author

Ah, so currently Hermes only keeps the last thread cached? The calls I showed are only from the main thread, but it is possible that between them the runtime is accessed from other threads.

@Szymon20000
Author

If that's the case, then it looks like a bug in Reanimated (@bartlomiejbloniarz maybe the synchronous calls block the UI thread but access the runtime from the JS thread?). At the same time, the same thing can happen in Fabric when we (sometimes) render synchronously on the UI thread, but I would need to check whether that actually happens on the JS thread while we just block the UI, or whether it really happens on the UI thread.

@Szymon20000
Author

https://github.com/facebook/react-native/blob/1d909efa235ffae150de25a201efbbe752d0bc52/packages/react-native/ReactCommon/runtimeexecutor/ReactCommon/RuntimeExecutor.h#L35 It will actually run it on the UI thread, so I think we have the same case in Fabric. @tmikov But if there are no other use cases for caching it per thread, it would be good to suggest to the RN team to always invoke the callback on the JS thread and just block the current thread.

@Szymon20000
Author

@bartlomiejbloniarz I think it's enough to block the JS thread and schedule a call on the UI thread; this will probably fix the problem. Right now it takes a lock and uses the runtime on the JS thread: https://github.com/software-mansion/react-native-reanimated/blob/0162804a8ace2a8f6b77764894d8e4d87f94781a/packages/react-native-reanimated/Common/cpp/worklets/WorkletRuntime/WorkletRuntime.cpp#L84
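
A minimal sketch of what I mean, assuming a generic scheduleOnUIThread helper (the name is a placeholder, not an actual Reanimated API):

#include <functional>
#include <future>

// Instead of taking the lock and driving the runtime from the JS thread,
// hand the work to the UI thread (which owns the runtime) and block the
// calling JS thread until it is done.
void runSyncOnUIThread(
    const std::function<void(std::function<void()>)> &scheduleOnUIThread,
    const std::function<void()> &work) {
    std::promise<void> done;
    auto finished = done.get_future();
    scheduleOnUIThread([&] {
        work();
        done.set_value();
    });
    // The JS thread just waits; only the UI thread ever touches the runtime.
    finished.wait();
}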

@Szymon20000
Author

Szymon20000 commented Nov 27, 2024

software-mansion/react-native-reanimated#6770 @bartlomiejbloniarz This seems to fix it, but at the same time we now need to wait for the main thread even when reanimated is not currently using it. So I think caching per thread would actually be better, @tmikov.

Another approach would be to make reanimated use a separate thread and also call runSync from the UI thread, but I'm not sure how big the overhead would be.

@bartlomiejbloniarz
Contributor

bartlomiejbloniarz commented Nov 28, 2024

@tmikov yes. We've seen this in one of our clients' apps: they had many calls to our Hermes runtime from both the UI and the JS thread simultaneously. They weren't able to give me a reproduction, but I was able to reproduce it "artificially" simply by:

  • running a reanimated animation (on the UI thread)
  • reading from a shared value from a setInterval (this triggers synchronous calls to our runtime from the JS thread)

I think a more natural scenario for this to happen would be someone running an animation on a component with an onLayout callback that reads from a reanimated shared value.

I think it's usually an anti-pattern to do this in reanimated (though there might be some cases where it is necessary; I just haven't seen them yet). Adding per-thread caching would, I think, be the best solution for us.

@Szymon20000 I don't think that this change in reanimated is a good solution. I will give a more in depth answer in the PR.

@Szymon20000
Author

[screenshot]

@tjzel

tjzel commented Nov 28, 2024

@tmikov
To explain the use case a bit better:

In Reanimated we don't exactly couple additional runtimes to a given thread. They're considered resources which any thread could potentially obtain for a period of time.

Sometimes a runtime (let's call it Primary) might require synchronization of some data with another runtime (Secondary) at a user's request. Whether this request is reasonable from the user's app-design perspective is a whole other topic. Because the runtimes aren't coupled to a thread, it may happen that Secondary isn't being used by any thread at the moment, so we can acquire it "freely" from Primary's thread. As a result we get a much faster sync. Secondary is released by Primary's thread after the sync and we are back where we were.

At the time we preferred that solution to its alternative: runtime-thread coupling.

In the coupling scenario, we'd need to schedule an operation from the Primary's thread onto the Secondary's thread. This seems good on paper, except for the fact that in the majority of our cases the Secondary's thread is the Main thread. Schedule the operation at an unlucky moment and you might need to wait a very long time to get its result.

@bartlomiejbloniarz Please correct me if I omitted/oversimplified some things here.
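
A very rough sketch of that "runtime as a resource" model (the names are made up, not our actual API): any thread can take the Secondary runtime's lock and use it directly while it holds it.

#include <mutex>
#include <utility>

struct SecondaryRuntimeResource {
    std::mutex mtx;
    // jsi::Runtime *runtime;  // the Secondary runtime guarded by mtx

    // Whichever thread needs the Secondary runtime (often the Primary's
    // thread) acquires it for the duration of the call, instead of
    // scheduling work onto a thread permanently coupled to it.
    template <typename F>
    void runGuarded(F &&f) {
        std::lock_guard<std::mutex> lock(mtx);
        std::forward<F>(f)(); // f would receive the Secondary runtime here
    }
};

From Hermes's point of view this means the same vm::Runtime gets used from different threads over time, which is exactly what keeps hitting the slow-path check.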

@gaearon

gaearon commented Dec 5, 2024

@tmikov Do you have enough information from the responses above, or is there still uncertainty about the use cases? Just trying to figure out where this stands and whether there's something actionable to move it forward. Thanks!

@tmikov
Contributor

tmikov commented Dec 5, 2024

We are adding this optimization.

@tmikov
Contributor

tmikov commented Dec 6, 2024

The fix landed in Static Hermes here: 4fc57f3

We are backporting it to Hermes as well.

@tmikov
Contributor

tmikov commented Dec 9, 2024

Backport landed in Hermes 568b1c9.

tmikov closed this as completed Dec 9, 2024