
Potential performance improvement. #1572

Closed
Szymon20000 opened this issue Nov 27, 2024 · 19 comments
Labels
enhancement New feature or request

Comments

@Szymon20000

Problem

I'm profiling an RN app and noticed huge performance problems when calling JS functions from the JSI level. I dug into it and it turned out that most of the time is spent in pthread_getattr_np (on Android). Here is a screenshot from the Android Studio sampling profiler: [screenshot]

Solution

Would it be possible to just cache it? ChatGPT suggests something like this:

std::pair<const void *, size_t> thread_stack_bounds(unsigned gap) {
    static thread_local std::pair<const void *, size_t> cachedBounds;
    static thread_local bool isCached = false;

    if (!isCached) {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        if (pthread_getattr_np(pthread_self(), &attr) != 0) {
            hermes_fatal("Unable to obtain native stack bounds");
        }

        void *origin;
        size_t size;
        if (pthread_attr_getstack(&attr, &origin, &size) != 0) {
            pthread_attr_destroy(&attr);
            hermes_fatal("Unable to obtain stack attributes");
        }

#ifdef __BIONIC__
        // Adjust for guard pages on Android/Bionic
        size_t guardSize;
        if (pthread_attr_getguardsize(&attr, &guardSize) != 0) {
            guardSize = 0; // Default to 0 on error
        }
        if (guardSize > size) {
            guardSize = size;
        }
        origin = static_cast<char *>(origin) + guardSize;
        size -= guardSize;
#endif

        pthread_attr_destroy(&attr);

        // Cache the result
        cachedBounds = {static_cast<char *>(origin) + size, size};
        isCached = true;
    }

    // Adjust for the requested gap
    const auto &[stackEnd, stackSize] = cachedBounds;
    size_t adjustedSize = gap < stackSize ? stackSize - gap : 0;
    return {stackEnd, adjustedSize};
}

Additional Context

Szymon20000 added the enhancement (New feature or request) label Nov 27, 2024
@tmikov
Contributor

tmikov commented Nov 27, 2024

Hi, the stack bounds are already cached in StackOverflowGuard. oscompat::thread_stack_bounds() is called only from StackOverflowGuard::isStackOverflowingSlowPath(), which itself is invoked only if the thread using the instance of vm::Runtime has changed.

Can you verify whether that is actually happening? Are you calling the runtime from a different thread every time?
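
Roughly, the shape of that logic is something like the sketch below (the member names are made up, not the actual StackOverflowGuard fields; this is only an illustration of the caching behavior):

#include <pthread.h> // pthread_getattr_np is a GNU/Bionic extension; this thread is about Android
#include <cstddef>

struct StackOverflowGuardSketch {
    pthread_t cachedThread{};        // thread the cached bounds belong to
    const void *stackLow = nullptr;  // lower bound of that thread's stack
    size_t stackSize = 0;

    bool isOverflowing(const void *sp) {
        if (stackLow && pthread_equal(cachedThread, pthread_self())) {
            // Fast path: same thread as last time, reuse the cached bounds.
            return sp < stackLow;
        }
        // Slow path: the thread changed, so the bounds are recomputed; this
        // is where thread_stack_bounds() / pthread_getattr_np shows up in
        // the profile.
        return isStackOverflowingSlowPath(sp);
    }

    bool isStackOverflowingSlowPath(const void *sp) {
        pthread_attr_t attr;
        if (pthread_getattr_np(pthread_self(), &attr) == 0) {
            void *origin = nullptr;
            size_t size = 0;
            if (pthread_attr_getstack(&attr, &origin, &size) == 0) {
                // Re-cache the bounds for the thread now using the Runtime.
                cachedThread = pthread_self();
                stackLow = origin;
                stackSize = size;
            }
            pthread_attr_destroy(&attr);
        }
        return stackLow && sp < stackLow;
    }
};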

@Szymon20000
Author

[screenshot: flame graph] I see it's called multiple times, and I'm checking the main thread only (used by reanimated).

@Szymon20000
Author

[screenshot: flame graph]

@tmikov
Contributor

tmikov commented Nov 27, 2024

Sorry, I don't understand what these screenshots are supposed to tell me.

@Szymon20000
Author

That is a flame graph from a single thread. In those screenshots you can see that thread_stack_bounds is called multiple times; if it were cached in this flow, we would see it only once. workouts::SharebleWorklet::toJSValue calls another jsi::Function, which constructs a ScopedNativeCallFrame, which finally calls thread_stack_bounds. There are 2 such calls in the first screenshot and 3 in the second one. The last screenshot is just to show that it all comes from the same thread :).

@bartlomiejbloniarz
Contributor

Hi! @Szymon20000 I was recently working on this problem in reanimated. This is something that happens when there are calls from different threads to the reanimated worklet runtime. I think you should also see some calls to the reanimated runtime on the JS thread happening at the same time in your trace.

The problem is that Hermes goes through a slow-path overflow check when it's called from a different thread. So if there are any interleaving JS-thread and UI-thread calls, you will see a performance degradation. This issue has been present since:

  • this check was added to Hermes (I think around RN 0.73)
  • we added synchronous calls to the worklet runtime from the JS thread here

We are not yet sure how we want to address it in reanimated, since it's unclear whether the problem is on our side or on the application side.

Could you share with us the code that's causing this to happen, as this would help us decide our next steps?

@tmikov
Contributor

tmikov commented Nov 27, 2024

I am not opposed to adding per-thread caching, but before we do it, I would like to understand the use case.

So, is it the case that we are getting calls to the same runtime from the same two (or more) threads, just switching between them?

Also, when this happens, is the JS stack empty?

@Szymon20000
Author

Ah, so currently Hermes only keeps the last thread cached? The calls I showed are only from the main thread, but it is possible that between them the runtime is accessed from other threads.

@Szymon20000
Author

If that's the case, then it looks like a bug in Reanimated (@bartlomiejbloniarz maybe the synchronous calls block the UI thread but access the runtime from the JS thread?). At the same time, the same thing can happen in Fabric when we (sometimes) render synchronously on the UI thread, but I would need to check whether that actually happens on the JS thread while we just block the UI, or whether it really happens on the UI thread.

@Szymon20000
Author

https://github.com/facebook/react-native/blob/1d909efa235ffae150de25a201efbbe752d0bc52/packages/react-native/ReactCommon/runtimeexecutor/ReactCommon/RuntimeExecutor.h#L35 It will actually run it on the UI thread, so I think we have the same case in Fabric. @tmikov But if there are no other use cases for caching it per thread, it would be good to suggest to the RN team to always invoke the callback on the JS thread and just block the current thread.

@Szymon20000
Author

@bartlomiejbloniarz I think it's enough to block the JS thread and schedule a call on the UI thread; this will probably fix the problem. Right now it takes a lock and uses the runtime on the JS thread: https://github.com/software-mansion/react-native-reanimated/blob/0162804a8ace2a8f6b77764894d8e4d87f94781a/packages/react-native-reanimated/Common/cpp/worklets/WorkletRuntime/WorkletRuntime.cpp#L84
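
A minimal sketch of what I mean, assuming a generic scheduleOnUIThread helper (the name is a placeholder, not an actual Reanimated API):

#include <functional>
#include <future>

// Instead of taking the lock and driving the runtime from the JS thread,
// hand the work to the UI thread (which owns the runtime) and block the
// calling JS thread until it is done.
void runSyncOnUIThread(
    const std::function<void(std::function<void()>)> &scheduleOnUIThread,
    const std::function<void()> &work) {
    std::promise<void> done;
    auto finished = done.get_future();
    scheduleOnUIThread([&] {
        work();
        done.set_value();
    });
    // The JS thread just waits; only the UI thread ever touches the runtime.
    finished.wait();
}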

@Szymon20000
Author

Szymon20000 commented Nov 27, 2024

software-mansion/react-native-reanimated#6770 @bartlomiejbloniarz This seems to fix it, but at the same time we now need to wait for the main thread even when reanimated is not currently using it. So I think caching per thread would actually be better, @tmikov.

Another approach would be to make reanimated use a separate thread and also call runSync from the UI thread, but I'm not sure how big the overhead would be.

@bartlomiejbloniarz
Contributor

bartlomiejbloniarz commented Nov 28, 2024

@tmikov yes. We've seen this in one of our clients' apps: they had many calls to our Hermes runtime from both the UI and the JS thread simultaneously. They weren't able to give me a reproduction, but I was able to reproduce it "artificially" simply by:

  • running a reanimated animation (on the UI thread)
  • reading from a shared value from a setInterval (this triggers synchronous calls to our runtime from the JS thread)

I think a more natural scenario for this to happen would be someone running an animation on a component with an onLayout callback that reads from a reanimated shared value.

I think it's usually an anti-pattern to do this in reanimated (though there might be some cases where it is necessary; I just haven't seen them yet). Adding per-thread caching would, I think, be the best solution for us.

@Szymon20000 I don't think that this change in reanimated is a good solution. I will give a more in depth answer in the PR.

@Szymon20000
Author

[screenshot]

@tjzel

tjzel commented Nov 28, 2024

@tmikov
To explain the use case a bit better:

In Reanimated we don't exactly couple additional runtimes to a given thread. They're considered resources which any thread could potentially obtain for a period of time.

Sometimes a runtime (let's call it Primary) might require synchronization of some data with another runtime (Secondary) at a user's request. Whether this request is reasonable from the user's app-design perspective is a whole other topic. Because the runtimes aren't coupled to a thread, it may happen that Secondary isn't being used by any thread at the moment, so we can acquire it "freely" from Primary's thread. As a result we get a much faster sync. Secondary is released by Primary's thread after the sync and we are back where we were.

At the time we preferred that solution to its alternative: runtime-thread coupling.

In the coupling scenario, we'd need to schedule an operation from the Primary's thread onto the Secondary's thread. This seems good on paper, except for the fact that in the majority of our cases the Secondary's thread is the Main thread. Schedule the operation at an unlucky moment and you might need to wait a very long time to get its result.

@bartlomiejbloniarz Please correct me if I omitted/oversimplified some things here.
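
A very rough sketch of that "runtime as a resource" model (the names are made up, not our actual API): any thread can take the Secondary runtime's lock and use it directly while it holds it.

#include <mutex>
#include <utility>

struct SecondaryRuntimeResource {
    std::mutex mtx;
    // jsi::Runtime *runtime;  // the Secondary runtime guarded by mtx

    // Whichever thread needs the Secondary runtime (often the Primary's
    // thread) acquires it for the duration of the call, instead of
    // scheduling work onto a thread permanently coupled to it.
    template <typename F>
    void runGuarded(F &&f) {
        std::lock_guard<std::mutex> lock(mtx);
        std::forward<F>(f)(); // f would receive the Secondary runtime here
    }
};

From Hermes's point of view this means the same vm::Runtime gets used from different threads over time, which is exactly what keeps hitting the slow-path check.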

@gaearon

gaearon commented Dec 5, 2024

@tmikov Do you have enough information from the responses above, or is there still uncertainty about the use cases? Just trying to figure out where this stands and whether there's something actionable to move it forward. Thanks!

@tmikov
Contributor

tmikov commented Dec 5, 2024

We are adding this optimization.

@tmikov
Contributor

tmikov commented Dec 6, 2024

The fix landed in Static Hermes here: 4fc57f3

We are backporting it to Hermes as well.

@tmikov
Contributor

tmikov commented Dec 9, 2024

Backport landed in Hermes 568b1c9.

tmikov closed this as completed Dec 9, 2024