Skip to content
This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Deadlock while reporting an error to Bugsnag via notify #448

Closed
1 of 2 tasks
derrh opened this issue Mar 11, 2020 · 12 comments
Closed
1 of 2 tasks

Deadlock while reporting an error to Bugsnag via notify #448

derrh opened this issue Mar 11, 2020 · 12 comments
Labels
bug Confirmed bug released This feature/bug fix has been released

Comments

@derrh
Copy link

derrh commented Mar 11, 2020

Description

Notify Bugsnag of an error without a deadlock.

Issue

It's taken quite a bit of sleuthing to come to the conclusion that Bugsnag was causing our app to freeze. It appears to be a race condition with very sensitive timing. All of our crash logs resulting from the deadlock (watchdog on iOS finally killing the app due to the deadlock) include the following stack trace on the React Native Bugsnag thread:

Thread 2 Queue : com.facebook.react.BugsnagReactNativeQueue (serial)
#0  0x000000019a8878f0 in __ulock_wait ()
#1  0x000000019a7a1040 in _os_unfair_lock_lock_slow ()
#2  0x000000019a89c628 in dyld3::AllImages::imageLoadAddressByIndex(unsigned int) const ()
#3  0x0000000104acf2e0 in bsg_kscrw_i_writeBinaryImage at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1091
#4  0x0000000104acf4dc in bsg_kscrw_i_writeBinaryImages at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1168
#5  0x0000000104ad0180 in bsg_kscrashreport_writeStandardReport at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashReport.c:1587
#6  0x0000000104ad252c in bsg_kscrashsentry_reportUserException at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/Sentry/BSG_KSCrashSentry_User.c:140
#7  0x0000000104acc290 in bsg_kscrash_reportUserException at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrashC.c:194
#8  0x0000000104acb7a8 in -[BSG_KSCrash reportUserException:reason:originalException:handledState:appState:callbackOverrides:metadata:config:discardDepth:terminateProgram:] at .../cocoa/vendor/bugsnag-cocoa/Source/KSCrash/Source/KSCrash/Recording/BSG_KSCrash.m:306
#9  0x0000000104ae617c in -[BugsnagCrashSentry reportUserException:reason:originalException:handledState:appState:callbackOverrides:metadata:config:discardDepth:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagCrashSentry.m:53
#10 0x0000000104aebfe0 in -[BugsnagNotifier notify:handledState:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagNotifier.m:672
#11 0x0000000104aeb620 in -[BugsnagNotifier internalClientNotify:withData:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/BugsnagNotifier.m:593
#12 0x0000000104adcde8 in +[Bugsnag internalClientNotify:withData:block:] at .../cocoa/vendor/bugsnag-cocoa/Source/Bugsnag.m:183
#13 0x0000000104aeef4c in -[BugsnagReactNative notify:resolve:reject:] at .../cocoa/BugsnagReactNative.m:168
#14 0x000000019aaa3554 in __invoking___ ()
#15 0x000000019a96c080 in -[NSInvocation invoke] ()
#16 0x000000019a96cc98 in -[NSInvocation invokeWithTarget:] ()
#17 0x0000000105dbbbbc in ::-[RCTModuleMethod invokeWithBridge:module:arguments:](RCTBridge *, id, NSArray *) at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/Base/RCTModuleMethod.mm:569
#18 0x0000000105dbdcdc in facebook::react::invokeInner(RCTBridge*, RCTModuleData*, unsigned int, folly::dynamic const&) at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:108
#19 0x0000000105dbda40 in facebook::react::RCTNativeModule::invoke(unsigned int, folly::dynamic&&, int)::$_0::operator()() const [inlined] at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:73
#20 0x0000000105dbda14 in ::___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke() at /Users/derrickhathaway/Code/LO/Mobile/life-extend/node_modules/react-native/React/CxxModule/RCTNativeModule.mm:65
#21 0x000000019a740b7c in _dispatch_call_block_and_release ()
#22 0x000000019a741fd8 in _dispatch_client_callout ()
#23 0x000000019a748450 in _dispatch_lane_serial_drain ()
#24 0x000000019a748e7c in _dispatch_lane_invoke ()
#25 0x000000019a751f20 in _dispatch_workloop_worker_thread ()                                                                                                                                                 
#26 0x000000019a7a76d0 in _pthread_wqthread ()

Environment

Library versions:

  • cocoapods version (if any) (pod -v): 1.8.4

  • iOS/Android version(s): iOS 13.3.1 (did not seem to manifest on iOS 12)

  • simulator/emulator or physical device?: Physical Device (iPhone XS Max, iPhone XS, iPhone 8 and others)

  • debug mode or production?: Both

  • (iOS only) [BugsnagReactNative start] is present in the
    application:didFinishLaunchingWithOptions: method in your AppDelegate
    class?

  • (Android only) BugsnagReactNative.start(this) is present in the
    onCreate method of your MainApplication class?

Example code snippet

// when the timing is just right:
errorClient.notify(error)

Here are a few crash logs:
https://gist.github.com/derrh/6bfdd0e26647e3812691b52e86d9609a
https://gist.github.com/derrh/c20a61d0c596e850d47e5b87b921490a
https://gist.github.com/derrh/ee5b91e3ca827d17c1a61044a37d0872

I was not able to reproduce this issue in a minimal scenario due to the challenging timing requirements.

@mattdyoung
Copy link

Hi @derrh

Thanks for the report. We're just looking in to what could be causing this.

Can you share how you're configuring Bugsnag i.e. the full code configuration both in the React Native JS and any iOS native configuration (just strip out your API key)?

Also can you confirm if you've registered any "beforeSend" callbacks to modify the Bugsnag report or have added any additional metadata or breadcrumbs before notifying Bugsnag?

@mattdyoung mattdyoung added the awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks. label Mar 12, 2020
@derrh
Copy link
Author

derrh commented Mar 12, 2020

Hello, @mattdyoung, Thanks for the quick reply. We have a very basic config.

const config = new Configuration("<bugsnag api key>")
config.appVersion = version
const errorClient = new Client(config)

We aren't doing anything custom on the native side, and we do not have any beforeSend callbacks registered.

Ping me if you need any more info on this. I'd be happy to help in any way I can. I have a branch where I can pretty regularly reproduce the freeze if that's helpful, not in code I can share unfortunately, but if additional crash logs or stack traces can help let me know.

@mattdyoung mattdyoung removed the awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks. label Mar 12, 2020
@mattdyoung
Copy link

@derrh Thanks - we're just trying to see if we can reproduce.

When you say you can regularly reproduce the freeze does that include on an iOS simulator or only on an actual device?

You say the timing of the call to notify is key. Can you shed any light on what your app might be doing at that point?

@derrh
Copy link
Author

derrh commented Mar 13, 2020

@mattdyoung, I don't think we were ever able to reproduce it on the simulator.

In our case it was nearly always freezing when certain images in our app were being rendered. For a while we were trying to determine what was special about these particular images. We knew that removing the images removed the freeze, but we couldn't figure out why. Eventually we realized that the react native Bugsnag thread was stopped in the same place every stack trace.

Something else that may be relevant: It only happens on app launch. It's a few seconds after app launch because several API calls have time to complete and the timing of those API calls was critical to the deadlock (one of those API calls was "notify"-ing), and the others caused the rendering of the images that were acting suspicious. Removing either one of those two actions and the freeze would not happen. In the end we stopped logging that particular error because it was a warning on a benign error that we just wanted to track, but not worth the deadlock of course.

If it would be helpful, I'd be happy to jump on a zoom meeting and step through it for you.

@abigailbramble
Copy link

@derrh thank you for the additional information! Can you please write into [email protected] where we can continue this in more detail and look to set up a zoom meeting if necessary? Thanks!

@drhops
Copy link

drhops commented Apr 3, 2020

@mattdyoung @derrh I'm curious if you were able to confirm this issue? I suspect it could be related to similar behavior we're seeing and interested if there's an update. Thanks!

@bigyelow
Copy link

bigyelow commented Apr 4, 2020

The implementation of AllImages::imageLoadAddressByIndex is different in the latest version and the last version which the former uses the withReadLock block to read the _loadedImages value while the latter doesn’t.

Not sure if this is helpful for your team to investigate. @mattdyoung .

@derrh I guess there was another thread calling a method in AllImages.cpp that uses the withReadLock block which helped result in a deadlock.

@bigyelow
Copy link

bigyelow commented Apr 4, 2020

@derrh Ah, I have seen all the 3 links of your crash logs, they just proves my guess that some other thread(s) called the AllImages:xx method.

@mattdyoung
Copy link

@derrh @bigyelow
Thanks for the info on this. We believe we've identified the issue and are looking at releasing a fix.

@mattdyoung mattdyoung added bug Confirmed bug scheduled Work is starting on this feature/bug labels Apr 6, 2020
@bigyelow
Copy link

bigyelow commented Apr 6, 2020

@mattdyoung Does this bug relate to iOS system or logic in Bugsnag itself?

@mattdyoung
Copy link

@bigyelow We're still working on the issue so not entirely sure yet what the fix will be but even if it's an iOS issue we may be able to work around it in Bugsnag to prevent it.

@tomlongridge tomlongridge removed the scheduled Work is starting on this feature/bug label Apr 15, 2020
@bugsnagbot bugsnagbot added the scheduled Work is starting on this feature/bug label Apr 27, 2020
@abigailbramble
Copy link

Released in v2.23.8.

@abigailbramble abigailbramble added released This feature/bug fix has been released and removed scheduled Work is starting on this feature/bug labels May 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Confirmed bug released This feature/bug fix has been released
Projects
None yet
Development

No branches or pull requests

7 participants