Skip to content
This repository was archived by the owner on Nov 1, 2020. It is now read-only.

Conversation

@christianscheuer
Copy link
Contributor

Work in progress.

I have done a workaround for the missing OSX unwind as you suggested, @jkotas (see #1867).
I might be in a little deep here though, so forgive me if I have misunderstood anything.

Now when I run a real-world OSX app on this, it stops in RhpCallCatchFunclet:
frame #0: 0x000000010005fad2 cptsbackendRhpCallCatchFunclet + 63`

At this point:

    0x10005fad2 <+63>: movq   (%rax), %rbx
    0x10005fad5 <+66>: movq   0x20(%rdx), %rax
    0x10005fad9 <+70>: movq   (%rax), %rbp
    0x10005fadc <+73>: movq   0x58(%rdx), %rax

Where the value of rax is 0x0.

It seems to be this code in RhpCallCatchFunclet

        mov     rax, [r8 + OFFSETOF__REGDISPLAY__pRbx]
        mov     rbx, [rax]

Where the RegDisplay's pRbx is 0x0.

I don't know if that happens due to a misalignment of offsets, if I did something wrong in other parts of the code, or if this is related to the original todo-item.

@dnfclas
Copy link

dnfclas commented Nov 8, 2016

Hi @christianscheuer, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla2.dotnetfoundation.org.

TTYL, DNFBOT;

@dnfclas
Copy link

dnfclas commented Nov 8, 2016

@christianscheuer, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@jkotas
Copy link
Member

jkotas commented Nov 8, 2016

Where the value of rax is 0x0.

@janvorli Apparently, there is number of places where we expect to have pointers to registers. Could you please take a look and see whether it is worth it to try to fix all of them as a workaround, instead of just fixing the unwinder?

@christianscheuer
Copy link
Contributor Author

@jkotas Let me know if there is anything I can do to help push this. Right now this issue is blocking a lot of progress for me personally, hence my interest in seeking to resolve it before the March milestone. I'll be happy to invest considerable time in it, but I would need some guidance to get started :)

@janvorli
Copy link
Member

janvorli commented Nov 9, 2016

@jkotas, @christianscheuer what could work as a temporary workaround for now is to revert the change that @christianscheuer made and then:

  • stick PAL_LIMITED_CONTEXT into the REGDISPLAY for OSX only
  • update all necessary offset constants
  • in the UnixContext.cpp, in UnwindCursorToRegDisplay function, set all the pointers in REGDISPLAY to point to the corresponding members in the PAL_LIMITED_CONTEXT and add for OSX unw_get_reg for all the registers that have pointers in the REGDISPLAY.

@christianscheuer
Copy link
Contributor Author

Great, thank you @janvorli. I'll give that a shot asap.

@christianscheuer
Copy link
Contributor Author

@janvorli PTLA.
I'm unsure if this is working as expected. I'm now getting an "Abort trap: 6" fatal error, with this backtrace:

frame #0: 0x00007fff91eda286 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff8b10b9f9 libsystem_pthread.dylib`pthread_kill + 90
frame #2: 0x00007fff8f8219ab libsystem_c.dylib`abort + 129
frame #3: 0x000000010005a279 cptsbackend`RaiseFailFastException + 9
frame #4: 0x000000010006808c cptsbackend`System_Private_CoreLib_System_Runtime_EH__FailFastViaClasslib + 48
frame #5: 0x000000010006ef2a cptsbackend`System_Private_CoreLib_System_Runtime_EH__UnhandledExceptionFailFastViaClasslib + 98

@janvorli
Copy link
Member

janvorli commented Nov 9, 2016

@christianscheuer - does this mean you were not getting the stack trace? The abort is expected to happen at the end, since unhandled exception is supposed to fail fast.
@jkotas should we be getting stack trace with your change?
In case we should be getting the stack trace, from the call stack shown above, it seems that in the UnhandledExceptionFailFastViaClasslib, the pFailFastFunction extracted using the RhpGetClasslibFunction was IntPtr.Zero, so maybe that would be the next problem here.
@christianscheuer can you set a breakpoint to RhpGetClasslibFunction and see whether we haven't found a code manager or whether the code manager hasn't found the classlib function?

@jkotas
Copy link
Member

jkotas commented Nov 9, 2016

I'm now getting an "Abort trap: 6" fatal error, with this backtrace

This is expected. You should also see message with exception type and message on the console. The stacktrace printing on unhandled exceptions does not work yet.

@christianscheuer
Copy link
Contributor Author

@janvorli Yes this is an exception that when compiled in CoreCLR is caught by a catch block and logged to console so the program can continue. With these bits in CoreRT it aborts the program. So it seems the catch handler is not called, making the exception bubble up.

@christianscheuer
Copy link
Contributor Author

@jkotas All I get in the console is:

Abort trap: 6


#else // !UNIX_AMD64_ABI

#if defined(__APPLE__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could you please change the define to something like UNWIND_WORKAROUND that is defined for __APPLE__ so that all places related to this are easy to find?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Which header file would be the appropriate place to make this define?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src\Native\CMakeLists.txt may be a good place

@janvorli
Copy link
Member

janvorli commented Nov 9, 2016

Hmm, we may need to set the register pointers in the UnwindCursorToRegDisplay little bit differently. There seem to be cases when the pointers were actually pointing to some location and we want to preserve that. E.g. the pRBP is set in the StackFrameIterator::InternalInit to point to a member in the TransitionFrame or the others pointing to the initial PAL_LIMITED_CONTEXT. Or we set them to values from thisFuncletPtrs at some point.
So, it seems that we should also ifdef-out the following piece of code in the UnwindCursorToRegDisplay for the UNWIND_WORAROUND and change the setting of the pointers so that we set them only if they were NULL.

#define GET_CONTEXT_POINTER(unwReg, rdReg) GetContextPointer(cursor, unwContext, unwReg, &regDisplay->p##rdReg);
    GET_CONTEXT_POINTERS
#undef GET_CONTEXT_POINTER

Hmm, actually, now that I think about it, I've just realized that the context pointers are always initialized at the beginning of the stack walk to point inside the initial PAL_LIMITED_CONTEXT (or the TransitionFrame).
So it seems that all we actually need to do is to ifdef out the above mentioned block. No other changes would be necessary. I am sorry for not realizing this earlier.

@jkotas
Copy link
Member

jkotas commented Nov 9, 2016

Also, you can try setting breaking at RhpThrowEx and RhpRethrow, and check whether you see good stacktrace under debugger. There is a bug in generation of unwind info on Unix: #1461 - the exception handling won't work through methods affected by this bug.

add_compile_options(-fPIC)
add_compile_options(-fvisibility=hidden)

if(CLR_CMAKE_PLATFORM_DARWIN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the workaround is limited to a single place now, it is not very valueable to have it controlled via special define...

@jkotas
Copy link
Member

jkotas commented Nov 10, 2016

@christianscheuer Is this still WIP? Are the crashes that you have been running into before fixed now?

@christianscheuer
Copy link
Contributor Author

@jkotas I am currently getting a new error which relates to this code using an await statement in a try block. I am trying to isolate it to see if this works as expected when not combined with the Task async/await pattern.

@christianscheuer
Copy link
Contributor Author

I removed the Task pattern complexity from my code. It seems now that the stack's backtrace show correctly, but that it somehow resumes from the wrong address?

I am now getting an error here:

  * frame #0: 0x0000000100001d5f cptsbackend`__VirtualCall_src_Creatix_PTShortcuts_AxNodes_AxElement__get_UIElement + 3
    frame #1: 0x000000010016d6b5 cptsbackend`src_Creatix_PTShortcuts_Actions_ClickButtonAction__Execute + 129
    frame #2: 0x0000000100137138 cptsbackend`src_Creatix_PTShortcuts_Actions_AutoAction__Run + 80
...

This virtualcall is the argument Element.UIElement (a property call) in the following code:

result.Require(Element.UIElement, "ClickButtonAction requires UIElement");

result.Require(...) is the throwing method.
So we should not end up in the UIElement's virtualcall after the exception.

Any ideas?

@christianscheuer
Copy link
Contributor Author

This new issue I was encountering seems to be the same you are hitting here, @jkotas: #2254

@christianscheuer
Copy link
Contributor Author

I can now confirm that @jkotas' fix for #2254, #2271, solves the related problem I was encountering. So this workaround now should be good to go. I have simplified the ifdef again.

@jkotas
Copy link
Member

jkotas commented Nov 26, 2016

@christianscheuer Thank you for the confirmation.

@janvorli is working on the full fix for this problem (statically link a custom version of the unwinder with the runtime), but it does not hurt to have this workaround in master until that comes online.

@jkotas jkotas changed the title [WIP] OSX unwind workaround OSX unwind workaround Nov 26, 2016
@jkotas jkotas merged commit 0d054ad into dotnet:master Nov 26, 2016
@christianscheuer christianscheuer deleted the osx-unwind-workaround branch November 27, 2016 16:22
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants