Segmentation fault with NLog on linux-arm, related to "Recursive resource lookup bug"? #25403
Comments
This is probably the issue: the default runtime for standalone apps is 2.0.0 if available, and the app is running against that, which doesn't contain the fix. For standalone apps it's necessary to set this property in the .csproj to reference a specific version:
<PropertyGroup>
<RuntimeFrameworkVersion>2.0.5</RuntimeFrameworkVersion>
</PropertyGroup>
Or |
In other words standalone apps don't "float forward" even for patch versions. They really are isolated in that way. |
Oops - I missed that you already got that far... |
So with 2.0.5 specified, maybe the segfault is something unrelated. Can you get a stack trace under gdb or lldb? |
Here you go:
bt gives us:
I'll have to look into how to install/use lldb for more details. |
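For anyone following along, one way to capture such a backtrace non-interactively is sketched below. It relies only on gdb's standard `-batch`, `-ex` and `--args` options; the function name, the `GDB_BIN` override variable and the app path in the usage comment are hypothetical, and this is not necessarily how the trace in this thread was produced.

```shell
#!/bin/sh
# Run a program under gdb without an interactive session and print a
# backtrace once it stops (for example on SIGSEGV).
# GDB_BIN is a hypothetical override for the debugger binary.
backtrace_under_gdb() {
    "${GDB_BIN:-gdb}" -batch -ex run -ex bt --args "$@"
}

# Usage (hypothetical path to the published, self-contained app):
#   backtrace_under_gdb ./publish/MyApp
```

If the program exits normally, gdb has no stack to print, so the output is only useful when the crash actually reproduces under the debugger.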
Ah that is https://github.com/dotnet/coreclr/issues/14997 which is in the servicing branch but has not shipped yet as there was a delay. I hope it goes out next month. @brianrob is there a workaround for this? It sounds like @BrendanGrant is getting it consistently on Raspberry PI |
It should be in 2.0.7 BTW. 2.0.6 is expected to go out shortly. |
Thanks @danmosemsft, it sounds related and would fix my dummy repro case... but it seems my larger app remains borked on the Pi, even when setting the RuntimeFrameworkVersion to 2.0.5. Unlike the repro, which just segfaults semi-quietly, the added new ArgumentException() near the top of Main() still causes the giant [Recursive resource lookup bug] related stack trace... followed by a terminal segfault. If I don't have the new ArgumentException(), the program will fail about 20 seconds later when another bit of code tries to do its thing. The entire thing in gdb looks like:
The startup code is remarkably simple:
The first logger message will be dumped to the console, but the second one never is... unlike in the mentioned repro. I agree that https://github.com/dotnet/coreclr/issues/14997 is a candidate for blame wrt the segfault, but there is still the oddity of the stack trace causing the terminal condition in the first place. |
@kouvel can you help make sense of this? Wasn't it you who looked at the original issue? |
Oh what culture is in play here? What is cultureinfo.CurrentCulture? I seem to remember the workaround only helped in en-US |
en-GB is the CurrentCulture, as it's largely a stock Raspbian install. I used raspi-config to change the default culture... but that didn't seem to stick; I've not tried a reboot yet. I manually set CultureInfo.CurrentCulture = new CultureInfo("en-US") before the new ArgumentException() and it still fell over. On a lark, I also set CultureInfo.CurrentUICulture = new CultureInfo("en-US") immediately afterwards. That was not the workaround I was expecting. Worse, this sounds like https://github.com/dotnet/corefx/issues/23608 which is reported to be fixed in 2.0.3... regression? |
From the original Issue :
I'm guessing your static field initializer that does Get logger is doing Load from. Can you make it happen after the ArgumentException? Keep setting en-US before that also. |
@tarekgh may also be able to help |
Currently the init of the program is (with workarounds and extra logging):
Commenting things out I get the following:
And sure enough it works... while still reporting en-GB as both culture settings. Also works if I set On the plus side... we have two different workarounds... both plenty odd.
|
Hopefully this is enough until 2.0.7. Or you can use 2.1 preview 1 which should have neither issue... |
@BrendanGrant from the stack above it looks like the larger app is still running against an older version of the runtime. Based on the path to libcoreclr.so in the output above, can you run: strings /.../libcoreclr.so | grep '@(#)' to see the version and coreclr repo commit hash? That would help to identify which runtime it's running against. |
The fix for the infinite resource lookup recursion issue should be in 2.0.3 and above. |
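The check suggested above can be wrapped in a small helper, sketched here under a few assumptions: the function name is hypothetical, the path in the usage comment is a placeholder, and the `grep -a` fallback is only for systems where `strings` (binutils) is not installed.

```shell
#!/bin/sh
# Print the embedded "@(#)" version strings from a native runtime library
# such as libcoreclr.so. Prefers strings(1); falls back to grep -a.
runtime_version_strings() {
    lib="$1"
    if [ ! -f "$lib" ]; then
        echo "not found: $lib" >&2
        return 1
    fi
    if command -v strings >/dev/null 2>&1; then
        strings "$lib" | grep '@(#)'
    else
        # -a treats the binary as text, -o prints only the matching run.
        grep -a -o '@(#)[[:print:]]*' "$lib"
    fi
}

# Usage (placeholder path; point it at the libcoreclr.so next to the app):
#   runtime_version_strings /path/to/publish/libcoreclr.so
```

Running this against the copy of libcoreclr.so that sits in the published output, rather than a shared install, is what shows which runtime a self-contained app will actually load.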
I see what the problem is @kouvel... and I think it's a bug... but of another sort. Strings tells me: Which I believe maps to the 2.0.0 release. The file on my local system is dated 7/20/2017, despite me using 2.0.5 on the app... but only one of them. Apparently important is the fact that the solution I am building has 7 projects, 4 of which are executables... and I applied the RuntimeFrameworkVersion override to only one of them. It seems that when publishing the solution, the expected RuntimeFrameworkVersion of the last project alphabetically(?) will override/overwrite any previously set framework, resulting in an old version. Annoyingly... this also happens silently, not unlike when two different projects have different versions specified for a NuGet package. |
Adding the following to the end of my dotnet publish command seems a more safe/obvious way to work around this silent downgrade:
|
Yes that commit hash maps to the 2.0.0 runtime. That does sound like a bug to me, imo it should not be that easy to accidentally run against an older runtime. Filed https://github.com/dotnet/cli/issues/8776 for this specific issue. |
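Until that is addressed, the mismatch described above (several executable projects in one solution, only some of which set RuntimeFrameworkVersion) can be spotted before publishing with a quick scan of the project files. The sketch below is illustrative only: both function names are hypothetical, and it reads just the literal property in each .csproj, so values supplied through imported props files or the command line are not seen.

```shell
#!/bin/sh
# List every .csproj under a directory together with the
# RuntimeFrameworkVersion it declares, or "(not set)" when missing.
list_runtime_framework_versions() {
    root="${1:-.}"
    find "$root" -name '*.csproj' | sort | while IFS= read -r proj; do
        ver=$(sed -n 's:.*<RuntimeFrameworkVersion>\(.*\)</RuntimeFrameworkVersion>.*:\1:p' "$proj" | head -n 1)
        [ -n "$ver" ] || ver="(not set)"
        printf '%s\t%s\n' "$proj" "$ver"
    done
}

# Succeeds only when all projects declare the same, non-empty version.
check_runtime_framework_versions() {
    distinct=$(list_runtime_framework_versions "$1" | cut -f 2 | sort -u)
    count=$(printf '%s\n' "$distinct" | grep -c . || true)
    [ "$count" -eq 1 ] && [ "$distinct" != "(not set)" ]
}

# Usage, from the solution directory:
#   list_runtime_framework_versions .
#   check_runtime_framework_versions . || echo "projects disagree" >&2
```

Library projects that never set the property will also be reported as "(not set)", so the check is stricter than necessary; narrowing it to executable projects is left out to keep the sketch short.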
Is there any further action here or has this been resolved and we can close it? |
I believe this can be closed as there is no further action needed here. |
Agreed, fingers crossed on https://github.com/dotnet/cli/issues/8776 getting fixed. |
I had a dotnet core 2.0 app running fine on a Raspberry Pi 2 (running Raspbian) for many months, until the SD card gave up. I set up a new card with a fresh install of Raspbian and deployed an upgraded version (still targeting 2.0) with upgrades to the latest NuGet packages, which suddenly caused it to fail with the following error:
This sounded rather similar to https://github.com/dotnet/corefx/issues/23608 (just one of many though), and using its advice I created a new ArgumentException() in my main to trigger it faster (as seen above in the stack trace).
Whittling things down into a standalone repro... I built a new standalone project, added a NuGet reference to NLog.Extensions.Logging, and added an NLog.config to the project which is always copied to the output directory, with the following contents:
Then updated the Program.cs file to be the following:
When I build/publish this demo app with:
dotnet publish --runtime linux-arm --configuration Release
To my Pi2... it fails with the above stacktrace. If I follow the advice of https://github.com/dotnet/corefx/issues/26292 and add the bolded text to the rather simple csproj:
There is a slight improvement... it still suffers from a segmentation fault, only you don't see the stack trace.
Attached is the rather simplistic repro, which of course works fine on my Windows 10 machine, but fails on Raspbian GNU/Linux 9 (stretch).
Build environment details:
coreclr-segfault-with-nlog-on-linux-arm.zip