Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

profiler\multiple test failed #100114

Closed
jkotas opened this issue Mar 22, 2024 · 16 comments
Closed

profiler\multiple test failed #100114

jkotas opened this issue Mar 22, 2024 · 16 comments
Assignees
Labels
area-Diagnostics-coreclr blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab
Milestone

Comments

@jkotas
Copy link
Member

jkotas commented Mar 22, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=655183
Build error leg or test failing: profiler\multiple\multiple\multiple.cmd
Pull request: #101291

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": ["win-arm64", "profiler\\multiple\\multiple\\multiple.cmd [FAIL]"],
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎
Result validation: ⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: 🔎" line.
Validation performed at: 4/25/2024 9:12:24 PM UTC

Report

Build Definition Test Pull Request
667308 dotnet/runtime profiler.multiple.WorkItemExecution #101925

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 1
@jkotas jkotas added area-ExceptionHandling-coreclr blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Mar 22, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 22, 2024
@jkotas
Copy link
Member Author

jkotas commented Mar 22, 2024

This is crashing in new EH:

0b 00000032`5797bd90 00007ffa`cd1a213c     ntdll!RtlDispatchException+0x200 [minkernel\ntos\rtl\arm64\exdsptch.c @ 720] 
0c 00000032`5797c3d0 00007ffa`c8ef8274     ntdll!RtlRaiseException+0xbc [minkernel\ntos\rtl\arm64\raise.c @ 123] 
0d 00000032`5797c8b0 00007ffa`756d4be8     KERNELBASE!RaiseException+0x54 [minkernel\kernelbase\xcpt.c @ 936] 
0e 00000032`5797c960 00007ffa`742f2160     coreclr!SfiNext+0x308 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 8432] 
0f 00000032`5797cad0 00007ffa`742f1678     System_Private_CoreLib!System.Runtime.EH.DispatchEx(System.Runtime.StackFrameIterator ByRef, ExInfo ByRef)+0x6a0 [/_/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 861] 
10 00000032`5797cc30 00007ffa`753b1e54     System_Private_CoreLib!System.Runtime.EH.RhThrowEx(System.Object, ExInfo ByRef)+0x68 [/_/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs @ 642] 
11 00000032`5797cc50 00007ffa`7552f9c8     coreclr!CallDescrWorkerInternal+0x84 [D:\a\_work\1\s\artifacts\obj\coreclr\windows.arm64.Checked\vm\wks\CallDescrWorkerARM64.asm @ 4989] 
12 00000032`5797cc70 00007ffa`755301e0     coreclr!CallDescrWorkerWithHandler+0x128 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 67] 
13 00000032`5797ccc0 00007ffa`756c9280     coreclr!DispatchCallSimple+0xe0 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221] 
14 00000032`5797cda0 00007ffa`756c9320     coreclr!DispatchManagedException+0x1e0 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 5645] 
15 00000032`5797e0a0 00007ffa`755ba154     coreclr!DispatchManagedException+0x40 [D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp @ 5662] 
16 00000032`5797e460 00007ffa`755abb14     coreclr!ThrowNew+0xe4 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 4229] 
17 00000032`5797e4c0 00007ffa`15f0e534     coreclr!IL_Throw+0xb4 [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 4259] 

@lewing
Copy link
Member

lewing commented Apr 24, 2024

@janvorli has this been fixed?

@janvorli
Copy link
Member

@lewing I believe I have looked into the original and it was actually not an issue of the new EH, but rather a regular unhandled exception or something along these lines. Unfortunately, the only two remaining failures from the list above are misclassified, so I cannot double check it now.

@jkotas
Copy link
Member Author

jkotas commented Apr 24, 2024

Either way, CI does not hit this anymore, so the issue can be closed.

@jkotas jkotas closed this as completed Apr 24, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Apr 24, 2024
@jkotas jkotas reopened this Apr 25, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 25, 2024
@janvorli
Copy link
Member

@jkotas the failure in the referenced run is just an unhandled exception. From the test log:

      Output:
        Unhandled exception. System.Exception: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above.
           at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
           at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
           at Profiler.Tests.MultiplyLoaded.Main(String[] args)

The last message from the actual test is

       Waiting for profilers to all detach

So that sounds like profiler or test related issue.

@jkotas
Copy link
Member Author

jkotas commented Apr 25, 2024

Ah ok, I have not realized that this is the new callstack for unhandled exception. I agree that it looks like a profiler test issue. We would need a crash dump of the hung process to diagnose it.

@jkotas jkotas added area-Diagnostics-coreclr blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' and removed blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' labels Apr 25, 2024
Copy link
Contributor

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member Author

jkotas commented Apr 26, 2024

Based on the dump, the child process exit code is 0:

0:000> !do 000001f8`7080afc8 
Name:        System.Diagnostics.Process
MethodTable: 00007ffbf63dca00

...
00007ffbf5c4f900  40000dd       d8         System.Int32  1 instance                0 _exitCode
...

It suggests that the child process exited without crashing. The problem seems to be in the parsing of the child process output.

@tommcdon
Copy link
Member

@jkotas do we expect this issue to be fixed by #101607?

@jkotas
Copy link
Member Author

jkotas commented Apr 29, 2024

Missing volatile that I have added in #101607 can explain failures with these symptoms. I suspect that my change is not going to fix the failures. The test has been failing too frequently for missing volatile to be the root cause of the failures.

I would keep this issue open for 2 weeks. If we do not see any new hits in 2 weeks, we can declare it fixed.

@tommcdon tommcdon added this to the 9.0.0 milestone Apr 29, 2024
@tommcdon tommcdon removed the untriaged New issue has not been triaged by the area owner label Apr 29, 2024
@tommcdon tommcdon removed their assignment Apr 30, 2024
@v-wenyuxu
Copy link

Failed in: runtime-coreclr r2r 20240429.1

Failed tests:

R2R-CG2 windows arm64 Checked no_tiered_compilation @ Windows.11.Arm64.Open
    - profiler\\multiple\\multiple\\multiple.cmd

Error message:

 Unhandled exception. System.Exception: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above. Profilee returned exit code 100.
   at Profiler.Tests.ProfilerTestRunner.FailFastWithMessage(String error)
   at Profiler.Tests.ProfilerTestRunner.Run(String profileePath, String testName, Guid profilerClsid, String profileeArguments, ProfileeOptions profileeOptions, Dictionary`2 envVars, String reverseServerName, Boolean loadAsNotification, Int32 notificationCopies)
   at Profiler.Tests.MultiplyLoaded.Main(String[] args)

Return code:      1
Raw output file:      C:\h\w\C29B0A72\w\AB5E08EF\uploads\Reports\profiler.multiple\multiple\multiple.output.txt
Raw output:
BEGIN EXECUTION
eventpipe_common.dll
Microsoft.Diagnostics.NETCore.Client.dll
multiple.dll
Profiler.dll
profiler_common.dll
TestLibrary.dll
        6 file(s) copied.
 6:55:01.56
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\eventpipe_common.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C29B0A72\p\crossgen2\crossgen2.dll" @"C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.rsp"   -r:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\*.dll"
Emitting R2R PE file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll
" "dotnet" "C:\h\w\C29B0A72\p\r2rdump\r2rdump.dll" --header --sc --in C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll --out C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\eventpipe_common.dll.r2rdump --val"
 6:55:06.92
 6:55:06.92
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\Microsoft.Diagnostics.NETCore.Client.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C29B0A72\p\crossgen2\crossgen2.dll" @"C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.rsp"   -r:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\*.dll"
Emitting R2R PE file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll
" "dotnet" "C:\h\w\C29B0A72\p\r2rdump\r2rdump.dll" --header --sc --in C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll --out C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Microsoft.Diagnostics.NETCore.Client.dll.r2rdump --val"
 6:55:10.09
 6:55:10.11
Response file: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\multiple.dll.rsp
C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\IL-CG2\multiple.dll
-o:C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\multiple.dll
--targetarch:arm64
--targetos:windows
--verify-type-and-field-layout
--method-layout:random
-r:C:\h\w\C29B0A72\p\System.*.dll
-r:C:\h\w\C29B0A72\p\Microsoft.*.dll
-r:C:\h\w\C29B0A72\p\xunit.*.dll
-r:C:\h\w\C29B0A72\p\mscorlib.dll
-r:C:\h\w\C29B0A72\p\netstandard.dll
" "dotnet" "C:\h\w\C2

Stack trace:

   at profiler_multiple._multiple_multiple_._multiple_multiple_cmd()
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

@davmason
Copy link
Member

davmason commented May 8, 2024

The latest failure doesn't make sense to me

      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::Initialize
      Profiler.dll!Profiler::Initialize
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::Initialize
      Profiler.dll!Profiler::Initialize
      Profiler path: C:\h\w\C29B0A72\w\AB5E08EF\e\profiler\multiple\multiple\Profiler.dll
      Profiler.dll!DllGetClassObject
      MultiplyLoaded::InitializeForAttach
      Profiler.dll!Profiler::Initialize
      Throwing exception
      Exception caught
      Waiting for profilers to all detach
      
      Test failed: Profiler tests are expected to contain the text 'PROFILER TEST PASSES' in the console output of the profilee app to indicate a passing test. Usually it is printed from the Shutdown() method of the profiler implementation. This text was not found in the output above. Profilee returned exit code 100.
      Expected: 100
      Actual: -532462766
      END EXECUTION - FAILED
      FAILED
      Test failed. Trying to see if dump file was created in C:\cores since 4/30/2024 6:55:01 AM

The profilee returned code 100 which means that the _profilerDone.WaitOne call succeeded, but we should only set the event after seeing all three profilers successfully detach. There are no logs indicating any profiler attached however. My only thought is that somehow we are just missing output from the profiled process, but we call Process.WaitForExit() which says in the docs that all IO is guaranteed to occur before it returns.

@jkotas
Copy link
Member Author

jkotas commented May 8, 2024

A good question to ask is why this test and why arm64 only. It all points to bug(s) related to arm64 weaker memory model in the test code or in the code specifically exercised by this test.

The missing volatile that I have added in https://github.com/dotnet/runtime/pull/101607/files#diff-667a604f90dafe4391d86c87780a736aa2eb10010f194966110baeb141d05370R198 was a bug like that. My guess is that we have more of them.

@davmason
Copy link
Member

I fixed another issue and added a speculative fix in #102066, for now I am going to wait to see if it fails again before investigating any further

@tommcdon
Copy link
Member

tommcdon commented Jun 5, 2024

There have been no issues reported by automation since this issue has been fixed, closing as "fixed"

@tommcdon tommcdon closed this as completed Jun 5, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics-coreclr blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab
Projects
None yet
Development

No branches or pull requests

6 participants