Runtime jobs being abandoned due to infra #50746

runfoapp · 2021-04-05T21:07:52Z

Runfo Tracking Issue: Runtime jobs being abandoned due to infra

Definition	Build	Kind	Job Name
runtime	1079569	Rolling	Mono Product Build windows x86 debug
runtime	1079569	Rolling	CoreCLR Product Build windows x64 checked
runtime	1079569	Rolling	CoreCLR Product Build windows x86 checked
runtime	1079569	Rolling	CoreCLR Product Build windows x64 release PGO
runtime	1079569	Rolling	Libraries Build windows x86 Release
runtime	1079569	Rolling	CoreCLR Product Build windows x86 release
runtime	1079569	Rolling	Libraries Build windows net48 x64 Release
runtime	1079569	Rolling	Libraries Build windows allConfigurations x64 Release
runtime	1079569	Rolling	Libraries Build windows x64 Release
runtime	1079569	Rolling	CoreCLR Product Build windows x64 release
runtime	1079569	Rolling	CoreCLR Product Build windows arm64 checked
runtime	1079569	Rolling	Build windows x64 Release SingleFile
runtime	1079569	Rolling	Libraries Build windows arm64 Release
runtime	1079569	Rolling	Mono Product Build windows x64 debug
runtime	1079569	Rolling	CoreCLR Product Build windows arm checked
runtime	1079569	Rolling	Libraries Build windows net48 x86 Release
runtime	1079569	Rolling	Mono Product Build windows x64 release
runtime	1079569	Rolling	Libraries Build windows arm Release
runtime	1079569	Rolling	Mono Product Build windows x86 release
runtime	1079569	Rolling	CoreCLR Product Build windows arm release
runtime	1079569	Rolling	CoreCLR Product Build windows arm64 release
runtime	1079550	PR 50489	CoreCLR Product Build windows arm release
runtime	1079550	PR 50489	CoreCLR Product Build windows x64 release
runtime	1079550	PR 50489	CoreCLR Product Build windows x86 release
runtime	1079550	PR 50489	CoreCLR Product Build windows x64 release PGO
runtime	1079550	PR 50489	CoreCLR Product Build windows arm64 release
runtime	1079545	PR 50894	Mono Product Build windows x64 release
runtime	1079545	PR 50894	Libraries Build windows net48 x86 Release
runtime	1079545	PR 50894	CoreCLR Product Build windows x86 checked
runtime	1079545	PR 50894	Libraries Build windows allConfigurations x64 Debug
runtime	1079545	PR 50894	CoreCLR Product Build windows x64 release PGO
runtime	1079545	PR 50894	Libraries Build windows x86 Release
runtime	1079545	PR 50894	CoreCLR Product Build windows x86 release
runtime	1079545	PR 50894	Libraries Build windows x86 Debug
runtime	1079545	PR 50894	CoreCLR Product Build windows x64 release
runtime	1079545	PR 50894	CoreCLR Product Build windows arm release
runtime	1079545	PR 50894	Libraries Build windows x64 Debug
runtime	1079545	PR 50894	Mono Product Build windows x86 release
runtime	1079545	PR 50894	CoreCLR Product Build windows arm64 release
runtime	1079545	PR 50894	CoreCLR Product Build windows x64 checked
runtime	1079545	PR 50894	CoreCLR Product Build windows arm64 checked
runtime	1079545	PR 50894	Mono Product Build windows x86 debug
runtime	1079545	PR 50894	Build windows x64 Release SingleFile
runtime	1079545	PR 50894	Libraries Build windows arm64 Release
runtime	1079545	PR 50894	Mono Product Build windows x64 debug
runtime	1079545	PR 50894	Libraries Build windows arm Release
runtime	1079545	PR 50894	CoreCLR Product Build windows arm checked
runtime	1079530	PR 50986	Mono Product Build windows x64 release
runtime	1079530	PR 50986	CoreCLR Product Build windows x86 release
runtime	1079530	PR 50986	CoreCLR Product Build windows arm release
runtime	1079530	PR 50986	CoreCLR Product Build windows x64 release
runtime	1079530	PR 50986	Libraries Build windows x86 Debug
runtime	1079530	PR 50986	Libraries Build windows x86 Release
runtime	1079530	PR 50986	CoreCLR Product Build windows x64 release PGO
runtime	1079530	PR 50986	Libraries Build windows allConfigurations x64 Debug
runtime	1079530	PR 50986	CoreCLR Product Build windows x86 checked
runtime	1079530	PR 50986	Build windows x64 Release SingleFile
runtime	1079530	PR 50986	CoreCLR Product Build windows arm checked
runtime	1079530	PR 50986	Mono Product Build windows x64 debug
runtime	1079530	PR 50986	Libraries Build windows arm64 Release
runtime	1079530	PR 50986	Mono Product Build windows x86 debug
runtime	1079530	PR 50986	CoreCLR Product Build windows arm64 checked
runtime	1079530	PR 50986	CoreCLR Product Build windows x64 checked
runtime	1079530	PR 50986	CoreCLR Product Build windows arm64 release
runtime	1079530	PR 50986	Mono Product Build windows x86 release
runtime	1079530	PR 50986	Libraries Build windows x64 Debug
runtime	1079530	PR 50986	Libraries Build windows net48 x86 Release
runtime	1079530	PR 50986	Libraries Build windows arm Release
runtime	1079523	PR 50817	Libraries Build windows x64 Debug
runtime	1079523	PR 50817	Libraries Build windows x86 Debug
runtime	1079523	PR 50817	Libraries Build windows x86 Release
runtime	1079523	PR 50817	Libraries Build windows allConfigurations x64 Debug
runtime	1079385	PR 50954	Installer Build and Test coreclr windows_x86 Debug
runtime	1079385	PR 50954	CoreCLR Pri0 Runtime Tests Run windows x86 checked
runtime	1079324	Rolling	Mono Product Build windows x86 release
runtime	1079324	Rolling	Libraries Build windows arm Release
runtime	1079324	Rolling	Mono Product Build windows x64 release
runtime	1079324	Rolling	Libraries Build windows net48 x86 Release
runtime	1079324	Rolling	Mono Product Build windows x64 debug
runtime	1079324	Rolling	Libraries Build windows arm64 Release
runtime	1079324	Rolling	Build windows x64 Release SingleFile
runtime	1079324	Rolling	Libraries Build windows x64 Release
runtime	1079324	Rolling	Libraries Build windows allConfigurations x64 Release
runtime	1079324	Rolling	Libraries Build windows net48 x64 Release
runtime	1079324	Rolling	Libraries Build windows x86 Release
runtime	1079324	Rolling	Mono Product Build windows x86 debug
runtime	1079065	PR 50569	Mono Product Build windows x86 release
runtime	1078839	PR 50489	Mono Product Build windows x64 debug
runtime	1078839	PR 50489	Libraries Build windows arm64 Release
runtime	1078839	PR 50489	Libraries Build windows allConfigurations x64 Debug
runtime	1078839	PR 50489	CoreCLR Product Build windows x64 release PGO
runtime	1078839	PR 50489	Libraries Build windows x86 Release
runtime	1078839	PR 50489	CoreCLR Product Build windows x86 release
runtime	1078839	PR 50489	Libraries Build windows x86 Debug
runtime	1078839	PR 50489	CoreCLR Product Build windows x64 release
runtime	1078839	PR 50489	Libraries Build windows net48 x86 Release
runtime	1078839	PR 50489	Mono Product Build windows x64 release
runtime	1078839	PR 50489	Libraries Build windows arm Release
runtime	1078839	PR 50489	Libraries Build windows x64 Debug
runtime	1078839	PR 50489	Mono Product Build windows x86 release

Build Result Summary

Day Hit Count	Week Hit Count	Month Hit Count
7	9	9

dotnet-issue-labeler · 2021-04-05T21:07:55Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2021-04-05T21:26:14Z

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Runfo Tracking Issue: Runtime jobs being abandoned due to infra

Definition	Build	Kind	Job Name
runtime	1072700	PR 50622	Mono Product Build windows x64 debug
runtime	1072700	PR 50622	Libraries Build windows arm64 Release
runtime	1072700	PR 50622	Libraries Build windows allConfigurations x64 Debug
runtime	1072700	PR 50622	CoreCLR Product Build windows x64 release PGO
runtime	1072700	PR 50622	Libraries Build windows x86 Release
runtime	1072700	PR 50622	CoreCLR Product Build windows x86 release
runtime	1072700	PR 50622	Libraries Build windows x86 Debug
runtime	1072700	PR 50622	CoreCLR Product Build windows x64 release
runtime	1072700	PR 50622	Libraries Build windows net48 x86 Release
runtime	1072700	PR 50622	Mono Product Build windows x64 release
runtime	1072700	PR 50622	Libraries Build windows arm Release
runtime	1072700	PR 50622	Libraries Build windows x64 Debug
runtime	1072700	PR 50622	Mono Product Build windows x86 release
runtime	1072700	PR 50622	CoreCLR Product Build windows arm64 release
runtime	1072700	PR 50622	Mono Product Build windows x86 debug
runtime	1072700	PR 50622	CoreCLR Product Build windows arm release
runtime	1072700	PR 50622	Build windows x64 Release SingleFile

Build Result Summary

Day Hit Count	Week Hit Count	Month Hit Count
1	1	1

Author:	runfoapp[bot]
Assignees:	-
Labels:	`area-Infrastructure`, `blocking-clean-ci`, `untriaged`
Milestone:	-

jkoritzinsky · 2021-04-05T21:26:17Z

This has started popping up again. @dotnet/dnceng

safern · 2021-04-05T21:39:33Z

cc: @adiaaida this is the same issue that I shared on the FR channel. It is happening not only on Windows.

michellemcdaniel · 2021-04-05T22:18:09Z

Got it. Thanks.

@dotnet/dnceng I have opened https://github.com/dotnet/core-eng/issues/12732 to track this on our side

michellemcdaniel · 2021-04-05T22:26:47Z

@safern These all appear to be running on windows. Is that incorrect?

safern · 2021-04-06T00:46:39Z

Sorry, I misread somehow. Yes, they are all Windows 🤦

jakubstilec · 2021-04-06T09:57:22Z

adding @lukas-lansky

lukas-lansky · 2021-04-06T12:07:13Z

Let's look!

https://dev.azure.com/dnceng/public/_build/results?buildId=1072700&view=results leads to
https://dev.azure.com/dnceng/public/_apis/build/builds/1072700 mentions orchestration plan ID a5a7765f-b742-4770-84db-29874e5f8827 and that leads to
`Jobs | where Started > ago(10d) | where Properties contains "a5a7765f-b742-4770-84db-29874e5f8827" and Source contains "coreclr__product_build_windows_x64_release."` leads to
JobId 14419096, WorkItem 606751917: Queued 2021-04-05T16:11:45.094Z, Started 2021-04-05T18:40:22.987Z
QueueName is buildpool.windows.10.amd64.vs2019.open
Grafana says:
... and this is probably what made @adiaaida suspect the autoscaler, right? Why so few machines for the whole day, given the enormous wait times? @ulisesh, can you hint where to look further?
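To put a number on the "enormous wait times", here is a minimal sketch that computes how long the work item quoted above sat in the queue. Only the two timestamps come from the comment; the helper names `parse_utc` and `queue_wait` are made up for illustration and are not part of any Helix or Azure DevOps tooling.

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the work item quoted above (UTC, millisecond precision).
QUEUED = "2021-04-05T16:11:45.094Z"
STARTED = "2021-04-05T18:40:22.987Z"


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SS.fffZ' timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def queue_wait(queued: str, started: str) -> timedelta:
    """Return the time between being queued and being picked up by a machine."""
    return parse_utc(started) - parse_utc(queued)


wait = queue_wait(QUEUED, STARTED)
print(wait)  # 2:28:37.893000
```

So this one job waited roughly two and a half hours before any machine picked it up.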

michellemcdaniel · 2021-04-06T14:19:41Z

It was actually Ulises who suspected the scaler. When we saw this was happening, he immediately manually scaled up the queue, and that's why you see that drop off in wait time.

trylek · 2021-04-06T19:47:39Z

Hmm, so is the issue supposed to be mitigated? It seems to me that all Windows legs in PR / CI runs are stuck right now. Is that just a backlog caused by the previous slowdown, or has the problem reappeared even with the upscaled queue?

ilyas1974 · 2021-04-06T20:22:55Z

It appears there was an issue with the underlying Service Fabric framework that has been mitigated. We are trying to manually scale this queue.

trylek · 2021-04-06T20:40:15Z

Thanks, Ilya, for the clarification. I have also realized that my previous formulation was kind of selfish; what I meant to say was that "all Windows legs in my PR / CI runs are stuck right now", and that continues to be the case. Given your comment, for now I just hope that, thanks to your manual adjustments, the backlog will eventually disappear, unless you advise me to take some proactive measures like abandoning my currently running tests and triggering new ones.

ulisesh · 2021-04-06T21:33:14Z

I don't think the HelixProd scalesets have been fixed; I keep getting errors when trying to scale them up. @ilyas1974 @adiaaida we should create an IcM.

The affected scalesets are buildpool.windows.10.amd64.vs2017.open-a-scaleset and buildpool.windows.10.amd64.vs2019.open-a-scaleset

ilyas1974 · 2021-04-06T21:45:24Z

I've created Azure support ticket TrackingID#2104060010003005 for this issue.

ulisesh · 2021-04-07T00:49:17Z

Created a second ticket for the vs2019 queue: ID#2104070010000075

jakubstilec · 2021-04-07T10:04:59Z

The issue is still active, no response from Azure support. I chased 2104070010000075.

jakubstilec · 2021-04-07T14:30:37Z

Because there is no update, I also created an IcM ticket: https://portal.microsofticm.com/imp/v3/incidents/details/235578196/home

Should help mitigate dotnet#50746

* Switch to VS preview pool for public builds. Should help mitigate #50746
* Run init-vs-env.cmd for Browser wasm Windows build. The BuildPool.Windows.10.Amd64.VS2019.Pre.Open queue doesn't have ninja installed outside of VS so it's only available in PATH if you run the init-vs-env.cmd script.
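The note about ninja only being on PATH after running init-vs-env.cmd can be checked from a build step. The following is a minimal sketch, not something taken from the runtime repo: the helper `tool_on_path` is a made-up name, and it only uses the standard library's `shutil.which` to see whether an executable resolves in the current environment.

```python
import shutil


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` can be resolved via PATH."""
    return shutil.which(name) is not None


# On a machine where ninja ships only with Visual Studio, this is expected to
# report False until the VS environment script has been run in the same shell.
if tool_on_path("ninja"):
    print("ninja found on PATH")
else:
    print("ninja not found on PATH; the VS environment may not be initialized")
```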

Reverts dotnet#50993, the underlying Azure issue was fixed. Closes dotnet#50746

dotnet-issue-labeler bot added the `untriaged` label (New issue has not been triaged by the area owner) Apr 5, 2021

jkoritzinsky added the `blocking-clean-ci` (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms') and `area-Infrastructure` labels Apr 5, 2021

runfoapp bot mentioned this issue Apr 5, 2021

Infrastructure - Status/Health #702

Closed

ViktorHofer added the `tracking-external-issue` label (The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly) and removed the `untriaged` label (New issue has not been triaged by the area owner) Apr 7, 2021

ViktorHofer added this to the 6.0.0 milestone Apr 7, 2021

michellemcdaniel mentioned this issue Apr 8, 2021

some small proxy-related fixes #50770

Merged

jkoritzinsky mentioned this issue Apr 8, 2021

Revert "[release/5.0] When marshalling a layout class, fall-back to dynamically marshalling the type if it doesn't match the static type in the signature." #50883

Merged

akoeplinger added a commit to akoeplinger/runtime that referenced this issue Apr 9, 2021

Switch to VS preview pool for public builds

0a7cb91

Should help mitigate dotnet#50746

akoeplinger mentioned this issue Apr 9, 2021

Switch to VS preview pool for public builds #50993

Merged

akoeplinger added a commit to akoeplinger/runtime that referenced this issue Apr 12, 2021

Revert "Switch to VS preview pool for public builds (dotnet#50993)"

2365f56

Reverts dotnet#50993, the underlying Azure issue was fixed. Closes dotnet#50746

akoeplinger mentioned this issue Apr 12, 2021

Revert "Switch to VS preview pool for public builds (#50993)" #51103

Merged

ghost added the `in-pr` label (There is an active PR which will close this issue when it is merged) Apr 12, 2021

akoeplinger closed this as completed in #51103 Apr 12, 2021

ghost removed the `in-pr` label (There is an active PR which will close this issue when it is merged) Apr 12, 2021

runfoapp bot removed this from the 6.0.0 milestone Apr 14, 2021

ghost locked as resolved and limited conversation to collaborators May 14, 2021
