Inability to unzip assets during build on Unix x64 #32805

Closed
jaredpar opened this issue Feb 25, 2020 · 25 comments · Fixed by #49321

jaredpar commented Feb 25, 2020

Seeing roughly 3% of our builds failing with the following error:

Extraction failed for file: /Users/runner/runners/2.165.0/work/1/s/download/libraries_test_assets_OSX_x64_Debug/libraries_test_assets_OSX_x64_Debug.tar.gz

Spot-checking the failures, they seem to be limited to OSX builds.

Runfo Tracking Issue: Runtime unable to unzip assets

| Definition | Build | Kind | Job Name |
| --- | --- | --- | --- |
| runtime | 1295938 | PR 57098 | Libraries Test Run release coreclr windows x86 Debug |
| runtime | 1292772 | PR 57366 | Libraries Test Run checked coreclr OSX x64 Debug |
| runtime | 1270933 | PR 56710 | Libraries Test Run release coreclr windows x86 Debug |
| runtime | 1256415 | PR 54912 | Installer Build and Test coreclr windows_arm64 Debug |
| runtime | 1254286 | PR 56122 | CoreCLR Pri0 Runtime Tests Run windows x64 checked |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux arm Release |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl arm Release |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl x64 Debug |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl arm64 Release |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux x64 Debug |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux x64 Debug |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl x64 Debug |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl arm64 Release |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux arm Release |
| runtime | 1245989 | PR 55924 | Libraries Test Run checked coreclr Linux_musl arm Release |

Build Result Summary

| Day Hit Count | Week Hit Count | Month Hit Count |
| --- | --- | --- |
| 1 | 2 | 6 |

Other tracking issue: https://mseng.visualstudio.com/AzureDevOps/_workitems/edit/1673333

jaredpar added the blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms') and area-Infrastructure labels on Feb 25, 2020
Dotnet-GitSync-Bot added the untriaged (New issue has not been triaged by the area owner) label on Feb 25, 2020

jaredpar commented Mar 2, 2020

The issue is still occurring, and it is now impacting non-Mac builds too.

| Build | Kind | Timeline Record |
| --- | --- | --- |
| 543112 | PR #33048 | Unzip Test Assets |
| 542475 | PR #33035 | Unzip Test Assets |
| 542475 | PR #33035 | Unzip Test Assets |
| 540894 | PR #32979 | Unzip Test Assets |
| 540894 | PR #32979 | Unzip Test Assets |
| 540894 | PR #32979 | Unzip Test Assets |
| 539759 | PR #32948 | Unzip Test Assets |

@ViktorHofer

Just happened again here: https://dev.azure.com/dnceng/public/_build/results?buildId=547852. I believe we started a mail thread about this. cc @safern?

ViktorHofer changed the title from "Inability to unzip assets during build on OSX 64" to "Inability to unzip assets during build on Unix x64" on Mar 5, 2020

jaredpar commented Mar 5, 2020

Failures since March 3rd (since last update)

| Build | Kind | Timeline Record |
| --- | --- | --- |
| 548115 | PR #33251 | Unzip Test Assets |
| 548115 | PR #33251 | Unzip Test Assets |
| 547852 | PR #33238 | Unzip Test Assets |
| 547775 | Rolling | Unzip Test Assets |
| 547775 | Rolling | Unzip Test Assets |
| 547514 | PR #33223 | Unzip Test Assets |
| 545323 | PR #341 | Unzip Test Assets |

Evaluated 215 builds
Impacted 5 builds
Impacted 7 jobs


MattGal commented Mar 9, 2020

> Just happened again here: https://dev.azure.com/dnceng/public/_build/results?buildId=547852. I believe we started a mail thread about this. cc @safern?

We're tracking this via https://github.com/dotnet/core-eng/issues/9100.
The best thread to follow, I believe, is the IcM ticket, which is currently assigned to the artifact team: https://portal.microsofticm.com/imp/v3/incidents/details/177158735/home

If this is hitting often enough, I am willing to bump its priority and sit on a Sev2 IcM bridge; I'm not sure the % hit rate will convince them of this level of importance yet though.

@jaredpar

Closing as the zip disable has fixed this.


MattGal commented Jul 23, 2020

@ViktorHofer this isn't an inability to unzip assets, it's an inability to download them. From your logs:

Downloading artifact libraries_test_assets_Windows_NT_x64_Debug from: https://dev.azure.com/dnceng//_apis/resources/Containers/4666396?itemPath=libraries_test_assets_Windows_NT_x64_Debug&isShallow=true&api-version=4.1-preview.4
Downloading libraries_test_assets_Windows_NT_x64_Debug/libraries_test_assets_Windows_NT_x64_Debug.zip to F:\workspace\_work\1\s\__download__\libraries_test_assets_Windows_NT_x64_Debug\libraries_test_assets_Windows_NT_x64_Debug.zip
Downloaded libraries_test_assets_Windows_NT_x64_Debug/libraries_test_assets_Windows_NT_x64_Debug.zip to F:\workspace\_work\1\s\__download__\libraries_test_assets_Windows_NT_x64_Debug\libraries_test_assets_Windows_NT_x64_Debug.zip
Total Files: 1, Processed: 1, Skipped: 0, Failed: 0, Download time: 1805.634 secs, Download size: 78.441MB

... but if you download that zip, you'll find it's > 380 MB, not 78 MB.

Relevant issue: microsoft/azure-pipelines-tasks#13250
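
For illustration, a truncated download like this can be caught before extraction is ever attempted. The sketch below is only an illustration, not part of the actual pipeline: it uses the Python standard library to reject a partially downloaded .zip, and the file name and expected size in the usage comment are hypothetical.

```python
import os
import zipfile


def validate_zip(path, expected_size=None):
    """Reject a partially downloaded .zip before attempting extraction."""
    actual_size = os.path.getsize(path)
    if expected_size is not None and actual_size != expected_size:
        raise IOError(f"{path}: downloaded {actual_size} bytes, expected {expected_size}")

    # A truncated zip usually fails here: the end-of-central-directory record
    # lives at the end of the file, so a short download breaks it.
    if not zipfile.is_zipfile(path):
        raise IOError(f"{path}: not a valid zip archive (likely truncated)")

    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the first
        # corrupted one, or None if everything checks out.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise IOError(f"{path}: CRC check failed for {bad_member}")


# Hypothetical usage with the artifact named in the logs above:
# validate_zip("libraries_test_assets_Windows_NT_x64_Debug.zip",
#              expected_size=398_458_880)
```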

ericstj removed the untriaged (New issue has not been triaged by the area owner) label on Jul 28, 2020
ericstj added this to the 5.0.0 milestone on Jul 28, 2020

ericstj commented Jul 28, 2020

@safern previously added a workaround to our pipelines for this. Are we missing that workaround in some of the places where we call DownloadBuildArtifacts?


safern commented Jul 28, 2020

I don't think so. This is disabled by setting a variable in the yml file, and we set that variable in a place that all pipelines use:

# Workaround for azure devops flakiness when downloading artifacts
# https://github.com/dotnet/runtime/issues/32805
- name: System.DisableZipDownload
value: true


safern commented Jul 28, 2020

> @ViktorHofer this isn't an inability to unzip assets, it's an inability to download them. From your logs:

Yeah, but this is the issue tracking that failure, and the workaround I provided was meant to prevent it from happening.

@ViktorHofer

I struggle to find an action item for this issue besides continuing to ref-count when it happens again. Hence I'm inclined to move this to Future.

ViktorHofer modified the milestones: 5.0.0 → Future on Aug 4, 2020
@jaredpar

This is still occurring:

https://runfo.azurewebsites.net/search/timelines/?bq=repository%3Aruntime+started%3A%7E7&tq=Extraction+failed+for+file

@ViktorHofer

> We're tracking this via dotnet/core-eng#9100.

@MattGal the linked issue is closed. From what I understand, we implemented a workaround, but this started happening again. How should we proceed here?


MattGal commented Oct 1, 2020

> We're tracking this via dotnet/core-eng#9100.
>
> @MattGal the linked issue is closed. From what I understand, we implemented a workaround, but this started happening again. How should we proceed here?

The task we're discussing isn't one we own, and the linked IcM (https://portal.microsofticm.com/imp/v3/incidents/details/177158735/home) was archived without being mitigated back in May.

I pinged my contact on that team, but aside from recreating the same IcM and being told again that they haven't figured out how to fix it, your best workaround is probably to download the artifacts yourself directly via AzDO API calls. I'll update once I hear back.
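
To illustrate the "download the artifacts yourself via AzDO API calls" fallback mentioned above, the sketch below uses the public Build Artifacts REST endpoint with $format=zip. It is an assumption-laden example rather than what the pipeline does: the api-version, the AZDO_PAT environment variable, and the build/artifact values in the usage comment are placeholders.

```python
import os

import requests


def download_artifact(organization, project, build_id, artifact_name, dest_path):
    """Download a build artifact as a zip via the Azure DevOps REST API."""
    url = (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_apis/build/builds/{build_id}/artifacts"
    )
    params = {
        "artifactName": artifact_name,
        "api-version": "6.0",   # assumed; use whatever version the org supports
        "$format": "zip",       # ask the service to return the artifact as a zip
    }
    # Basic auth with an empty user name and a Personal Access Token.
    auth = ("", os.environ["AZDO_PAT"])

    with requests.get(url, params=params, auth=auth, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                out.write(chunk)


# Hypothetical usage for one of the artifacts mentioned in this thread:
# download_artifact("dnceng", "public", 938377,
#                   "libraries_test_assets_OSX_x64_Debug",
#                   "libraries_test_assets_OSX_x64_Debug.zip")
```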

@ViktorHofer

Just pinged @MattGal about this offline. Even if we can mitigate this by consolidating our build legs and depending less on the affected task, we probably won't be able to get rid of it completely in the near future. We might want to create a new IcM for the issue.


MattGal commented Dec 1, 2020

At @markwilkie's suggestion, I'll be tracking my efforts to reduce this problem via the linked core-eng issue.


jaredpar commented Dec 1, 2020

This is a case where we can safely retry. Pretty much any infra-level issue which is identifiable and occurs before tests run is safe to retry. We don't have to wait for core-eng to provide the necessary logging infra for tests.


MattGal commented Dec 1, 2020

> This is a case where we can safely retry. Pretty much any infra-level issue which is identifiable and occurs before tests run is safe to retry. We don't have to wait for core-eng to provide the necessary logging infra for tests.

You can safely retry the leg, but the problem is that the task "succeeds" when this happens, so runfo can't differentiate this from things like an actual malformed archive. It would definitely be preferable (and this unfortunately fell through the cracks since July) for the task to actually fail on failure and use its built-in retry mechanisms.


jaredpar commented Dec 1, 2020

> runfo can't differentiate this from things like an actual malformed archive.

Correct. At the same time we have zero actual malformed archives. So the rate at which this is a false positive is presently zero 😄

Even in the case where we do have a false positive (an actual malformed archive), the risk is low: we retry the job, it fails again, and a developer has to investigate.
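
A minimal sketch of the safe-retry idea being discussed, assuming hypothetical download and extract callables; it only illustrates treating a pre-test extraction failure as a retriable infrastructure error, and is not how the pipeline actually handles it.

```python
import time


def download_and_extract_with_retry(download, extract, attempts=3, delay_seconds=30):
    """Retry the download + extract pair as a unit.

    `download` and `extract` are hypothetical callables standing in for
    whatever the pipeline really does; `download` returns the archive path.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            extract(download())
            return
        except Exception as error:  # narrow to known infra errors in real code
            # Extraction failures before tests run are treated as retriable
            # infrastructure errors, per the discussion above.
            last_error = error
            print(f"attempt {attempt} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)
    # If every attempt fails, surface the error so a developer investigates
    # (the acceptable false-positive case described above).
    raise last_error
```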


MattGal commented Dec 1, 2020

@jaredpar makes sense; I'll keep pushing on the actual task getting fixed too.


safern commented Jan 5, 2021

@MattGal we just hit this in a rolling build on OSX, but I see the dnceng issue is closed. Do we have a way to track AzDO adding retry for Unix code paths?

https://dev.azure.com/dnceng/public/_build/results?buildId=938377&view=logs&j=d8aa34eb-4280-5100-d989-99a78ab22b6c&t=2fa7448e-f849-5fe0-7bed-68068872d0e8&l=18


MattGal commented Jan 5, 2021

I have been tracking this since "the surge" via https://github.com/dotnet/core-eng/issues/11551, and AFAIK we're just waiting for this PR to merge; feel free to ping it some more too: microsoft/azure-pipelines-tasks#14065. I suspect things are just slower than usual because folks are still getting back from holidays.
