-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option for concurrent cache downloads with timeout #1484
Conversation
@mattiaerre @yacaovsnc This would likely help alleviate a significant amount of failures related to that known Azure Blob issue we discussed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hey @chkimes, this is a great patch! We recently forked the actions cache to try and get it working against our minio instance and initially had hand-rolled our own download logic, but then decided what you have here is much more robust. We have seen one instance where a cache restore hangs after making some progress using the method in this patch but have not been able to reproduce it ever since. Considering each segment has a timeout of 30s we're kinda perplexed how a hang is even possible. Is there any chance you can share how you tested this patch? We're unsure what the best way to setup a testing hook for this method is and wanted to see if you had some ideas. |
This PR introduces a new default option for download from the Actions Cache. The new option uses the Actions HTTP Client with concurrent downloads from Blob storage, similar to how the Azure SDK works, except this includes much more granular control over timing out downloads during the initial response and while consuming the full response stream. This should once and for all address the issues seen in actions/cache#810.
I have performed extensive testing on the new downloader and comparing to the existing downloader. Here's a week's worth of 99th percentile data showing that the old method is significantly more impacted by Azure Blob issues:
Due to a few more connection caching optimizations I made, we can also see by putting percentiles on the x-axis that total execution time is always lower in similar test scenarios: