Stream directly to disk #175
base: master
Conversation
There are conflicts. Nevertheless, this looks great. I'm not an expert with Node.js, though, and all that async stuff seems a bit scary to me 😄. I would like someone else to look at these changes and review them, as I don't feel so confident here and don't want to break people's existing workflows if something goes wrong.
Force-pushed from 7ebd1ca to df393ed
Force-pushed from df393ed to fa2f5f1
The current implementation saves everything to memory and extracts the zip files in memory before copying to the filesystem. This can consume a huge amount of memory if artifacts are large. This change streams the zip file directly to disk and extracts it without loading the entire zip into memory first.
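To make the idea concrete, here is a minimal sketch of streaming a download straight to a file using only Node core APIs; the names and the lack of redirect handling are simplifications for illustration, not the PR's exact code:

const fs = require("fs")
const https = require("https")
const { pipeline } = require("stream")

// Download a URL directly into a file on disk. pipeline() applies
// backpressure, so only a small buffer is ever held in memory, and it
// tears down both streams if either side errors.
function downloadToFile(url, saveTo) {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume() // drain and discard the body
          return reject(new Error(`unexpected status: ${res.statusCode}`))
        }
        pipeline(res, fs.createWriteStream(saveTo), (err) =>
          err ? reject(err) : resolve()
        )
      })
      .on("error", reject)
  })
}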
Force-pushed from eba744b to a22d1b0
Hi @dawidd6,

Rebased on the latest main branch. I am no JavaScript expert either; I had to go through a bunch of docs to write this PR. It seems to work, and from my understanding of how async/await and promises work, that bit of the code should be correct. However, I would like someone who actually writes JavaScript more often than me to review this as well :). We ended up using my fork for now, so there is no urgency, but I would love to not have to maintain it separately.

Cheers,
This PR is a bit odd for me; I don't have a lot of context on the business logic involved here, but I did the review from a JS perspective, which is what was required. My only concern is error handling when making the GET request.
if (!fs.existsSync(path)) {
  fs.mkdirSync(path, { recursive: true })
}
To avoid using sync calls here, I would use a try/catch with the fs.promises API, like this:
try {
  await fs.promises.mkdir(path);
} catch (e) {
  // If the error is an already-existing path, do nothing; rethrow anything else.
  if (e.code !== "EEXIST") throw e;
}
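Worth noting: keeping the { recursive: true } option from the original call would make the catch unnecessary, since fs.promises.mkdir resolves silently for an already-existing directory when that flag is set:

await fs.promises.mkdir(path, { recursive: true });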
Not a big deal though; the code was already doing this before.
file.on("error", () => { | ||
core.info(`error saving file: ${err}`); | ||
resolve() | ||
}) |
I'm curious: why aren't we rejecting the promise here?
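For illustration, a minimal sketch of what rejecting could look like, assuming file is the write stream from the snippet above and the surrounding code awaits this promise (the rest is hypothetical):

await new Promise((resolve, reject) => {
  file.on("finish", resolve)
  file.on("error", (err) => {
    core.info(`error saving file: ${err}`)
    reject(err) // the awaiting caller now sees the failure instead of a silent success
  })
})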
}

core.startGroup(`==> Extracting: ${artifact.name}.zip`)
yauzl.open(saveTo, {lazyEntries: true}, function(err, zipfile) {
Are we following any Prettier/ESLint rules here? The indentation and spacing seem odd here.
My guess is I wasn't. I learned just enough JavaScript to write this PR 😄.
Thanks for the review! I had completely forgotten about this PR. I will have a closer look as soon as I can.

The goal is to stream the files to disk instead of saving them to memory, then extract them directly to disk in chunks. We wanted to use this in a project that required downloading large (multi-GB) artifacts. This led to the GitHub runner running out of memory; the OOM killer would kick in and start killing processes, and the jobs would get canceled. It was a bit of a head-scratcher, because it wasn't immediately obvious why the jobs were failing; the only message in the action was "Canceled" (or something similar).

For small artifacts it's not a problem. When downloading/unarchiving virtual machine disk images, it becomes an issue.
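For context, here is a minimal sketch of the streaming extraction idea using yauzl; names like saveTo and destDir are illustrative rather than the PR's exact code:

const fs = require("fs")
const path = require("path")
const yauzl = require("yauzl")

// Extract a zip entry by entry, piping each file to disk so the archive
// never has to be held in memory. With lazyEntries, the next entry is only
// read when we ask for it via readEntry(). Assumes directory entries
// precede the files inside them, as is typical for zip archives.
function extractZip(saveTo, destDir) {
  return new Promise((resolve, reject) => {
    yauzl.open(saveTo, { lazyEntries: true }, (err, zipfile) => {
      if (err) return reject(err)
      zipfile.on("end", resolve)
      zipfile.on("error", reject)
      zipfile.on("entry", (entry) => {
        const dest = path.join(destDir, entry.fileName)
        if (entry.fileName.endsWith("/")) {
          // Directory entry: just create it and move on to the next entry.
          fs.promises
            .mkdir(dest, { recursive: true })
            .then(() => zipfile.readEntry(), reject)
          return
        }
        zipfile.openReadStream(entry, (err, readStream) => {
          if (err) return reject(err)
          const writeStream = fs.createWriteStream(dest)
          writeStream.on("close", () => zipfile.readEntry())
          writeStream.on("error", reject)
          readStream.pipe(writeStream)
        })
      })
      zipfile.readEntry() // kick off the first entry
    })
  })
}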
Co-authored-by: Joan Gil <[email protected]>
The extraction logic LGTM!