Stream directly to disk #175
base: master
Conversation
There are conflicts. Nevertheless, this looks great. I'm not an expert with Node.js, though, and all that async stuff seems a bit scary to me 😄. I would like someone else to look at these changes and review them, as I don't feel so confident here and don't want to break people's existing workflows if something goes wrong.
Force-pushed from 7ebd1ca to df393ed
Force-pushed from df393ed to fa2f5f1
The current implementation saves everything to memory and extracts the zip files in memory before copying to the filesystem. This can consume a huge amount of memory if artifacts are large. This change streams the zip file directly to disk and extracts it without loading the entire zip into memory first.
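To make the idea concrete, here is a minimal sketch of streaming a download straight to a file using only Node core APIs; the names and the lack of redirect handling are simplifications for illustration, not the PR's exact code:

const fs = require("fs")
const https = require("https")
const { pipeline } = require("stream")

// Download a URL directly into a file on disk. pipeline() applies
// backpressure, so only a small buffer is ever held in memory, and it
// tears down both streams if either side errors.
function downloadToFile(url, saveTo) {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume() // drain and discard the body
          return reject(new Error(`unexpected status: ${res.statusCode}`))
        }
        pipeline(res, fs.createWriteStream(saveTo), (err) =>
          err ? reject(err) : resolve()
        )
      })
      .on("error", reject)
  })
}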
Force-pushed from eba744b to a22d1b0
Hi @dawidd6,

Rebased on the latest main branch. I am no JavaScript expert either; I had to go through a bunch of docs to write this PR. It seems to work, and from my understanding of how async/await and promises work, that bit of the code should be correct. However, I would like someone who actually writes JavaScript more often than me to review this as well :). We ended up using my fork for now, so there is no urgency, but I would love to not have to maintain it separately.

Cheers,
This PR is a bit odd for me; I don't have a lot of context on the business logic involved here, but I did the review from a JS perspective, which is what was required. My only concern is error handling when making the GET request.
if (!fs.existsSync(path)) {
  fs.mkdirSync(path, { recursive: true })
}
To avoid using sync calls here, I would use a try/catch with the fs.promises API, like this:
try {
  await fs.promises.mkdir(path);
} catch (e) {
  // If the error is an already-existing path, do nothing; rethrow anything else.
  if (e.code !== "EEXIST") throw e;
}
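Worth noting: keeping the { recursive: true } option from the original call would make the catch unnecessary, since fs.promises.mkdir resolves silently for an already-existing directory when that flag is set:

await fs.promises.mkdir(path, { recursive: true });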
Not a big deal though; the code was already doing this before.
file.on("error", () => { | ||
core.info(`error saving file: ${err}`); | ||
resolve() | ||
}) |
I'm curious: why aren't we rejecting the promise here?
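For illustration, a minimal sketch of what rejecting could look like, assuming file is the write stream from the snippet above and the surrounding code awaits this promise (the rest is hypothetical):

await new Promise((resolve, reject) => {
  file.on("finish", resolve)
  file.on("error", (err) => {
    core.info(`error saving file: ${err}`)
    reject(err) // the awaiting caller now sees the failure instead of a silent success
  })
})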
}

core.startGroup(`==> Extracting: ${artifact.name}.zip`)
yauzl.open(saveTo, {lazyEntries: true}, function(err, zipfile) {
Are we following any Prettier/ESLint rules here? The indentation and spacing seem odd here.
My guess is I wasn't. I learned just enough JavaScript to write this PR 😄.
Thanks for the review! I had completely forgotten about this PR. I will have a closer look as soon as I can.

The goal is to stream the files to disk instead of saving them to memory, then extract them directly to disk in chunks. We wanted to use this in a project that required downloading large (multi-GB) artifacts. This led to the GitHub runner running out of memory; the OOM killer would kick in and start killing processes, and the jobs would get canceled. It was a bit of a head-scratcher, because it wasn't immediately obvious why the jobs were failing; the only message in the action was "Canceled" (or something similar).

For small artifacts it's not a problem. When downloading/unarchiving virtual machine disk images, it becomes an issue.
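For context, here is a minimal sketch of the streaming extraction idea using yauzl; names like saveTo and destDir are illustrative rather than the PR's exact code:

const fs = require("fs")
const path = require("path")
const yauzl = require("yauzl")

// Extract a zip entry by entry, piping each file to disk so the archive
// never has to be held in memory. With lazyEntries, the next entry is only
// read when we ask for it via readEntry(). Assumes directory entries
// precede the files inside them, as is typical for zip archives.
function extractZip(saveTo, destDir) {
  return new Promise((resolve, reject) => {
    yauzl.open(saveTo, { lazyEntries: true }, (err, zipfile) => {
      if (err) return reject(err)
      zipfile.on("end", resolve)
      zipfile.on("error", reject)
      zipfile.on("entry", (entry) => {
        const dest = path.join(destDir, entry.fileName)
        if (entry.fileName.endsWith("/")) {
          // Directory entry: just create it and move on to the next entry.
          fs.promises
            .mkdir(dest, { recursive: true })
            .then(() => zipfile.readEntry(), reject)
          return
        }
        zipfile.openReadStream(entry, (err, readStream) => {
          if (err) return reject(err)
          const writeStream = fs.createWriteStream(dest)
          writeStream.on("close", () => zipfile.readEntry())
          writeStream.on("error", reject)
          readStream.pipe(writeStream)
        })
      })
      zipfile.readEntry() // kick off the first entry
    })
  })
}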
Co-authored-by: Joan Gil <[email protected]>
The extraction logic LGTM!