
Retry when downloading/extraction of Node.js tar fails #784

Closed
trivikr opened this issue Sep 20, 2023 · 15 comments

@trivikr

trivikr commented Sep 20, 2023

Is your feature request related to a problem? Please describe.

Sometimes downloading the Node.js artifact fails because of a network issue.

In our release CI, we noticed it happening twice on Sep 20th.

First occurrence

       fetch : https://nodejs.org/dist/v14.21.3/node-v14.21.3-linux-x64.tar.xz
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
xz: (stdin): Unexpected end of input
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
   Error: failed to download archive for 14.21.3

The second occurrence, on retry, was similar but for Node.js v18.18.0:

       fetch : https://nodejs.org/dist/v18.18.0/node-v18.18.0-linux-x64.tar.xz

Describe the solution you'd like

When fetching the tarball for a Node.js version fails, could n add a retry with some delay?

Describe alternatives you've considered

for i in {1..5}; do n 18 && break || sleep 15; done

Additional context

This issue was discussed in the past in #287, but was closed as the network error didn't appear again for the user.

@trivikr trivikr changed the title Retry when installing Node.js tar fails. Retry when downloading of Node.js tar fails Sep 20, 2023
@trivikr trivikr changed the title Retry when downloading of Node.js tar fails Retry when downloading/extraction of Node.js tar fails Sep 20, 2023
@trivikr
Author

trivikr commented Sep 20, 2023

The source code that attempts to download the archive:

n/bin/n

Lines 799 to 804 in 02a8fe0

log fetch "$url"
do_get "${url}" | tar "$tarflag" --strip-components=1 --no-same-owner -f -
pipe_results=( "${PIPESTATUS[@]}" )
if [[ "${pipe_results[0]}" -ne 0 ]]; then
  abort "failed to download archive for $version"
fi
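
For context on why that PIPESTATUS check is needed: in bash, the exit status of a pipeline like do_get | tar is tar's, so a failed or truncated download is only visible through PIPESTATUS. A minimal standalone sketch of the same pattern (the URL, tar flags, and destination are illustrative, not n's actual values):

  url="https://nodejs.org/dist/v18.18.0/node-v18.18.0-linux-x64.tar.xz"
  dest="$(mktemp -d)"

  # Stream the download straight into tar, as n does.
  curl --fail --location --silent "$url" \
    | tar -Jx --strip-components=1 --no-same-owner -C "$dest" -f -

  # $? here would be tar's status; PIPESTATUS[0] holds curl's status, which
  # is what actually signals a failed or truncated download.
  pipe_results=( "${PIPESTATUS[@]}" )
  if [[ "${pipe_results[0]}" -ne 0 ]]; then
    echo "failed to download archive" >&2
    exit 1
  fi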

@shadowspawn
Collaborator

Related to network problems, there have been some timeout suggestions: #544 #771

@trivikr
Author

trivikr commented Sep 20, 2023

Similar request on nvm nvm-sh/nvm#3074

@shadowspawn
Collaborator

I currently don't think a retry is appropriate in the n code. There are lots of things that can go wrong, including systematic problems that need fixing before an attempt can work. I prefer failing fast and leaving it up to the caller to decide how to proceed.

Happy to leave the issue open for a month to see if there are other comments.

@lstkz

lstkz commented Sep 26, 2023

We are using AWS CodeBuild, and we see this error often. I think ~ 5% of builds fail because of it.

[Container] 2023/09/26 06:35:38 Running command n 18.14.2
installing : node-v18.14.2
mkdir : /usr/local/n/versions/node/18.14.2
fetch : https://nodejs.org/dist/v18.14.2/node-v18.14.2-linux-x64.tar.xz
curl: (18) transfer closed with 9476756 bytes remaining to read
xz: (stdin): Unexpected end of input
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Error: failed to download archive for 18.14.2

Maybe there is something wrong with the Node.js CDN, and there are more stable mirrors?
Adding retry logic (behind an extra flag) would be useful.

@trivikr
Author

trivikr commented Sep 26, 2023

I think ~ 5% of builds fail because of it.

We also noticed the same when using n on AWS CodeBuild.

For now, we've added the following retry loop as a workaround:

for i in {1..5}; do n 18 && break || sleep 15; done

It would be helpful if n retried the download/extraction of Node.js in case of failure.
The caller is likely going to retry the installation in these cases anyway, since the issue is with download/extraction.

There are lots of things that can go wrong, including systematic problems that need fixing before an attempt can work

How about adding retries for specific failures based on reports?
The reports here mention curl errors 18 and 92.
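
To illustrate that idea (purely a sketch, not n's code; download_node and fetch_with_retry are hypothetical names): with the download running as its own command so that its exit status is visible, the retry could be limited to the curl exit codes seen in this thread, 18 (partial file) and 92 (HTTP/2 stream error).

  # Hypothetical helper: fetch the tarball to a file so curl's exit code is
  # directly observable rather than hidden behind a pipe.
  download_node() {
    curl --fail --location --silent --output "$2" "$1"
  }

  # Retry only the failure modes reported here: 18 (partial file) and
  # 92 (HTTP/2 stream error). Anything else fails fast.
  fetch_with_retry() {
    local url="$1" out="$2" attempt status
    for attempt in 1 2 3; do
      download_node "$url" "$out"
      status=$?
      [[ "$status" -eq 0 ]] && return 0
      if [[ "$status" -ne 18 && "$status" -ne 92 ]]; then
        return "$status"
      fi
      (( attempt < 3 )) && sleep 15
    done
    return "$status"
  }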

@shadowspawn
Collaborator

Interesting that there is a common factor of AWS CodeBuild in the first two reports. And n is preinstalled (1 2), nice.

I see that you can specify the node version as part of the AWS CodeBuild runtime setup, if that fits into your workflow.

@darkalor

I've started seeing this exact error on our CI jobs over the last couple of weeks. We've been using the same deployment for years and I've never seen that before. Curious if something changed on node's CDN 🤔

A quick Google search suggests this might be related to conflicting HTTP/1 vs HTTP/2 settings on the server: https://stackoverflow.com/a/68591734
It doesn't seem like there's currently a way to pass a setting like that through n. Perhaps that could be added (a sketch follows below)? I'm not opposed to the idea of a retry setting either.
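
For reference, the client-side curl option in question is --http1.1, which forces the transfer onto HTTP/1.1 and so avoids the HTTP/2 framing errors (curl exit code 92) entirely. Today it would have to be used outside of n, e.g. (URL and filename are illustrative):

  # Illustrative only: fetch the archive over HTTP/1.1 instead of HTTP/2.
  curl --fail --location --http1.1 \
    --output node-v18.18.0-linux-x64.tar.xz \
    https://nodejs.org/dist/v18.18.0/node-v18.18.0-linux-x64.tar.xz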

@shadowspawn
Collaborator

Ah, a known problem due to CDN cache being purged multiple times per day:

@elibarzilay
Contributor

Looking at the two rejected PRs above (#544, #771), rejecting them seems like the right choice, since implementing retry logic by hand in the n code is not a good idea.

But curl has a built-in --retry argument that works nicely and even does exponential backoff. That eliminates any complexity in the actual code (just adding something like --retry 5 to $CURL_OPTIONS does the trick).

Two points to support this:

  • wget has an unfair advantage in that it defaults to retrying. It would make sense to make the curl usage more similar.
  • Using a config file is an option, but there's already precedent in the code for adding curl arguments (--insecure), which could likewise have been rejected in favor of the user's own config file.

BTW, another problem with a global ~/.curlrc config file is that people are usually unaware of it and instead do some silly shell looping as above (and hopefully it's clear why that naive loop is a bad idea).

Oh, and regardless of Node's web changes, we have intermittent issues from time to time due to build hosts running inside a VPN that can be unreliable. So this problem exists either way.
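
A rough sketch of what that suggestion could look like (this is not n's actual do_get; the CURL_OPTIONS plumbing is assumed, only the curl flags themselves are real):

  # Assumed shape of the download helper with curl's built-in retry added.
  # --retry 5 retries transient errors, doubling the wait between attempts.
  CURL_OPTIONS="--retry 5"

  do_get() {
    # CURL_OPTIONS is intentionally left unquoted so it word-splits into flags.
    curl --fail --location --silent $CURL_OPTIONS "$1"
  }

  # Streams the archive to stdout, as in the current pipeline:
  #   do_get "$url" | tar "$tarflag" --strip-components=1 --no-same-owner -f -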

@shadowspawn
Collaborator

I hadn't noticed curl retry before. Interesting.

Using --retry would require a refactor to save the download to a file, as curl can't restart/retry data that has already gone down a pipe.

The --retry covers transient errors. Not sure whether the failures in this issue are transient as such.

  --retry <num>
         If a transient error is returned when curl tries to perform a transfer, it retries this number of times
         before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error
         means either: a timeout, an FTP 4xx response code or an HTTP 408, 429, 500, 502, 503 or 504 response code.
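
For reference, a standalone file-based download with --retry would look something like this (URL and filename are illustrative):

  # With --retry, the output goes to a file instead of a pipe, so a failed
  # attempt does not leave tar reading a truncated stream; tar only runs
  # once the download has ultimately succeeded.
  curl --fail --location --silent --retry 5 \
    --output node.tar.xz \
    https://nodejs.org/dist/v18.18.0/node-v18.18.0-linux-x64.tar.xz \
    && tar -Jxf node.tar.xz --strip-components=1 --no-same-owner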

@elibarzilay
Contributor

  • Re errors: I don't remember how often, but I definitely saw transient errors. There are options like --retry-connrefused and even --retry-all-errors -- but in the average case I'd definitely not want those. (And using them can make the whole thing equivalent to the simple bash retrying loop...)

  • Re refactoring the code to use a file: I don't think it's important. Even if it re-fetches the file from the start, the point is not to save a few bytes by resuming with a ranged request, but just to make it work.

    • In interactive use there won't be any difference, since if it fails I'd obviously run it again anyway.
    • And in a CI context, refetching the whole thing is still orders of magnitude cheaper than a failing job, which would be restarted anyway (and therefore refetch the whole thing anyway).

But since it was interesting to see whether curl can do that (resume a retried transfer partway through), I searched a bit. I didn't find any conclusive answers, but:

  • An output file is not necessarily needed: I found this PR which (IIUC) resumes after the byte count that already poured out. It was abandoned, but they have a todo item for it. That's just an interesting comment.
  • Unless it does the same, wget would benefit from the same kind of refactoring to use a file. (Searching for wget is more difficult since its users tend to be less technical, so there are too many random pages that come up...)

So, keeping in mind my opinion above, using a file might be good anyway. And doing that shouldn't be too hard:

  local tmp="/some/tmp/file.tgz"
  rm -f "$tmp"
  { do_get_file "${url}" "$tmp" && tar "$tarflag" --strip-components=1 --no-same-owner -f "$tmp"; } \
    || { rm -f "$tmp"; abort "blah blah"; }
  rm -f "$tmp"

And this might simplify more code since stdout is now used as usual. (Handling output is usually what makes me switch to using a temp file with curl...)

@shadowspawn
Collaborator

The Node.js caching improvements might be fully functional now. Fingers crossed!

@shadowspawn
Collaborator

shadowspawn commented Nov 7, 2024

Cloudflare caching for /dist enabled just now: nodejs/build#3461 (comment)

@shadowspawn
Collaborator

Hopefully this is much improved by the new caching setup for Node.js.

Feel free to open a new issue if it comes up again, with new information and renewed interest.
