NPM install stucks with node:20 #1946

Open
riethmue opened this issue Aug 7, 2023 · 47 comments

@riethmue

riethmue commented Aug 7, 2023

Environment

  • Platform: MacOS Ventura (13.4.1) on M1
  • Docker Version: 24.0.5
  • Node.js Version: 20
  • Image Tag: 20

Expected Behavior

npm install should run and complete. I've tested this with the node:18 image instead and it works. node:20-slim gets stuck too.

Current Behavior

npm install does not start; the build gets stuck at this step.

$ docker build --progress=plain --platform linux/amd64 -t test-image .
#0 building with "desktop-linux" instance using docker driver

#1 [internal] load .dockerignore
#1 transferring context: 66B done
#1 DONE 0.0s

#2 [internal] load build definition from Dockerfile
#2 transferring dockerfile: 414B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/node:20
#3 DONE 1.3s

#4 [1/5] FROM docker.io/library/node:20@sha256:64b71834718b859ea389790ae56e5f2f8fa9456bf3821ff75fa28a87a09cbc09
#4 CACHED

#5 [2/5] WORKDIR /usr/src/app
#5 DONE 0.0s

#6 [internal] load build context
#6 transferring context: 1.12kB 0.0s done
#6 DONE 0.0s

#7 [3/5] COPY package*.json ./
#7 DONE 0.0s

#8 [4/5] RUN npm install

I've tried other npm commands like npm cache clean --force, but those get stuck too.

Possible Solution

Steps to Reproduce

I'm using the Dockerfile from the official Node.js example:

FROM node:20

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available (npm@5+)
COPY package*.json ./

RUN npm install
# If you are building your code for production
# RUN npm ci --omit=dev

# Bundle app source
COPY . .

EXPOSE 8080
CMD [ "node", "server.js" ]

Additional Information

@dannykruitbosch

I have the same issue with yarn

@matteovivona

matteovivona commented Aug 14, 2023

Same here. My image works with Node 16, 18 and 19. With 20, even the corepack enable command gets stuck.

Update: it works in my pipeline based on Ubuntu 22.04 (linux/amd64).

@LaurentGoderre
Member

I just tried on Ventura with an M2 chip and it didn't hang

@LaurentGoderre
Member

I am able to reproduce the behavior when I disable my network. I wonder whether a firewall rule might be blocking traffic.

@callmemaxi

Hi, I'm also facing the same issue on an M1 Mac. I've been seeing it since last week. I have tried Node 20 and 18 with the alpine, slim and latest variants. The issue persists.

@callmemaxi

I can now run yarn and npm commands after disabling the "Use Rosetta for x86/amd64 emulation on Apple Silicon" option in Docker settings. Not sure this is a fix.

@eric-zeng

I'm encountering the same issue: docker build hangs on RUN npm install. The problem occurs when building an image with --platform=linux/amd64 on Docker Desktop for Mac.

(Why do this? I want to test an image on my mac laptop that will eventually run in an amd64 environment, and one apt dependency does not have an arm64 version)

Workarounds I found:

  • Downgrade to the previous base node image (node:18.17.0; the same should apply to node:20)
  • Turn off Rosetta, as @callmemaxi suggested (slow)
  • Build for arm64 (causes issues if you need amd64)

Dockerfile

FROM --platform=linux/amd64 node:18
RUN npm install --loglevel verbose
CMD index.js

package.json

{
  "name": "docker-bug",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "type": "module",
  "license": "ISC",
  "dependencies": {
    "chalk": "^5.3.0"
  }
}

index.js

import chalk from 'chalk';
console.log(chalk.green('Hello world'));

Console output

$ docker build . -t node-docker-bug
[+] Building 11.7s (4/5)                                                                                                                                  docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                                           0.0s
 => [internal] load build definition from Dockerfile                                                                                                                      0.0s
 => => transferring dockerfile: 120B                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/node:18                                                                                                                0.3s
 => CACHED [1/2] FROM docker.io/library/node:18@sha256:ee0a21d64211d92d4340b225c556e9ef1a8bce1d5b03b49f5f07bf1dbbaa5626                                                   0.0s
 => [2/2] RUN npm install --loglevel verbose                                                                                                                             11.5s
 => => # npm info using npm@…
 => => # npm info using node@…
 => => # npm verb title npm install
 => => # npm verb argv "install" "--loglevel" "verbose"
 => => # npm verb logfile logs-max:10 dir:/root/.npm/_logs/2023-09-25T19_40_55_092Z-
 => => # npm verb logfile /root/.npm/_logs/2023-09-25T19_40_55_092Z-debug-0.log

@will3942

will3942 commented Oct 2, 2023

Same issue affecting npm and yarn when using docker buildx and building for linux/arm/v7

@ozbillwang

ozbillwang commented Nov 15, 2023

Got the same issue. You can easily reproduce it with the commands below:

$ cat Dockerfile

FROM node:lts-alpine

RUN export PATH=/usr/local/bin:$PATH && npm install -g semver

CMD ["semver", "--help"]

$ export platform="linux/arm/v7"

$ docker buildx build --progress=plain --platform "${platform}" --tag demo .

...
#6 [2/2] RUN export PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && npm install -g semver

Then it hangs at the installation step forever.

I ran tests on the platforms below:

linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64/v8,linux/ppc64le,linux/s390x
  • platforms without the problem: linux/amd64, linux/arm64/v8, linux/ppc64le
  • platforms where buildx hangs: linux/s390x, linux/arm/v6, linux/arm/v7

@pheiduck

pheiduck commented Dec 3, 2023

My workaround at the moment is to stay on Node 18.x.
I hope there will be a fix for the armv6 and armv7 architectures; otherwise I will have to drop them.
Maybe in Node 22.x?

@n0rt0nthec4t

Also have the same issue, and reverted back to Node 18.x for my docker builds.

@agners

agners commented Jan 3, 2024

It seems this may be a duplicate of #1798: I see the same sequence of mremap calls as reported in #1798 (comment) when running strace with the reproducer from #1946 (comment).

@omaraboumrad

Upgrading from Docker Desktop 4.21 to 4.26.1 (the latest at the time of writing) seems to have fixed this problem for me on an M2 Mac.

Ravinou added a commit to Ravinou/borgwarehouse that referenced this issue Feb 4, 2024
# linux/arm/v7 arm32 is not supported by node20 nodejs/docker-node#1946
@brianbraunstein

Trigger

What triggered this problem for me was updating my package-lock.json from version 1 to version 3.

Solution

The key fix for me was using npm install --no-audit, which also makes the install much faster; auditing can probably be done independently of the normal build/CI process.

Details

Here are some additional details I gathered before finding the final solution, in case it's helpful for others, or for debugging the root problem.

Sorta Solutions

These attempts somewhat worked but not in a satisfactory way...

  • Reverting to my version 1 package-lock.json file worked, but I didn't want to stick with this because it felt fragile, and it was slow because npm had to recreate the lockfile each time I ran my Docker-based build (unless it was in the Docker cache).

  • For one repository, removing and recreating the package-lock.json file worked, but this didn't work for my other repo.

Non-Solutions

These attempts didn't work...

  • Flushing Docker state and downgrading Docker with ...
docker container prune
docker rm -f $(docker ps -qa)
docker image prune -a
docker volume prune
docker volume rm $(docker volume ls -q)
docker buildx prune
apt-cache madison docker-ce
va=... # pick an older version from above
apt-cache madison containerd.io
vb=... # pick an older version from above

apt-get install docker-ce=$va containerd.io=$vb
# restarted
  • Installing the latest LTS Node.js (currently 20.11.1, which ships npm 10.2.4)

  • Deleting and recreating my package-lock.json (although it worked for the one repo mentioned above)

  • Using docker build --network=host

@stepanroznik

stepanroznik commented Mar 27, 2024

Using docker build --network=host did actually help me. I'm on Ubuntu 22.04 through WSL2 on Windows 11, Node v20.10.0

@Armerila

My finding is that the installation eventually completes, but first hangs for 2-5 minutes. My build process installs two Node applications in node:20: one goes through fine, the other hangs.

Both have lockfileVersion 3. The only difference I can find is that the hanging one is declared as "type": "module" in package.json.

valentindeaconu added a commit to terralist/terralist that referenced this issue Apr 19, 2024
* chore: Upgrade stack versions

* Revert node upgrade
See nodejs/docker-node#1946
cooperaj added a commit to cooperaj/kingtrashmouth that referenced this issue Apr 30, 2024
@pheiduck

pheiduck commented May 4, 2024

It's still a thing on node 22 :/

@Tofandel

Tofandel commented Oct 28, 2024

FYI, the issue is in npm and Node (not QEMU): since 10.4.0, npm runs a Node debugging method once per resolved package. This debugging method takes very long after HTTP requests are made in Node, see npm/cli#4028 (comment)
With only 10 packages it's not a problem, but once you go over 500 packages (which is 99% of the ecosystem) it can take a very long time to complete (days).

The hardest part, locating the issue, is now done, and a PR has been opened, so hopefully a fix will be released soon.

@TahaJumaah

On Windows with WSL2 installed, changing the builder in Docker settings to "default" instead of the WSL builder seems to help. Not sure if it's an actual solution.

@dadatuputi

I was able to get node:lts past this hang by adding --network=host. Using node:20 got past the hang without --network=host. The difference, according to nodejs.org, is:

Node.js version   npm version
v22.11.0          v10.9.0
v20.18.0          v10.8.2

I hope you found the fix @Tofandel, but why would adding --network=host work, or why would v20 with npm version 10.8.2 work when you said the issue was introduced in 10.4.0? It's possible I'm facing a different issue myself.

@Tofandel

Tofandel commented Oct 31, 2024

Did you try with npm 10.8.2 and node 22.11.0 instead of switching both versions at the same time? Or did you try with node 22.11.0 with npm 10.3.0 ?

Basically, the root of the issue is Node's getReport, which runs reverse DNS queries synchronously for all open handles (think fetch).

npm had the good idea to introduce glibc detection using Node's getReport... in a loop... resulting in extremely long delays in environments where DNS resolution is not cached or not fast.
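To see why calling getReport in a loop hurts, here is a minimal JavaScript sketch (not npm's actual code; every function name is hypothetical). It assumes a recent Node.js where `process.report.getReport()` returns an object whose `header.glibcVersionRuntime` field is present on glibc-based Linux and absent on musl (Alpine), macOS and Windows. The report generator is injectable, so a stub can count how often it runs: once per package in the looped version, once in total when the answer is cached.

```javascript
// Hypothetical sketch, not npm's implementation.
// Classifies the C library from Node's diagnostic report: on glibc-based
// Linux, report.header.glibcVersionRuntime is set (e.g. "2.36").
function detectLibc(getReport = () => process.report.getReport()) {
  const report = getReport();
  return report && report.header && report.header.glibcVersionRuntime
    ? "glibc"
    : "not-glibc";
}

// Expensive pattern: a fresh report for every package checked.
function checkPackagesUncached(packages, getReport) {
  return packages.map((name) => ({ name, libc: detectLibc(getReport) }));
}

// Cheap pattern: generate the report at most once and reuse the answer.
function makeCachedLibcDetector(getReport) {
  let cached; // stays undefined until the first call
  return () => {
    if (cached === undefined) cached = detectLibc(getReport);
    return cached;
  };
}

function checkPackagesCached(packages, getReport) {
  const detect = makeCachedLibcDetector(getReport);
  return packages.map((name) => ({ name, libc: detect() }));
}

// Count report generations with a stub instead of the real (slow) call.
let reportCalls = 0;
const countingGetReport = () => {
  reportCalls += 1;
  return { header: { glibcVersionRuntime: "2.36" } };
};
const packages = Array.from({ length: 500 }, (_, i) => `pkg-${i}`);

checkPackagesUncached(packages, countingGetReport);
const uncachedCalls = reportCalls;
reportCalls = 0;
checkPackagesCached(packages, countingGetReport);
const cachedCalls = reportCalls;
console.log({ uncachedCalls, cachedCalls }); // → { uncachedCalls: 500, cachedCalls: 1 }
```

If every report triggers slow reverse lookups (the timings quoted in this thread are 2 to 5 seconds per lookup), 500 reports instead of one is the difference between a normal install and one that looks hung.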

Using the host network bypasses the Docker image's DNS resolution, which is apparently slow. (This is the only thing I haven't figured out: why some environments are super fast and others so slow. For example, Windows takes 5s for a uv_getnameinfo call but WSL only 2s.)

It's possible that the Node version simply changes which handles are kept open after a request, and thus alleviates the issue.
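To check whether slow reverse DNS is what separates a fast environment from a slow one, you can time a single getnameinfo call from inside the build. A minimal sketch, assuming Node.js's `dns.promises.lookupService()`, which resolves an address and port through the operating system's getnameinfo via libuv (the `uv_getnameinfo` path mentioned above). The `timeReverseLookup` name and the address and port in the usage comment are placeholders, and the resolver is injectable so the function can be exercised without a network.

```javascript
const dns = require("node:dns");

// Times one reverse lookup (address + port -> hostname + service).
// dns.promises.lookupService goes through the OS getnameinfo via libuv.
async function timeReverseLookup(address, port, lookup = dns.promises.lookupService) {
  const start = process.hrtime.bigint();
  const elapsedMs = () => Number(process.hrtime.bigint() - start) / 1e6;
  try {
    const { hostname, service } = await lookup(address, port);
    return { ok: true, hostname, service, ms: elapsedMs() };
  } catch (err) {
    // A failed lookup can be just as slow, so report its duration too.
    return { ok: false, code: err.code, ms: elapsedMs() };
  }
}

// Hypothetical usage from inside a build step; compare the timings with and
// without --network=host (address and port are placeholders):
// timeReverseLookup("8.8.8.8", 53).then((r) => console.log(r));
```

If the same lookup takes seconds in the default build network but milliseconds with `--network=host`, that points at the container's DNS setup rather than at npm itself.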

@diego-coba

I found that this error doesn't occur with Node.js 22.7.

I hit the issue using node:22-alpine as the base image for my Next.js app. Because the node:22-alpine tag is overwritten to point at the most recent minor version of Node, node:22-alpine currently resolves to Node.js 22.11, which does have the bug.

I had to build the image from their Dockerfile for Alpine 3.20 and Node.js 22.7 (5313e6a).

Then I used that locally built image as the base for my build process and it worked like a charm. Build time went down from 28 min to 1 min 50 s.

@Tofandel

Tofandel commented Nov 4, 2024

Downgrading Node is a bad idea (especially to 22.7, which is known to have a big buffer bug). It only works because doing so also changes the npm version. Simply use the latest Node but downgrade npm with npm i -g npm@10.3.0, which is the latest npm version that doesn't use getReport. For people for whom 10.8.2 works, add "libc": "glibc" to your package.json, which bypasses the call to getReport in reifyNode for optional deps (but not in arborist).

Basically, what I found is that the bug shows up starting with either 10.4.0 or 10.8.3, depending on your actual dependencies, because checkPlatform is called in two different places (but it is the same bug).

@diego-coba

diego-coba commented Nov 4, 2024

@Tofandel Good call. I'll give it a try. Thanks

Edit: It worked. Now I'm using Node 22.11 with NPM 10.3.0
Still getting 1 min 50 secs build time

BTW, I'll keep my custom Alpine build to stay on 22.11 and avoid unintended upgrades to potentially unstable minor versions.

Thanks again!

@mve

mve commented Nov 6, 2024

If at all possible, I recommend switching to bun for your install step. Even if you need Node to run your app, using bun for the install step can save a lot of time. I had the same issue with installs taking forever.

Switching to bun made the install time go from 57 seconds to 2 seconds.
https://bun.sh/

@kdg-cwood

> Using a Dockerfile to specify the image seems to fix it.

@xeon826 Can you elaborate?

@diego-coba

diego-coba commented Nov 6, 2024

> Using a Dockerfile to specify the image seems to fix it.
>
> @xeon826 Can you elaborate?

The issue comes from npm >= 10.4.0.
You can just install npm 10.3.0 on whatever Node.js version you're using.
Alternatively, you can specify a Node image built for a minor version where the issue had not been introduced yet. (I know it doesn't happen with 22.7, but 22.7 has a major bug with streams not handling encoding correctly, so I'd keep 22.11, as it's the LTS minor version.)

FROM node:22.11.0-alpine3.20 AS base

# Install a chosen NPM version
ENV NPM_VERSION=10.3.0
RUN npm install -g npm@"${NPM_VERSION}"

# Install dependencies only when needed
FROM base AS deps

# The rest of your Dockerfile

@kdg-cwood

Thanks @diego-coba ! I will give it a try.

I wanted to add that I only experience this issue when npm install is run as a step in the script that my Dockerfile has configured as the CMD. If I leave it out, and then manually run docker exec ... npm install on the running container, it works fine.

@romoguill

Downgrading npm to 10.3.0 as suggested worked for me. Thanks

@jonseymour

jonseymour commented Nov 11, 2024

I can confirm that downgrading my node image from node:lts-slim to node:22.9-slim fixed the issue for me. At the time of writing node:lts-slim was resolving to node:22.11-slim. I did not try node:22.10-slim.

The issue did not occur for image builds on macOS (OrbStack, arm64), but it did happen for linux/amd64 builds running on a linux/amd64 system.

@kdg-cwood

kdg-cwood commented Nov 11, 2024

I also downgraded npm to 10.3.0 and that seems to have fixed the problem.

@bram2202

I had the same issue building for ARMv7 and ARM64.
Every build hung on one specific package being installed.
I got it to work by adding the --force flag to the npm install command.

jens-maus added a commit to jens-maus/RaspberryMatic that referenced this issue Nov 15, 2024
@ghnp5

ghnp5 commented Nov 16, 2024

Same here! Downgrading to npm v10.3.0 does fix the issue.

@wraithgar

This was fixed in npm 10.9.1 (via npm/npm-install-checks#120)
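Going by the versions reported in this thread (the getReport-based check starting in npm 10.4.0, the fix landing in 10.9.1), a quick way to tell whether a given npm falls in the potentially affected range is a plain version comparison. This is a minimal sketch that avoids the `semver` package; the function names are hypothetical, prerelease tags are ignored, and being in the range does not guarantee a hang (some projects only hit it from 10.8.3), nor does being outside it rule out other causes of a stuck install.

```javascript
// Parses "10.8.2" or "v10.8.2" into [10, 8, 2]; prerelease/build suffixes are dropped.
function parseVersion(version) {
  const core = String(version).trim().replace(/^v/, "").split(/[-+]/)[0];
  const parts = core.split(".");
  if (parts.length !== 3 || !parts.every((p) => /^\d+$/.test(p))) {
    throw new Error(`Unrecognized version: ${version}`);
  }
  return parts.map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const x = parseVersion(a);
  const y = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// Range reported in this thread: introduced in 10.4.0, fixed in 10.9.1.
function isPotentiallyAffectedNpm(version) {
  return compareVersions(version, "10.4.0") >= 0 && compareVersions(version, "10.9.1") < 0;
}
```

In a build script you might feed it the output of `npm --version` (for example via `require("node:child_process").execSync("npm --version").toString()`) and, when it returns true, either pin npm 10.3.0 or upgrade to 10.9.1 or later.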

@fuhlich

fuhlich commented Nov 25, 2024

I just had to cancel a cross-platform Docker build for armv7 after 4 hours. I used alpine:edge with npm 10.9.1 as the builder, and npm install got stuck. So maybe there is still a QEMU problem?

@Tofandel

You would need to debug this further by adding timings and debug logging to your task; there are still a few things that can make npm freeze.

One is a kernel issue amazonlinux/amazon-linux-2023#840 (comment) (you can try the workaround and see if that helps)

Another possibility is that one of your dependencies is building something, and that is what's getting stuck.

You will need to find the cause or do some debugging before we can be of further help
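As a starting point for adding timings, here is a minimal Node.js sketch (the `runTimed` name is hypothetical) that runs a command with a hard timeout and prints how long it took, so a hang shows up as a failed step with a duration instead of a build that spins for hours. It relies on `child_process.spawnSync` and its `timeout` option; with `stdio: "inherit"`, npm's own output (for example with `--loglevel verbose`) goes straight to the build log.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: runs `command args...`, kills it after timeoutMs,
// and logs how long it ran to stderr.
function runTimed(command, args, timeoutMs, stdio = "inherit") {
  const start = Date.now();
  const result = spawnSync(command, args, { stdio, timeout: timeoutMs });
  const elapsedMs = Date.now() - start;
  // On timeout, spawnSync kills the child (SIGTERM by default) and sets
  // result.error with code "ETIMEDOUT"; result.status is then null.
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  const outcome = timedOut ? "timed out" : `exit status ${result.status}`;
  console.error(`[runTimed] ${command} ${args.join(" ")}: ${elapsedMs} ms, ${outcome}`);
  return { status: result.status, signal: result.signal, timedOut, elapsedMs };
}

// Example build step (RUN node run-timed.js); exit code 124 is borrowed
// from the coreutils `timeout` convention.
// const r = runTimed("npm", ["install", "--loglevel", "verbose"], 30 * 60 * 1000);
// process.exit(r.timedOut ? 124 : (r.status ?? 1));
```

Running the same step with and without emulation (or with and without `--network=host`) and comparing the logged durations narrows down which of the causes above applies.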

@fuhlich

fuhlich commented Nov 27, 2024

I have set up a package.json with no packages. Using alpine:edge with npm 10.9.1 with target x64, the docker build finishes within seconds. With target armv7, the npm install makes no progress even after 1 hour.

With alpine:3.18.9, the npm install only takes a few seconds to complete with armv7 target. The first failing version is alpine:19.1.0 which uses npm 10.

The mentioned workaround has no effect.
Unfortunately, I have no idea how to debug npm inside an Alpine container during a cross-compilation build with Docker.

@rendmath

rendmath commented Nov 28, 2024

Exact same symptoms here, with node:22-slim hanging on npm ci during a build on Windows 11.

Each of these three independent workarounds seems to work for me (tested multiple times):

  • Downgrading to node:22.9-slim or earlier.
  • Running npm cache clean --force before the npm ci
  • Running npm install -g npm@10 before the npm ci

@OxCom

OxCom commented Dec 2, 2024

For me the fix was:

RUN npm --maxsockets=1 i

but that is a temporary solution.
