fix(api-client): default consortium provider timeout test flake #656

Closed
petermetz opened this issue Mar 10, 2021 · 1 comment · Fixed by #713
Labels
bug Something isn't working

Comments

@petermetz
Contributor

Describe the bug

The mentioned test case is flaky because we depend on an external service to simulate certain HTTP status codes for different failure modes.

We need to migrate to a locally run mechanism for simulating HTTP failure scenarios in-house, without going beyond the local network interface, so that the CI does not depend on external services that can experience downtime (well, we still get this with DockerHub sometimes, but it is what it is...)
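
As a rough illustration of the in-house approach, a stub HTTP server bound to the loopback interface can answer with whatever failure status a test asks for. This is a minimal sketch, not the project's actual test tooling: `statusFromPath`, `startStatusStub`, and the `/status/<code>` path convention are hypothetical names chosen for the example.

```typescript
import * as http from "http";
import { AddressInfo } from "net";

// Hypothetical helper: maps a request path such as "/status/503" to the
// HTTP status code the stub should answer with. Only failure codes
// (400-599) are honored; anything else falls back to 404.
function statusFromPath(path: string): number {
  const match = /^\/status\/(\d{3})$/.exec(path);
  if (!match) {
    return 404;
  }
  const code = Number(match[1]);
  return code >= 400 && code <= 599 ? code : 404;
}

interface StatusStub {
  url: string;
  close: () => Promise<void>;
}

// Starts the stub on the loopback interface only, on an ephemeral port
// picked by the OS, so requests made by the tests never leave the machine.
function startStatusStub(): Promise<StatusStub> {
  const server = http.createServer((req, res) => {
    const code = statusFromPath(req.url ?? "/");
    res.writeHead(code, { "Content-Type": "text/plain" });
    res.end(`simulated ${code}`);
  });
  return new Promise((resolve, reject) => {
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      resolve({
        url: `http://127.0.0.1:${port}`,
        close: () => new Promise<void>((done) => server.close(() => done())),
      });
    });
  });
}
```

A test would call `startStatusStub()`, point the API client at `${url}/status/503` (or any other failure code), and call `close()` in its teardown.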

To Reproduce

It's a flaky test so it will happen randomly if you try hard enough and for long enough.

Expected behavior

No flaky tests, ever.

Logs/Stack traces

N/A

Screenshots

N/A

Cloud provider or hardware configuration:

Always happens on the GitHub Actions CI runner.

Operating system name, version, build:

Ubuntu LTS

Hyperledger Cactus release version or commit (git rev-parse --short HEAD):

main

Hyperledger Cactus Plugins/Connectors Used

N/A

Additional context

This has been worked on before, but it seems the earlier fix has failed to deliver on its promises, so we have to get back to it again.

cc: @takeutak @sfuji822 @hartm @jonathan-m-hamilton @AzaharaC @jordigiam @kikoncuo

@petermetz petermetz added the bug Something isn't working label Mar 10, 2021
@petermetz petermetz self-assigned this Mar 10, 2021
@petermetz
Contributor Author

FYI @kikoncuo @jagpreetsinghsasan @AzaharaC @jordigiam I believe this will be fixed once the other PRs that resolve the "disk full" issue get merged. For now my working theory is that this flake was also caused by that issue, so hopefully all the failing CI checks will turn green soon.

petermetz added a commit to petermetz/cacti that referenced this issue Mar 23, 2021
petermetz added a commit to petermetz/cacti that referenced this issue Mar 23, 2021
petermetz added a commit to petermetz/cacti that referenced this issue Mar 24, 2021
…ledger-cacti#656

Potentially fixing hyperledger-cacti#656. Definitely improves the situation, but it
is impossible to tell in advance whether this will make all the otherwise
non-reproducible issues go away. Fingers crossed.

This change makes the pullImage(...) method of the Containers
utility class retry 6 times by default if pulling the docker
image fails. The interval between retries increases
exponentially (powers of two), starting from a one-second
delay and growing to 2^6 seconds for the final retry (if that
also fails, an AbortError is thrown by the underlying pRetry
library that powers the retry mechanism).
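
The retry-with-exponential-backoff idea described above can be sketched without any library. This is an illustration only, not the actual pullImage(...) code (which, per the commit message, relies on pRetry): `backoffDelaysMs` and `retryWithBackoff` are hypothetical names, and the schedule assumes a one-second base doubled on each retry, so the exact delays may differ from what the real library produces.

```typescript
// Delay (in ms) to wait before each retry: base, base*factor, base*factor^2, ...
function backoffDelaysMs(retries: number, baseMs = 1000, factor = 2): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * Math.pow(factor, i));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` once, then retries it after each delay in the schedule.
// Rethrows the last error if every attempt fails.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  delaysMs: number[] = backoffDelaysMs(6),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await task();
    } catch (ex) {
      lastError = ex;
      if (attempt < delaysMs.length) {
        // Wait longer after each failure before trying again.
        await sleep(delaysMs[attempt]);
      }
    }
  }
  throw lastError;
}
```

Under these assumptions, six retries wait 1, 2, 4, 8, 16 and 32 seconds respectively, so a transient registry outage of about a minute would be absorbed instead of failing the test run.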

For reference, here is a randomly failed CI test execution
where the logs show that DockerHub is intermittently
inaccessible over the network. That is another thing that
makes our tests flaky, hence this commit.

https://github.com/hyperledger/cactus/runs/2178802580?check_suite_focus=true#step:8:2448

In case that link goes dead in the future, here are the actual logs:

not ok 60 - packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts # time=25389.665ms
  ---
  env:
    TS_NODE_COMPILER_OPTIONS: '{"jsx":"react"}'
  file: packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts
  timeout: 1800000
  command: /opt/hostedtoolcache/node/12.13.0/x64/bin/node
  args:
    - -r
    - /home/runner/work/cactus/cactus/node_modules/ts-node/register/index.js
    - --max-old-space-size=4096
    - packages/cactus-test-cmd-api-server/src/test/typescript/integration/remote-plugin-imports.test.ts
  stdio:
    - 0
    - pipe
    - 2
  cwd: /home/runner/work/cactus/cactus
  exitCode: 1
  ...
{
    # NodeJS API server + Rust plugin work together
    [2021-03-23T20:45:51.458Z] INFO (VaultTestServer): Created VaultTestServer OK. Image FQN: vault:1.6.1
    not ok 1 Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
      ---
        operator: error
        at: bound (/home/runner/work/cactus/cactus/node_modules/onetime/index.js:30:12)
        stack: |-
          Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
              at /home/runner/work/cactus/cactus/packages/cactus-test-tooling/node_modules/docker-modem/lib/modem.js:301:17
              at IncomingMessage.<anonymous> (/home/runner/work/cactus/cactus/packages/cactus-test-tooling/node_modules/docker-modem/lib/modem.js:328:9)
              at IncomingMessage.emit (events.js:215:7)
              at endReadableNT (_stream_readable.js:1183:12)
              at processTicksAndRejections (internal/process/task_queues.js:80:21)
      ...

    Bail out! Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
}
Bail out! Error: (HTTP code 500) server error - Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz added a commit to petermetz/cacti that referenced this issue Mar 24, 2021
Potentially fixing hyperledger-cacti#656. Definitely improves the situation, but it
is impossible to tell in advance whether this will make all the otherwise
non-reproducible issues go away. Fingers crossed.

An attempt to fix the mysterious error in the CI
that can be seen at the bottom.

Based on the advice of a fellow internet user as seen here:
https://stackoverflow.com/a/61789467

No idea whether this will fix the particular error that we
are trying to fix, but we have to try. The
underlying issue seems to be a bug in npm itself,
but knowing that doesn't remove the need to
find a workaround, so here we go...

Error logs and link:
----------------------------

Link: https://github.com/hyperledger/cactus/runs/2179881505?check_suite_focus=true#step:5:8

Logs:

Run npm ci
  npm ci
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME_8.0.275_x64: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME_8_0_275_X64: /opt/hostedtoolcache/jdk/8.0.275/x64
npm ERR! cb() never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://npm.community>

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz added a commit to petermetz/cacti that referenced this issue Mar 24, 2021
Potentially fixing hyperledger-cacti#656. Definitely improves the situation, but it
is impossible to tell in advance whether this will make all the otherwise
non-reproducible issues go away. Fingers crossed.

An attempt to fix the mysterious issue with npm ci

Based on a true story:
https://stackoverflow.com/a/15483897

CI failure logs: https://github.com/hyperledger/cactus/runs/2179881505?check_suite_focus=true#step:5:8

Logs
------

 npm ci
  shell: /usr/bin/bash -e {0}
  env:
    JAVA_HOME_8.0.275_x64: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME: /opt/hostedtoolcache/jdk/8.0.275/x64
    JAVA_HOME_8_0_275_X64: /opt/hostedtoolcache/jdk/8.0.275/x64
npm ERR! cb() never called!

npm ERR! This is an error with npm itself. Please report this error at:
npm ERR!     <https://npm.community>

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz added a commit to petermetz/cacti that referenced this issue Mar 24, 2021
…ger-cacti#656

This is yet another attempt at potentially fixing all the remaining CI
flakes that only happen on the GitHub Actions runners but never on
developer machines.

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz added a commit to petermetz/cacti that referenced this issue Mar 24, 2021
Bumping up the test timeouts to a full hour
because under heavy load the GHA runner seems
to be extremely slow, meaning that the fabric
tests can take longer than half an hour each,
despite the fact that these usually take about
5 minutes or less even on the slow GHA runners.

Fixes hyperledger-cacti#656

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz added a commit that referenced this issue Mar 25, 2021
petermetz added a commit that referenced this issue Mar 25, 2021
petermetz added a commit that referenced this issue Mar 25, 2021
petermetz added a commit that referenced this issue Mar 25, 2021
petermetz added a commit that referenced this issue Mar 25, 2021
petermetz added a commit to petermetz/cacti that referenced this issue Mar 25, 2021
petermetz added a commit to petermetz/cacti that referenced this issue Mar 26, 2021
jordigiam pushed a commit to kikoncuo/cactus that referenced this issue Apr 8, 2021
jordigiam pushed a commit to kikoncuo/cactus that referenced this issue Apr 8, 2021
jordigiam pushed a commit to kikoncuo/cactus that referenced this issue Apr 8, 2021
jordigiam pushed a commit to kikoncuo/cactus that referenced this issue Apr 8, 2021
jordigiam pushed a commit to kikoncuo/cactus that referenced this issue Apr 8, 2021