Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-http2-large-file #22327

Closed
jasnell opened this issue Aug 14, 2018 · 37 comments · Fixed by #37156
Closed

Investigate flaky test-http2-large-file #22327

jasnell opened this issue Aug 14, 2018 · 37 comments · Fixed by #37156
Assignees
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. http2 Issues or PRs related to the http2 subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.

Comments

@jasnell
Copy link
Member

jasnell commented Aug 14, 2018

The brand new just added test-http2-large-file appears to be flaky on Windows.

16:52:01 not ok 574 sequential/test-http2-large-file
16:52:01   ---
16:52:01   duration_ms: 2.908
16:52:01   severity: fail
16:52:01   exitcode: 1
16:52:01   stack: |-
16:52:01     (node:6524) ExperimentalWarning: The http2 module is an experimental API.
16:52:01     events.js:167
16:52:01           throw er; // Unhandled 'error' event
16:52:01           ^
16:52:01     
16:52:01     Error: read ECONNRESET
16:52:01         at TCP.onread (net.js:661:25)
16:52:01     Emitted 'error' event at:
16:52:01         at emitErrorNT (internal/streams/destroy.js:82:8)
16:52:01         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
16:52:01         at process._tickCallback (internal/process/next_tick.js:63:19)
16:52:01   ...
@jasnell jasnell added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Aug 14, 2018
@maclover7 maclover7 added windows Issues and PRs related to the Windows platform. test Issues and PRs related to the tests. http2 Issues or PRs related to the http2 subsystem. labels Aug 14, 2018
@joyeecheung
Copy link
Member

I can only find one instance of this in the last 100 CI runs

Reason     sequential/test-http2-large-file
Type       JS_TEST_FAILURE
Failed PR  1 (#22269)
Appeared   test-rackspace-win2008r2-x64-3
Last CI    https://ci.nodejs.org/job/node-test-pull-request/16450/

@joyeecheung
Copy link
Member

By the way I think I've seen this on my local Windows machine before..

@jasnell
Copy link
Member Author

jasnell commented Aug 14, 2018

Yep, it's flaky but only extremely rarely.

@refack
Copy link
Contributor

refack commented Aug 17, 2018

Flaky:

1000   OK: 947   NOT OK: 53   TOTAL: 1000

BridgeAR added a commit to BridgeAR/node that referenced this issue Aug 26, 2018
This makes sure the first request is not sent before the client is
ready to respond. Therefore the ECONNRESET errors should be resolved.

Fixes: nodejs#22327
@Trott
Copy link
Member

Trott commented Aug 29, 2018

https://ci.nodejs.org/job/node-test-binary-windows/19625/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=2/console

03:45:21 not ok 578 sequential/test-http2-large-file
03:45:21   ---
03:45:21   duration_ms: 2.109
03:45:21   severity: fail
03:45:21   exitcode: 1
03:45:21   stack: |-
03:45:21     events.js:167
03:45:21           throw er; // Unhandled 'error' event
03:45:21           ^
03:45:21     
03:45:21     Error: read ECONNRESET
03:45:21         at TCP.onread (net.js:661:25)
03:45:21     Emitted 'error' event at:
03:45:21         at emitErrorNT (internal/streams/destroy.js:82:8)
03:45:21         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
03:45:21         at process._tickCallback (internal/process/next_tick.js:63:19)
03:45:21   ...

@Trott
Copy link
Member

Trott commented Aug 29, 2018

Is it only happening on win2008r2? Because provider can make a difference, the one above is rackspace.

@refack
Copy link
Contributor

refack commented Aug 29, 2018

Is it only happening on win2008r2? Because provider can make a difference, the one above is rackspace.

I want to say that I run the stress on an Azure machine, but I'm not sure.

@Trott
Copy link
Member

Trott commented Sep 10, 2018

Again on win2008r2 and a rackspace host (test-rackspace-win2008r2-x64-5).

https://ci.nodejs.org/job/node-test-binary-windows/19897/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=0/console

01:33:41 not ok 580 sequential/test-http2-large-file
01:33:41   ---
01:33:41   duration_ms: 2.73
01:33:41   severity: fail
01:33:41   exitcode: 1
01:33:41   stack: |-
01:33:41     events.js:167
01:33:41           throw er; // Unhandled 'error' event
01:33:41           ^
01:33:41     
01:33:41     Error: read ECONNRESET
01:33:41         at TCP.onStreamRead (internal/stream_base_commons.js:112:27)
01:33:41     Emitted 'error' event at:
01:33:41         at emitErrorNT (internal/streams/destroy.js:82:8)
01:33:41         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
01:33:41         at process._tickCallback (internal/process/next_tick.js:63:19)
01:33:41   ...

@Trott
Copy link
Member

Trott commented Sep 15, 2018

Once again on win2008r2, and again it was rackspace (test-rackspace-win2008r2-x64-6).

https://ci.nodejs.org/job/node-test-binary-windows/19974/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=2/console

21:20:53 not ok 581 sequential/test-http2-large-file
21:20:53   ---
21:20:53   duration_ms: 1.796
21:20:53   severity: fail
21:20:53   exitcode: 1
21:20:53   stack: |-
21:20:53     events.js:167
21:20:53           throw er; // Unhandled 'error' event
21:20:53           ^
21:20:53     
21:20:53     Error: read ECONNRESET
21:20:53         at TCP.onStreamRead (internal/stream_base_commons.js:112:27)
21:20:53     Emitted 'error' event at:
21:20:53         at emitErrorNT (internal/streams/destroy.js:82:8)
21:20:53         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
21:20:53         at process._tickCallback (internal/process/next_tick.js:63:19)
21:20:53   ...

@Trott
Copy link
Member

Trott commented Sep 15, 2018

@nodejs/platform-windows

@Trott
Copy link
Member

Trott commented Sep 15, 2018

I could be wrong, but the frequency with which this one occurs seems to be increasing...

Once again, it's on win2008r2--seems to be the only variant affected, at least on CI--and once again it's on one of the Rackspace hosts (test-rackspace-win2008r2-x64-4). I've never seen this on one of the Azure hosts. (I know there are differences in terms of memory and/or CPU, and probably other things. /ping @nodejs/build )

https://ci.nodejs.org/job/node-test-binary-windows/19980/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=2/console

00:00:12 not ok 581 sequential/test-http2-large-file
00:00:12   ---
00:00:12   duration_ms: 1.698
00:00:12   severity: fail
00:00:12   exitcode: 1
00:00:12   stack: |-
00:00:12     events.js:167
00:00:12           throw er; // Unhandled 'error' event
00:00:12           ^
00:00:12     
00:00:12     Error: read ECONNRESET
00:00:12         at TCP.onStreamRead (internal/stream_base_commons.js:112:27)
00:00:12     Emitted 'error' event at:
00:00:12         at emitErrorNT (internal/streams/destroy.js:82:8)
00:00:12         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
00:00:12         at process._tickCallback (internal/process/next_tick.js:63:19)
00:00:12   ...

@Trott
Copy link
Member

Trott commented Sep 15, 2018

Bug in http2? @nodejs/http2

@Trott
Copy link
Member

Trott commented Sep 17, 2018

Continues to seem to only manifest on win2008r2 Rackspace hosts. This one is test-rackspace-win2008r2-x64-6.

https://ci.nodejs.org/job/node-test-binary-windows/20005/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=3/console

16:05:09 not ok 581 sequential/test-http2-large-file
16:05:09   ---
16:05:09   duration_ms: 2.96
16:05:09   severity: fail
16:05:09   exitcode: 1
16:05:09   stack: |-
16:05:09     events.js:167
16:05:09           throw er; // Unhandled 'error' event
16:05:09           ^
16:05:09     
16:05:09     Error: read ECONNRESET
16:05:09         at TCP.onStreamRead (internal/stream_base_commons.js:112:27)
16:05:09     Emitted 'error' event at:
16:05:09         at emitErrorNT (internal/streams/destroy.js:82:8)
16:05:09         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
16:05:09         at process._tickCallback (internal/process/next_tick.js:63:19)
16:05:09   ...

refack pushed a commit to JaneaSystems/node that referenced this issue Sep 19, 2018
Refs: nodejs#20750
Refs: nodejs#22327
Refs: nodejs#22762
Refs: nodejs/reliability#16

PR-URL: nodejs#22941
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
targos pushed a commit that referenced this issue Sep 20, 2018
Refs: #20750
Refs: #22327
Refs: #22762
Refs: nodejs/reliability#16

PR-URL: #22941
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
@Trott Trott pinned this issue Feb 14, 2019
@Trott
Copy link
Member

Trott commented Feb 15, 2019

Again today. This seems to be increasing in frequency, but also still only occurring on Rackspace hosts running Win2008r2 with VS 2017.

https://ci.nodejs.org/job/node-test-binary-windows/23765/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=0/console

test-rackspace-win2008r2-x64-5

00:22:37 not ok 612 sequential/test-http2-large-file # TODO : Fix flaky test
00:22:37   ---
00:22:37   duration_ms: 3.107
00:22:37   severity: flaky
00:22:37   exitcode: 1
00:22:37   stack: |-
00:22:37     events.js:173
00:22:37           throw er; // Unhandled 'error' event
00:22:37           ^
00:22:37     
00:22:37     Error: read ECONNRESET
00:22:37         at TCP.onStreamRead (internal/stream_base_commons.js:173:27)
00:22:37     Emitted 'error' event at:
00:22:37         at emitErrorNT (internal/streams/destroy.js:82:8)
00:22:37         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
00:22:37         at processTicksAndRejections (internal/process/next_tick.js:76:17)
00:22:37   ...

@Trott
Copy link
Member

Trott commented Feb 27, 2019

https://ci.nodejs.org/job/node-test-binary-windows/24008/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=1/console

test-rackspace-win2008r2-x64-3

00:34:30 not ok 615 sequential/test-http2-large-file # TODO : Fix flaky test
00:34:30   ---
00:34:30   duration_ms: 2.606
00:34:30   severity: flaky
00:34:30   exitcode: 1
00:34:30   stack: |-
00:34:30     events.js:173
00:34:30           throw er; // Unhandled 'error' event
00:34:30           ^
00:34:30     
00:34:30     Error: read ECONNRESET
00:34:30         at TCP.onStreamRead (internal/stream_base_commons.js:173:27)
00:34:30     Emitted 'error' event at:
00:34:30         at emitErrorNT (internal/streams/destroy.js:82:8)
00:34:30         at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
00:34:30         at processTicksAndRejections (internal/process/next_tick.js:76:17)
00:34:30   ...

@fhinkel fhinkel unpinned this issue Mar 18, 2019
@Trott
Copy link
Member

Trott commented Mar 31, 2019

https://ci.nodejs.org/job/node-test-binary-windows/24984/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=1/console

test-rackspace-win2008r2-x64-3

00:25:39 not ok 626 sequential/test-http2-large-file # TODO : Fix flaky test
00:25:39   ---
00:25:39   duration_ms: 1.794
00:25:39   severity: flaky
00:25:39   exitcode: 1
00:25:39   stack: |-
00:25:39     events.js:171
00:25:39           throw er; // Unhandled 'error' event
00:25:39           ^
00:25:39     
00:25:39     Error: read ECONNRESET
00:25:39         at TCP.onStreamRead (internal/stream_base_commons.js:182:27)
00:25:39     Emitted 'error' event at:
00:25:39         at emitErrorNT (internal/streams/destroy.js:91:8)
00:25:39         at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
00:25:39         at processTicksAndRejections (internal/process/task_queues.js:81:17)
00:25:39   ...

@mmarchini
Copy link
Contributor

https://ci.nodejs.org/job/node-test-binary-windows-2/494/COMPILED_BY=vs2017,RUNNER=win2008r2-vs2017,RUN_SUBSET=0/console

test-rackspace-win2008r2-x64-3

11:54:36 not ok 1209 sequential/test-http2-large-file # TODO : Fix flaky test
11:54:36   ---
11:54:36   duration_ms: 2.193
11:54:36   severity: flaky
11:54:36   exitcode: 1
11:54:36   stack: |-
11:54:36     events.js:173
11:54:36           throw er; // Unhandled 'error' event
11:54:36           ^
11:54:36     
11:54:36     Error: read ECONNRESET
11:54:36         at TCP.onStreamRead (internal/stream_base_commons.js:183:27)
11:54:36     Emitted 'error' event at:
11:54:36         at emitErrorNT (internal/streams/destroy.js:91:8)
11:54:36         at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
11:54:36         at processTicksAndRejections (internal/process/task_queues.js:84:17)
11:54:36   ...

@Trott
Copy link
Member

Trott commented Jul 25, 2019

test-http2-large-file fails on Windows almost every CI run. I don't suppose there's someone with a bit of time to dig in? #22327 (comment) is the most thorough troubleshooting that's happened so far, as far as I know.

@nodejs/platform-windows @nodejs/http2 @nodejs/testing

@Trott
Copy link
Member

Trott commented Sep 10, 2019

@nodejs/collaborators Is anyone digging into this or interested in digging into it? It seems not-impossible that whatever is causing this may be causing other test unreliability on Windows. Best detailed description seems to be #22327 (comment).

@Trott
Copy link
Member

Trott commented Dec 10, 2019

It's been a few months, so I'm popping in to say that this one still fails all the time.

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/277/RUN_SUBSET=3,nodes=win2008r2-COMPILED_BY-vs2017/testReport/junit/(root)/test/sequential_test_http2_large_file_/

events.js:280
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
Emitted 'error' event on ServerHttp2Stream instance at:
    at emitErrorCloseNT (internal/streams/destroy.js:75:8)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  errno: -4077,
  code: 'ECONNRESET',
  syscall: 'read'
}

Trott added a commit to Trott/io.js that referenced this issue Dec 11, 2019
Wait for drain event when necessary.

Fixes: nodejs#22327
@Trott
Copy link
Member

Trott commented Dec 12, 2019

I required help from both @joyeecheung and @jasnell to get it right, but here's the output of a failing instance of this test with NODE_DEBUG set to http2.

19:33:52 not ok 1 sequential/test-http2-large-file
19:33:52   ---
19:33:52   duration_ms: 2.153
19:33:52   severity: fail
19:33:52   exitcode: 1
19:33:52   stack: |-
19:33:52     HTTP2 4420: Http2Session client: created
19:33:52     HTTP2 4420: Http2Session client: initiating request
19:33:52     (node:4420) Warning: Setting the NODE_DEBUG environment variable to 'http2' can expose sensitive data (such as passwords, tokens and authentication headers) in the resulting log.
19:33:52     HTTP2 4420: Http2Session client: setting up session handle
19:33:52     HTTP2 4420: Http2Session client: sending settings
19:33:52     HTTP2 4420: Http2Session client: submitting settings
19:33:52     HTTP2 4420: Http2Session client: connected, initializing request
19:33:52     HTTP2 4420: Http2Session server: received a connection
19:33:52     HTTP2 4420: Http2Session server: setting up session handle
19:33:52     HTTP2 4420: Http2Session server: sending settings
19:33:52     HTTP2 4420: Http2Session server: submitting settings
19:33:52     HTTP2 4420: Http2Session server: created
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session server]: headers received
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session server]: undefined
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session server]: undefined
19:33:52     HTTP2 4420: Http2Session client: settings received
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session client]: headers received
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session client]: emitting stream 'response' event
19:33:52     HTTP2 4420: Http2Session server: settings received
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session client]: undefined
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session client]: 0
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session client]: destroying stream
19:33:52     HTTP2 4420: Http2Session client: marking session closed
19:33:52     HTTP2 4420: Http2Session client: submitting goaway
19:33:52     HTTP2 4420: Http2Session client: destroying
19:33:52     HTTP2 4420: Http2Session <invalid>: socket error [read ECONNRESET]
19:33:52     HTTP2 4420: Http2Session server: destroying
19:33:52     HTTP2 4420: Http2Stream 1 [Http2Session server]: destroying stream
19:33:52     events.js:281
19:33:52           throw er; // Unhandled 'error' event
19:33:52           ^
19:33:52     
19:33:52     Error: read ECONNRESET
19:33:52         at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
19:33:52     Emitted 'error' event on ServerHttp2Stream instance at:
19:33:52         at emitErrorCloseNT (internal/streams/destroy.js:75:8)
19:33:52         at processTicksAndRejections (internal/process/task_queues.js:84:21) {
19:33:52       errno: -4077,
19:33:52       code: 'ECONNRESET',
19:33:52       syscall: 'read'
19:33:52     }
19:33:52   ...

@jasnell jasnell closed this as completed Jun 17, 2020
@Trott
Copy link
Member

Trott commented Jun 19, 2020

This is failing frequently in CI. No one reports it because it's marked as flaky. But it's still a broken test (or indicative of something broken in core). Please exercise caution when closing flaky test issues.

@Trott Trott reopened this Jun 19, 2020
@Trott
Copy link
Member

Trott commented Jun 19, 2020

(Although don't misunderstand me: Thanks for all the triaging today, closing stale issues! More please! And yeah, that means you're going to close one once in a while that needs to be re-opened. And also, I realize now you were closing an issue you yourself opened, so doubly understandable.)

@Trott
Copy link
Member

Trott commented Jan 31, 2021

This seems to have stopped happening, possibly because it only ever happened on win2008 and that went eol last year.

Trott added a commit to Trott/io.js that referenced this issue Jan 31, 2021
This seems to have stopped happening, possibly because it only ever
happened on win2008 and that went eol last year.

Closes: nodejs#22327
@Trott Trott closed this as completed in 4f44cad Feb 3, 2021
danielleadams pushed a commit that referenced this issue Feb 16, 2021
This seems to have stopped happening, possibly because it only ever
happened on win2008 and that went eol last year.

Closes: #22327

PR-URL: #37156
Fixes: #22327
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this issue May 1, 2021
This seems to have stopped happening, possibly because it only ever
happened on win2008 and that went eol last year.

Closes: #22327

PR-URL: #37156
Fixes: #22327
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Stephen Belanger <[email protected]>
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. http2 Issues or PRs related to the http2 subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.
Projects
None yet
7 participants