Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: improve flaky http2-stream-destroy-event-order #31590

Closed

Conversation

lundibundi
Copy link
Member

This doesn't properly fix the issue, it only significantly reduces
(removes?) the flakiness of the test.

The setTimeout must not be needed here. Specifically if the settings
frame on the client is 'delayed' and only comes after we have already
called req.close(2) then the 'close' event will be emitted before the
stream has got a chance to write the rst to the underlying socket (even
though FlushRstStream and SendPendingData have been called) and since we
are destroying a session here it will never be written resulting in a
'clean' stream close on the server side and therefore a timeout because
there will be no error.


Looks like we are not correctly waiting for the rst frame to be sent and I assume the time it takes to read the settings frame delays the write just enough to not get written at all. I will try to investigate this further but at least this should improve our CI so that it won't fail with this test almost every time.
This patch reduces the flakiness from 1 failed in 10 runs to 0/2000 on my windows machine.

The debug native logs can be found:
http2-destroy-event-order-success.txt
http2-destroy-event-order-fail.txt.

/cc @nodejs/testing @jasnell @addaleax

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

This doesn't properly fix the issue, it only significantly reduces
(removes?) the flakiness of the test.

The setTimeout must not be needed here. Specifically if the settings
frame on the client is 'delayed' and only comes after we have already
called `req.close(2)` then the 'close' event will be emitted before the
stream has got a chance to write the rst to the underlying socket (even
though FlushRstStream and SendPendingData have been called) and since we
are destroying a session here it will never be written resulting in a
'clean' stream close on the server side and therefore a timeout because
there will be no error.
@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Jan 31, 2020
@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Jan 31, 2020

@nodejs-github-bot
Copy link
Collaborator

@lundibundi
Copy link
Member Author

lundibundi commented Jan 31, 2020

😆 got hit by 3 other flaky tests parallel/test-stream-pipeline-http2, async-hooks/test-statwatcher, test-inspector-connect-main-thread. Unfortunately, none of them failed for me on Windows at even extreme -j and --repeat numbers =(.
Resuming the CI.

Edit: stress test looks ok but some of the machines failed to build, should I just restart?

@richardlau
Copy link
Member

Edit: stress test looks ok but some of the machines failed to build, should I just restart?

I'm not sure what the expected state of compiling master for Windows on ARM is, but for the other failures I wouldn't expect master to work on VS2015 as VS2017 is the minimum supported version.

@addaleax addaleax added the http2 Issues or PRs related to the http2 subsystem. label Feb 1, 2020
addaleax added a commit to addaleax/node that referenced this pull request Feb 1, 2020
Alternative to nodejs#31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: nodejs#31590
Refs: nodejs#20750
@addaleax
Copy link
Member

addaleax commented Feb 1, 2020

I think #31610 fully fixes this, can you take a look?

If my understanding of the situation is correct, then yes, this PR improves the situation by giving the server one more event loop turn to receive the relevant stream data, but I think explicitly waiting for it to arrive is also fine and more reliable.

@lundibundi lundibundi closed this Feb 2, 2020
Trott pushed a commit to Trott/io.js that referenced this pull request Feb 6, 2020
Alternative to nodejs#31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: nodejs#31590
Refs: nodejs#20750

PR-URL: nodejs#31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this pull request Feb 17, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this pull request Mar 15, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this pull request Mar 17, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
codebytere pushed a commit that referenced this pull request Mar 30, 2020
Alternative to #31590.

It appears that the issue here is that the test falsely assumed that
closing the client (which also currently destroys the socket rather
than gracefully shutting down the connection) would still leave
enough time for the server side to receive the stream error.

Address that by explicitly waiting for the server side to receive the
stream error before closing the client and the connection with it.

Refs: #31590
Refs: #20750

PR-URL: #31610
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Denys Otrishko <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants