-
Notifications
You must be signed in to change notification settings - Fork 7.3k
test: address timing issues in simple http tests #9432
Conversation
I had discussed this with Gireesh ealier. lgtm. |
+1. LGTM |
Thanks for the excellent write up, and for improving the reliability of our tests! Frankly, I'm of the opinion that tests that have their own "exit early" timers should have those timers removed. So from that standpoint I would have you remove that timer. The test runner already has a timer and it should be the authority on nuking a test that's exceeding the timeout value. If you want you can remove this timer now, or someone can come back later and do that, I'm fine either way. Otherwise LGTM. This does raise an important point, and I will file it as a separate issue, but we clearly have a need for (at least the test suite) to verify that our core data structures have indeed transitioned through their state machines. Arguably this we would want this at run time as well. |
@tjfontaine, thanks for the comments and suggestion - I have removed the timer code. |
Ok. Given the approvals above I'll pull this in. |
Gireesh there are failing tests, I can see the current pull request that you missed one reference to timeout. Please update and validate that the modified test passes. |
Gireesh you'll also have to squash into 1 commit using git rebase -i |
I did some mess-up with the rebase, let me correct those. |
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine
@mdawsonibm, squashed the commits into one (7b07e9f), re-ran the tests successfully. Thanks. |
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine Reviewed-By: Michael Dawson <[email protected]> PR-URL: #9432
Landed as 13e1131 |
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine Reviewed-By: Michael Dawson <[email protected]> PR-URL: nodejs/node-v0.x-archive#9432 PORT-FROM: joyent/node @ 13e1131 PR-URL: nodejs#2294 Reviewed-By: Fedor Indutny <[email protected]>
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine Reviewed-By: Michael Dawson <[email protected]> PR-URL: nodejs/node-v0.x-archive#9432 PORT-FROM: joyent/node @ 13e1131 PR-URL: #2294 Reviewed-By: Fedor Indutny <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine Reviewed-By: Michael Dawson <[email protected]> PR-URL: nodejs/node-v0.x-archive#9432 PORT-FROM: joyent/node @ 13e1131 PR-URL: nodejs#2294 Reviewed-By: Fedor Indutny <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
simple tests test-http-request-end.js, test-http-default-encoding.js hangs in AIX. The root cause for both the failures is related to the timing with which packets are sent between the client and server. On the client side, one factor that affects the timing is Nagle's algorithm. With Nagle enabled there may be a delay between two packets as the stack may wait until either: a. An acknowledgement for the first packet is received, or b. 200 ms elapses. before sending the second packet. Similarly at the server side 2 sequential packages can be delivered to the application either together or separatly. On AIX we see that they are delivered separately to the server, while on Linux delivered together. If we change the timing, for example disabling Nagle on AIX we see the 2 packets delivered together and the tests pass. In the test case simple/test-http-request-end.js, the client request handler of the server receives and stores the data in a data callback, closes the server in a request end callback, and writes to the client and ends the response, in-line with the request receipt. An HTTP parser module parses the incoming message, and invokes callback routines which are registered for HTTP events (such as header, body, end etc.) Because the termination sequence arrive in a separate packet, there is a delay in parsing that message and identify that the client request ended (and thereby invoke the request end call backhandler). Due to this delay, the response close happens first, which in-turn destroys the server socket leading to the fd and watcher removal from the uv loop abandoning further events on this connection, and end call back never being called, causing the reported hang. simple/test-http-default-encoding.js suffers from the same problem. Also, remove the timer logic from the test case. Test harness anyways contain a timer which controls the individual tests so remove such controls from the test case, as suggested by @tjfontaine Reviewed-By: Michael Dawson <[email protected]> PR-URL: nodejs/node-v0.x-archive#9432 PORT-FROM: joyent/node @ 13e1131 PR-URL: #2294 Reviewed-By: Fedor Indutny <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]>
simple tests test-http-request-end.js, test-http-default-encoding.js
hangs in AIX. The root cause for both the failures is related to the
timing with which packets are sent between the client and server.
On the client side, one factor that affects the timing is Nagle's
algorithm. With Nagle enabled there may be a delay between two packets
as the stack may wait until either:
a. An acknowledgement for the first packet is received, or
b. 200 ms elapses.
before sending the second packet.
Similarly at the server side 2 sequential packages can be delivered to
the application either together or separately.
On AIX we see that they are delivered separately to the server, while on
Linux delivered together. If we change the timing, for example disabling
Nagle on AIX we see the 2 packets delivered together and the tests pass.
In the test case simple/test-http-request-end.js, the client request
handler of the server receives and stores the data in a data callback,
closes the server in a request end callback, and writes to the client
and ends the response, in-line with the request receipt. An HTTP parser
module parses the incoming message, and invokes callback routines which
are registered for HTTP events (such as header, body, end etc.)
Because the termination sequence arrive in a separate packet, there is a
delay in parsing that message and identify that the client request ended
(and thereby invoke the request end call back handler). Due to this delay,
the response close happens first, which in-turn destroys the server
socket leading to the fd and watcher removal from the uv loop abandoning
further events on this connection, and end call back never being called,
causing the reported hang. simple/test-http-default-encoding.js suffers
from the same problem.
This pull request is to make the test case behave consistently,
irrespective of the timing window between the execution of asynchronous
call back and the synchronous termination of response, as well as
irrespective of the characteristics of the packet reception - coalesced
or in separate packets.
Adding a constant delay (such as 200 ms) before ending the server
response resolves he problem, but for program correctness and
consistency, the below approach is proposed - the request end callback
is a place where the request end is guaranteed, and is safe to write
back to the client.