-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
investigate flaky test-http-client-race-2 on SmartOS #11026
Comments
I wonder if we should open a single issue for this and all the other first-time SmartOS timeout test failures we've been seeing in the last week or three. #11026 One thing they all seem to have in common: They are all http/tls tests. Totally guessing here, but maybe the test suite is triggering some sort of firewall/intrusion detection on the host causing it to silently drop a connection or something like that? @nodejs/testing @nodejs/platform-smartos @nodejs/http |
@Trott I'll try to take a look at this before the end of the week, if not early next week at the latest. Sorry for the delay :( |
Looking at it now. |
So far, I haven't been able to reproduce the problem described by any of the issues listed above. In order to be able to get more information and investigate future spurious failures, I submitted a PR that sends In the meantime I'll continue trying to reproduce and investigate those issues, I'll keep you posted. |
I haven't noticed any SmartOS failures lately that are similar to these. It's possible these issues have stopped happening on CI. In which case, they may be very difficult to reproduce at this point. |
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. Refs: nodejs#11026
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
#11086 was merged and nodejs/build#613 was created to make tests that time out generate a core file that could be inspected to help root cause these failures. Thus, I'd suggest that we mark this test (and the other flaky ones mentioned in #11026 (comment)) as flaky on SmartOS. Then we should make sure that when a build is marked "unstable" on SmartOS, we grab the core files that are generated and upload them somewhere (we can use Joyent's manta for that, as this is part of the resources donated by Joyent to the project) where they won't be cleaned up for further investigation. How does that sound? |
Thinking about it more, marking these tests as flaky might not be the best way forward. It is possible that these timeouts are due to the environment more than due to the tests themselves or to a bug in node on SmartOS. Making these tests flaky could potentially make actual future failures silent, whereas leaving them non-flaky would make actual future failures trigger a failed build. Since these failures don't seem to be too frequent, it shouldn't have a significant impact on productivity, and we'll be able to use the new Ideally, in the meantime we could also gather more data on the system and the test processes when a test times out by adding more instrumentation to Thus, I think I'm leaning towards closing these issues and using the new instrumentation/adding more instrumentation to help us investigate future failures instead of marking these tests as flaky. |
SGTM. I don't want to mark tests as flaky in general except as an absolutely last resort. In the past, there were so many unreliable tests that we had to mark tests flaky or else we risked people ignoring all CI failures. Things have improved a lot since then. Marking as flaky reduces the visibility of the test failures. I'd rather keep them visible. |
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: nodejs#11086 Ref: nodejs#11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: nodejs#11086 Ref: nodejs#11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: #11086 Ref: #11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: nodejs#11086 Ref: nodejs#11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
Haven't seen this in a long time. Closing. Feel free to re-open or comment if it should be open. |
Currently, when a process times out, it is terminated by sending it the SIGTERM signal. Sending SIGBART instead allows the operating system to generate a core file that can be investigated later using post-mortem debuggers such as llnode or mdb_v8. This can be very useful when investigating flaky tests that time out, since in that case the failure is difficult to reproduce, and being able to look at a core file makes a big difference. With these changes, passing the --abort-on-timeout command line option to tools/test.py now sends SIGABRT to processes timing out on all platforms but Windows. PR-URL: nodejs/node#11086 Ref: nodejs/node#11026 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gibson Fahnestock <[email protected]> Reviewed-By: Sakthipriyan Vairamani <[email protected]> Reviewed-By: Santiago Gimeno <[email protected]> Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Anna Henningsen <[email protected]>
https://ci.nodejs.org/job/node-test-commit-smartos/6613/nodes=smartos16-64/console
The text was updated successfully, but these errors were encountered: