ci: increase e2e startup polling delay #3483
base: master
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3483   +/-   ##
=======================================
  Coverage   82.18%   82.18%
=======================================
  Files         611      611
  Lines       36449    36449
  Branches     6005     6005
=======================================
  Hits        29957    29957
  Misses       5623     5623
  Partials      869      869
☔ View full report in Codecov by Sentry.
It bothers me to see all the error messages. My workstation can start it in 15 s. GHA can start it in 30 s.
808c876 to d0a9b97
@pmachapman reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts
line 7 at r1 (raw file):
const pollUrl = 'http://localhost:5000/projects';
const pollInterval = 17000;
Wouldn't it be better to poll every second, but reduce the error logging to only log those failures that occur after the 30-second mark? This would reduce the error messages without needing a non-standard interval number (i.e. when someone looks at 17000 in future, they won't realize it is based on your PC's and the GHA's performance).
Including an exponential backoff for the interval, i.e. 1000, 2000, 4000, 8000, 16000, is also a usual practice for polling non-responsive services.
Code quote:
const pollInterval = 17000;
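For illustration, the two suggestions in this comment could look roughly like the sketch below. The helper names `backoffDelayMs` and `shouldLogFailure`, the 16000 ms cap, and the default values are assumptions made for the example; none of them appear in await-application-startup.mts.

```typescript
// Hypothetical helpers for illustration only; not part of await-application-startup.mts.

/** Delay before the given retry attempt (0-based): 1000, 2000, 4000, ... capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 16000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

/** Report a failed startup check only once the quiet period has elapsed. */
function shouldLogFailure(elapsedMs: number, quietPeriodMs = 30000): boolean {
  return elapsedMs >= quietPeriodMs;
}

const delays = [0, 1, 2, 3, 4].map(attempt => backoffDelayMs(attempt));
console.log(delays.join(', ')); // 1000, 2000, 4000, 8000, 16000
```

With a fixed one-second interval, only `shouldLogFailure` would be needed; the backoff helper matters only if the interval itself is meant to grow.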
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @marksvc)
-- commits
line 4 at r1:
I don't see what's wrong with the current approach. I can't think of a reason to run this locally, and usually if you're looking at it on GHA, it's because the e2e tests are broken, and we're going to have to re-run them. This will often add an unnecessary delay to running the tests, which we'd ideally like to optimize to run much faster.
0:00 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
0:01 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
0:02 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
0:03 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
0:04 Startup check failed: error sending request for url (http://localhost:5000/projects): client error (Connect): tcp connect error: Connection refused (os error 111)
33 times.
Once when I was still learning and checked the e2e logs early on I incorrectly thought the e2e tests never even successfully started because of all these messages. The little message "Startup check passed. Exiting." at the end is a squeak compared to all the errors :)
The other problem is seeing it on my workstation, which is more often. On my local computer I start stuff all the time: compile job, launch a program, start some tests. It will give an indication of starting up, and eventually it's going. But it doesn't say "Error" a dozen times before successfully starting up :-)
It would be an improvement if every second we printed a message saying "Still waiting" 33 times.
I often run it locally. If the GHA job fails, I might be able to more accurately try what it is doing by running the same script.
I know the e2e.mts script has various configuration etc, but it's been very convenient to just run the bash script that handles it all.
Can you clarify why this might mean the display of 33 error messages at the beginning of the test run isn't undesirable?
Yes, I see what you're saying here.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @pmachapman)
-- commits
line 4 at r1:
It seems like the issue has more to do with how the startup ping failures appear in the log, than their frequency or presence. Perhaps the message could be changed to just say it's not up yet. It's verbose because it's intended to run on the CI.
The canonical way of running the tests, defined in the README, is to run e2e.mts. pre_merge_ci.sh is intended as a wrapper around that for the sake of a CI. I'm not even sure if developers on Windows can run it.
- It does stuff in the GUI and sometimes I am concerned that I might have messed it up with my mouse or keyboard.
This is a good argument for adding a new preset called headless that's identical to the default, except for being headless. I usually run them in headed mode because I want to be able to easily investigate failures, and headless mode makes it nearly impossible.
- I need to be more coordinated with SF running locally in the background, since e2e.mts seems to test an already running SF.
I nearly always have SF running for development purposes, and shutting down the processes to start another is quite slow. Do you not have it running most of the time? I guess I've just assumed it would be difficult to work without SF running.
Can you clarify why this might mean the display of 33 error messages at the beginning of the test run isn't undesirable?
I see them as startup status messages, in a CI script that's intended to be as verbose as possible. Arguably maybe the error messages could be toned down to just say that it can't connect yet instead of the underlying error message.
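As a rough sketch of what "toned down" could mean here, the per-attempt line might keep the elapsed-time prefix seen in the log above but replace the underlying connection error with a plain status. The function names `formatElapsed` and `waitingMessage` and the exact wording are hypothetical, not taken from the script.

```typescript
// Hypothetical helpers for illustration only; the names and wording are assumptions.

/** Format elapsed seconds as m:ss, matching the "0:04" prefix style in the existing log. */
function formatElapsed(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${seconds < 10 ? '0' + seconds : String(seconds)}`;
}

/** Status line that says the application is not up yet, without the raw connection error. */
function waitingMessage(elapsedSeconds: number, url: string): string {
  return `${formatElapsed(elapsedSeconds)} Still waiting for ${url} to respond`;
}

console.log(waitingMessage(4, 'http://localhost:5000/projects'));
// 0:04 Still waiting for http://localhost:5000/projects to respond
```

This keeps one line per attempt, so the CI log stays as verbose as before, but the lines no longer read as errors.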
src/SIL.XForge.Scripture/ClientApp/e2e/await-application-startup.mts
line 7 at r1 (raw file):
I don't really want an exponential backoff, since this isn't about a service failure, but just waiting for a service to come up. And the cost of the requests is very low, since it's all on localhost.
reduce the error logging to only log those failures that occur after the 30 second
Taking more than 30 seconds wouldn't be particularly abnormal. In my opinion, showing network failure messages only after an arbitrary time period would make the problem of status messages looking like critical failures worse: they normally wouldn't show up, so when they did, it would deviate from what's normal.
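A minimal sketch of the fixed-interval approach argued for here: poll at a constant interval, print a neutral status line on every failed attempt, and give up only after an overall timeout. The `probe` callback stands in for the real request to the poll URL, and the names `waitForStartup` and `sleep`, the default interval, and the default timeout are all assumptions for the example.

```typescript
// Hypothetical sketch; `probe` stands in for the real HTTP request to the poll URL.
type Probe = () => Promise<boolean>;

function sleep(ms: number): Promise<void> {
  return new Promise<void>(resolve => setTimeout(resolve, ms));
}

/**
 * Poll at a fixed interval until `probe` reports success.
 * Resolves with the number of attempts made; rejects once `timeoutMs` has elapsed.
 */
async function waitForStartup(
  probe: Probe,
  intervalMs = 1000,
  timeoutMs = 120000,
  log: (message: string) => void = console.log
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  while (true) {
    attempts++;
    if (await probe()) {
      log('Startup check passed.');
      return attempts;
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Application did not start within ${timeoutMs} ms`);
    }
    // Every attempt logs the same neutral line, so nothing new appears "after N seconds".
    log('Application is not up yet; retrying.');
    await sleep(intervalMs);
  }
}
```

Because the probe is injected, the loop can be exercised with a fake that succeeds on the third call, without anything listening on localhost.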