-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout and retry logic for session creation #1232
Conversation
trying to fix the broken tests:
These seem totally unrelated. I'm having a hard time figuring out how this is related to retrying the session. Same for this other failing tests:
|
The only test failing is "test session response with status success and no sessionId". The Mocha test failures are intentional. I would suggest to only add the timeout handler and leave the retry out for now. |
I took another stab at it, and got all of the tests passing. Re: only merging the timeout: Only merging the timeout logic would only solve half of the bug I'm trying to solve. I would write a test, but it only ever happens on our private travis ci. In order to originally investigate it I did something like this: chromedriver.sh
The curl health checks to chromedriver / selenium would always return alive, and the chromedriver would always note the port was busy (e.g. it was alive), and yet nightwatch's request would hang until timeout. This would happen on travis 1 to 3 times per 30 suites every CI run. Thanks for considering! |
I'm also having the same problem. Can we please merge this ASAP? |
@NickStefan , Do you think your PR will fix https://github.com/nightwatchjs/nightwatch/issues/1198 ? |
does this code cover cases where you run selenium manually separate from nightwatch? ie you do not let nightwatch start selenium? I'm seeing the same issue where tests are hanging when running selenium separate from nightwatch |
@naissa12 this pr takes the existing nightwatch request logic and adds timeout + retry. So it should not make a difference whether you or nightwatch spawns selenium. You could try this PR out just to see. @silverrajat88 it may fix that issue. Assuming you npm install nightwatch, you can do a temporary hack to test this PR for your problem by pointing your package.json to this PR's commit:
|
@beatfactor I am struggling with the same problem... may we ask you to merge this asap? |
also related with #1141 |
Yes, this is the same problem as reported in #1141, and I think it should be implemented as discussed there (with a config setting and a default higher than 15s). I am still not in favour of the retry being part of this PR, especially that it doesn't have tests. I think the retry can be dealt with in a separate fix. |
@beatfactor Well, I think its fair to say though that people have confirmed that retying will need to be a part of some solution, right? Simply timing out on timeout, and then failing the tests, isn't a great default. Its forcing consumers of this library to care about low level http issues between test runner and selenium. Those issues really aren't a part of their own tests failing. So I see that the linked related issue #1141 is dated from August, and is for Circle CI (?). Is there a related fix already in development that this PR is stepping on? If this PR can't be merged as is, can you give some guidance on what sort of solution would be acceptable?
|
@beatfactor , @NickStefan I created this PR: #1253 adding the configurable settings for timeout prop. @beatfactor I do believe retries should be part of this solution so that we solve the continues integration issues we are facing. @NickStefan sorry to create another PR feel free to bring anything you find useful from that one to yours and we can continue with this one. Best |
@beatfactor I am still committed to getting some tests for this, but I did go ahead and pull out the number of retry attempts and timeouts into configurable options. (added some new code and commits!) It would work like
If its not passed, the default is a timeout of 60000, and the number of attempts is defaulted to 1 (e.g. no retries). Is this more along the lines of what you're looking for? @diegoprd No worries; I was already working on this update while you were working on yours :). |
@NickStefan yeah, it seems to be heading into the right direction. For the options, I would suggest changing "attempts" into "retry_attempts" or "retries". |
@@ -165,6 +169,10 @@ module.exports = (function() { | |||
self.emit('error', {}, response); | |||
}); | |||
|
|||
this.request.setTimeout(this.timeoutDuration || 60000, function(){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could/should we put the default value at the top of request.js as part of the settings (line 9)?
and then we can make the setTimeout
change the settings like the proxy is set up for example:
HttpRequest.setProxy = function(proxy) {
Settings.proxy = proxy;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diegoprd Agreed. I went ahead and implemented this for the timeout. in a latest commit
data = ''; | ||
} | ||
var attempts = (this.options.request_timeout_options || {}).attempts || 1; | ||
var timeout = (this.options.request_timeout_options || {}).duration || 60000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we are setting the 60000
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone in the latest
So I was able to pull the timeout defaults and configuring deep into the http/request file. Still thinking about how (or if) I should pull the retry logic. Right now it only retries at the startSession in lib/index. Adding a retry to the http/request is a little tricky with the use of streams. Going to poke around a bit more... |
So all of the ugly new code I had added to Docs PR: nightwatchjs/nightwatch-docs#60 All of the existing tests pass. I've been manually testing the retry logic against my company's CI. Tests for this repo (to prove the timeout and retry logic) will have to wait until tomorrow. |
great job @NickStefan ! |
We're on an old version of Otherwise, this PR is ready @beatfactor |
Could you also squash your commits? |
@@ -4,6 +4,7 @@ var Logger = common.require('util/logger'); | |||
var nock = require('nock'); | |||
var assert = require('assert'); | |||
var mockery = require('mockery'); | |||
var once = require('lodash.once'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't this dependency be avoided? In any case it should be added to devDependencies.
var seleniumHost = this.options.seleniumHost || this.options.selenium_host; | ||
var useSSL = this.options.useSsl || this.options.use_ssl; | ||
var proxy = this.options.proxy; | ||
var timeoutOptions = (this.options.request_timeout_options || {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary brackets.
@@ -165,6 +169,16 @@ module.exports = (function() { | |||
self.emit('error', {}, response); | |||
}); | |||
|
|||
this.request.setTimeout(this.timeout, function(){ | |||
if (self.retryAttempts) { | |||
console.log('retrying...'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the Logger here.
sessionId: '123456' | ||
}; | ||
|
||
HttpRequest.setTimeout(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we could do with a smaller timeout here (e.g. 100ms)? One full second adds quite a bit to the tests run.
What is making the latest nock version incompatible with our tests? And also you need to edit the commit message to be something descriptive. |
We could point to a fork of nock if necessary, but maybe they might still merge your PR? There's still the issue with the 1s timeout and the use of lodash.once that I'd like we address. @DoHe it would be quite useful I think if you could get a verbose log of the retry problem and even a selenium debug log. Do you think that's possible? |
@beatfactor Of course. I created a gist with both logs here. At the end of the logs you can also see what happens when I kill the selenium server after the tests freeze. I have |
In the documentation it says on a time out we have to close the socket manually. I'll try to write a fix for that myself. |
@NickStefan I opened a PR against your branch that fixed that problem for me. |
@DoHe is the problem of the nightwatch runner hanging at the end also fixed but your PR? |
@beatfactor It's the only thing it fixes (in comparison to this PR) |
The Nightwatch.startSession http action often times out despite the selenium server being totally responsive. This PR adds timeout and retry parameters to the http module, allowing the tests to continue despite a failed session attempt.
@DoHe I added the @beatfactor removed the lodash.once dependency. also set the timeout in the testing to much lower than 1000ms (200ms). |
@NickStefan it might be nicer to add a dummy |
Hey nick I tried your code but the tests are still hanging when trying to request session from selenium. What's the best way to handle? |
@naissa12 did you set the options? I made a PR to the docs nightwatchjs/nightwatch-docs#60
retry attempts defaults to 0, so you need to set the options to get this PR to do anything. |
There hasn't been any movement on this PR in a while, so I thought I'd mention that I was pursuing what I think is the same problem but from the Chrome side. I believe I've proved that this failure to get sessions began with Chrome 53 (Linux only), but I'd be interested to know if anyone has other data.
|
@joel-airspring I ran your script on my setup where I experienced problems with nightwatch: Ubuntu 16.04.1 LTS on EC2, using xvfb, Chrome version 55.0.2883.75 and chromedriver version 2.25.426924 |
Merged this into master and managed to rewrite the tests without nock: f3bc41c. Please have a look, it will be released soon in npm. |
👏 nice! Good to hear we got around nock. |
Added to v0.9.11. |
@beatfactor I'm not sure your rewritten version is actually doing the same as this PR. Since moving from this branch to 0.9.11., we're getting following error (that we didn't have before): |
Yes, the socket hang up is from aborting the request, which is timed out. The abort is a more suitable way of destroying the socket. Maybe the error message should be improved, but the request should be retried regardless of the error being there or not. Isn't that the case for you? |
It does not look as if it's retrying (I haven't had time to investigate in too much detail yet, but it happens on our productive CI tests quite regularly):
As you can see there is no tally of the number of successful tests.
|
That's odd, retrying should work. How does your config look like? |
{
"src_folders": ["functional"],
"output_folders": "functional/reports",
"live_output" : true,
"selenium": {
"start_process": true,
"server_path": "node_modules/selenium-server-standalone-jar/jar/selenium-server-standalone-3.0.1.jar",
"log_path": "./",
"host:": "127.0.0.1",
"port": 4444,
"cli_args": {
"webdriver.chrome.driver": "node_modules/chromedriver/bin/chromedriver"
}
},
"test_settings": {
"default": {
"selenium_host": "localhost",
"selenium_port": 4444,
"request_timeout_options": {
"timeout": 20000,
"retry_attempts": 10
},
"silent": true,
"screenshots": {
"enabled": false,
"path": ""
},
"globals": {
"waitForConditionTimeout": 10000
},
"end_session_on_fail": true,
"test_workers": true
},
"chrome": {
"desiredCapabilities": {
"browserName": "chrome"
}
}
}
}
|
@DoHe I haven't had a chance to try out v0.9.11 yet. Our Ci is still running the PR / fork. It's on my todo list, but ive got a lot of other features to work on! Currently wrestling with celery and rabbitmq on our backend (my preferred home 😉 ) |
@DoHe Hmm, so it looks like the runner is exiting at the socket hang up error. I'll look into it. |
@DoHe Pushed a fix for the problem you reported, in 0.9.12. Thanks. |
@beatfactor Thanks for that, it works perfectly for us now. |
This never happens locally, but in CI environments, the HTTP request that gets a session between selenium and nightwatch, may hang indefinitely. This was happening to me at least 1 to 3 times across 20 test group every single run on CI.
related:
#1223
Why:
Changes:
lib/http/request
and emit an event called timeout.Nightwatch.startSession
.