-
Notifications
You must be signed in to change notification settings - Fork 574
Request for Comment - Move starter proxy responsibilities into proxy set. #527
Conversation
@pearj do you mind taking a quick look at this change? |
This relates to #441 and should probably fix most/all of the problems there. |
scripts/zalenium.sh
Outdated
@@ -21,6 +21,7 @@ START_TUNNEL=${START_TUNNEL:-false} | |||
DEBUG_ENABLED=${DEBUG_ENABLED:-false} | |||
KEEP_ONLY_FAILED_TESTS=${KEEP_ONLY_FAILED_TESTS:-false} | |||
LOG_JSON=${LOG_JSON:-false} | |||
NEW_SESSION_WAIT_TIMEOUT=${NEW_SESSION_WAIT_TIMEOUT:-300000} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate line of row 26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
import java.util.concurrent.atomic.AtomicBoolean; | ||
import java.util.concurrent.atomic.AtomicLong; | ||
import java.util.logging.Level; | ||
import java.util.logging.Logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It think we want to use slf4j. @dennisgranath recently made changes to start using it instead of java.util.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change.
@@ -44,6 +44,7 @@ | |||
|
|||
public class SauceLabsRemoteProxyTest { | |||
|
|||
private DockerSeleniumRemoteProxy proxy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indentation it seems :)
Clean up formatting and remove unused imports.
When starting zalenium on docker using I get following failure when starting up the second container:
|
Thanks @dennisgranath - I’ll run on docker and fix. |
Codecov Report
@@ Coverage Diff @@
## merge-527 #527 +/- ##
================================================
- Coverage 80.95% 59.53% -21.42%
+ Complexity 585 455 -130
================================================
Files 32 35 +3
Lines 2688 2921 +233
Branches 233 246 +13
================================================
- Hits 2176 1739 -437
- Misses 372 1047 +675
+ Partials 140 135 -5 |
docker/Dockerfile
Outdated
@@ -318,11 +318,7 @@ RUN cd /tmp \ | |||
# TestingBot Tunneling # | |||
# -----------------------# | |||
# https://testingbot.com/support/other/tunnel | |||
ENV TB_TUNNEL_URL="https://testingbot.com/tunnel/testingbot-tunnel.jar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should leave this in there. You just removed it because we doesn't work at the client site right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'll revert this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could probably add: ARG testingBotEnabled=true
on the line beforehand. Then you could add [ "${testingBotEnabled}" == "true" ] && \
before the wget line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a pull request to restore this bit with a build arg:
dspasojevic#3
@dennisgranath I've updated the |
Thanks @dspasojevic I will run them a soon as I can. Might be able to do it later today. |
@@ -48,22 +51,6 @@ public boolean matches(Map<String, Object> nodeCapability, Map<String, Object> r | |||
return false; | |||
} | |||
|
|||
// We do this because the starter node does not have the browser versions when Zalenium starts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diemol Do you know if this code is necessary to not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no, the idea of it is to reject as fast as possible a test request with a browser version we don't provide.
We need to test what happens if I want to run a test with Chrome version 60, and we provide version 66. The request should be rejected since it should not match any nodes.
In the end, this is a minor thing since we can find ways to tweak it.
@@ -31,7 +32,7 @@ public boolean matches(Map<String, Object> nodeCapability, Map<String, Object> r | |||
nodeCapability)); | |||
|
|||
for (RemoteProxy remoteProxy : proxy.getRegistry().getAllProxies()) { | |||
if ((remoteProxy instanceof DockerSeleniumStarterRemoteProxy) && | |||
if ((remoteProxy instanceof DockerSeleniumRemoteProxy) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dspasojevic I presume you moved some code from the starter proxy to the regular one? That's why you have a different instance type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I think that the effect is different. This should be a check to see if we could hypothetically start a container to handle the request. My change has made it check what we could actually handle with the running containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what must be duplicated here: https://github.com/zalando/zalenium/blob/master/src/main/java/de/zalando/ep/zalenium/proxy/DockerSeleniumStarterRemoteProxy.java#L189
Stop cleanup happening for idle outside of a test session
Ok now it seems to work when using plain docker however I get following exception when running my selenium tests (the selenium tests are passing though):
|
@dennisgranath thanks for testing again. It looks like I just need to handle that exception while shutting down the container. Part of the change is it to always attempt to shutdown the container when the proxy is removed to ensure that orphan, untracked containers aren't left around. I think that zalenium itself would have been fine, and would have spun up another container if necessary. |
@dennisgranath I've pushed up a change to specifically handle the |
Yes @pearj I saw that while checking the selenium code for healtcheck. We will remove status checking and only look for 200 for now. |
} | ||
|
||
private void dumpStatus() { | ||
if (LOGGER.isInfoEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are logging into json and sending logs to an ELK stack. The ascii table is pretty neat for debugging but screws up things in kibana. Can we log this on debug or trace perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or feature hide it somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Though might move to be a separate log category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a separate category for the table. It is off by default in prod, on when running tests.
👍 |
@@ -1,459 +1,414 @@ | |||
package de.zalando.ep.zalenium.proxy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dspasojevic Any reason why this isn't deleted altogether yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was waiting to see if there was something that could be salvaged from here. I'll look at replacing with higher value integration tests.
new JMXHelper().unregister(objectName); | ||
ContainerFactory.setContainerClientGenerator(originalContainerClient); | ||
} | ||
// private GridRegistry registry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dspasojevic What's the plan for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will see whether I can rebirth this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this test was written before the registry implementation could be swapped out. I'll see if I can rewrite to use a simpler grid impl so the docker container client doesn't have to be mocked.
Does that sound sensible to you @diemol ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-enabled.
|
||
assertThat(statusCaptor.getValue(), equalTo(403)); | ||
} | ||
// private GridRegistry registry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dspasojevic and this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-enabled.
👍 |
@@ -17,6 +17,9 @@ | |||
<!-- ALL TRACE DEBUG INFO WARN ERROR OFF --> | |||
<logger name="com.spotify.docker" level="WARN" /> | |||
<logger name="org.seleniumhq.jetty9" level="WARN" /> | |||
|
|||
<!-- SET TO DEBUG TO GET STATUS LOGGING --> | |||
<logger name="de.zalando.ep.zalenium.proxy.AutoStartProxySet.Status" level="INFO" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set log level for this explicitly? The log level is set to logback.loglevel (defaulted to info) and can be set to debug using --debugEnabled flag to zalenium.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't necessary. In the past, I've explicitly (though unnecessarily) set debug levels to make it obvious to others how to turn on specific types of logging.
Happy to remove if that is the consensus :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder when this logging is most useful? For people new to zalenium? Which makes me think that it should be opt-out maybe? So default it to on, but have an environment variable that can turn it off for people who don't want it? Or maybe it's just too much noise to have it on by default?
I don't know 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When JSON logging is isn't enabled this should be on as default is my opinion
👍 |
* Also make it possible to disable testing bot via a buildArg, this is to allow builds on sites where testing bot is blocked due to organisational Internet policies.
Restore testing bot
@@ -0,0 +1,525 @@ | |||
|
|||
// Licensed to the Software Freedom Conservancy (SFC) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be removed, since Zalenium is not licensed to the SFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -147,7 +170,17 @@ public HtmlRenderer getHtmlRender() { | |||
Incrementing the number of tests that will be executed when the session is assigned. | |||
*/ | |||
@Override | |||
public TestSession getNewSession(Map<String, Object> requestedCapability) { | |||
public synchronized TestSession getNewSession(Map<String, Object> requestedCapability) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if synchronized
is needed, since each proxy should get its own class instance. Perhaps I am missing something? @dspasojevic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure that the thread marking instances as timed out (which will start the container shutdown process) doesn't conflict with the thread assigning sessions to the proxy. We could make the locking finer grained - e.g. use a lock when checking and updating the timed out and last command flags, but the throughput here didn't seem to warrant it.
} | ||
|
||
/* | ||
Method to terminate an idle session via the registry, the code works because each one has only one slot | ||
We use BROWSER_TIMEOUT as a reason, but this could be changed in the future to show a more clear reason | ||
*/ | ||
@VisibleForTesting | ||
protected synchronized void terminateIdleTest() { | ||
protected void terminateIdleTest() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this was not really needed since each proxy has its own thread and only one test runs per proxy. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change removes that thread per proxy, there is a fixed threadpool of size 5 that handles terminations now:
LOGGER.info(String.format("Test session started with internal key %s and external key %s assigned to remote %s.", | ||
session.getInternalKey(), | ||
externalKey.getKey(), | ||
session.getExternalKey().getKey(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it like that initially, but sometimes the browser does not start for some reason and the whole thing crashes. In those cases session.getExternalKey()
is null and the real exception is not sent back to the client.
As a matter of fact, I just bumped into it
22:04:16.497 [Terminate Test Session int id: [dccfea44-3436-4b26-ab35-dd9faf2311c6] ext id: [No external key was assigned] container: [f85d9cc697a6a4ff528d5930dbfed124fe856f9998295a03126831a60293c949]] INFO d.z.e.z.p.DockerSeleniumRemoteProxy - f85d9cc697a6a4ff528d5930dbfed124fe856f9998295a03126831a60293c949 AFTER_SESSION command received. Node should shutdown soon...
22:04:16.500 [qtp1046665075-39] INFO o.o.g.w.s.handler.RequestHandler - Error forwarding the new session null
java.lang.NullPointerException: null
at de.zalando.ep.zalenium.proxy.DockerSeleniumRemoteProxy.afterCommand(DockerSeleniumRemoteProxy.java:320)
at org.openqa.grid.internal.TestSession.forward(TestSession.java:281)
at org.openqa.grid.web.servlet.handler.RequestHandler.forwardNewSessionRequestAndUpdateRegistry(RequestHandler.java:91)
at org.openqa.grid.web.servlet.handler.RequestHandler.process(RequestHandler.java:114)
at org.openqa.grid.web.servlet.DriverServlet.process(DriverServlet.java:86)
at org.openqa.grid.web.servlet.DriverServlet.doPost(DriverServlet.java:70)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
Could you please revert it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
private String generateContainerName(String zaleniumContainerName, | ||
String nodePort) { | ||
return String.format("%s_%s", zaleniumContainerName, nodePort); | ||
final String suffix = RandomStringUtils.randomAlphanumeric(6); | ||
return String.format("%s_%s_%s", zaleniumContainerName, nodePort, suffix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is always using port 40000 (which makes sense since we are not mapping that to the host anymore), and then a random string.
I think we should just remove the 40000 from the name and leave the random part.
Actually it would be cool to have a random name instead of the random string, to simplify debugging and checking logs, but that's just a nice to have - we can do that in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
https://github.com/igr/nomen-est-omen looks like a simple drop in for a subsequent PR.
Thanks @dspasojevic for this PR! Also thanks to @pearj, @dennisgranath and @peerster for the review and testing! Overall, I really like what the PR is doing and goes in line with what we have discussed, I think it'll help a lot and solve other created issues. We would really appreciate a couple of things:
So, the proposal is the following:
After merging and releasing this, I guess some users will create issues since we are changing slightly the behaviour of some parts (like the Let's target to release this in 1 or 2 weeks tops (hopefully earlier), what do you think? |
Hi @diemol. Thanks for the feedback, I've actioned most of the changes and commented on the ones that I haven't. Codacy seems happy again. There are two tests disabled at the moment. I'll see if I can fix them. I do wonder if more integration tests would be better - most of the unit tests that I've looked at seem to use a lot of mocking which (sometimes?) means that the tests are not as effective as they could be. wdyt? |
Let's merge to a protected branch and we can take it from there, like that we can run Travis whole CI and get a better idea. |
Description
Removes the remote starter proxy and moves its responsibilities into a custom ProxySet.
This PR is for comment only - I don't expect it to be merged in current incarnation.
Motivation and Context
Permits closer tracking of starting, started and closing containers / proxies. Prevents creation of duplicates and orphan containers.
How Has This Been Tested?
Tested in an OpenShift environment. Requires more testing.
Types of changes
Checklist: