Selenium Gird Scaler | Edge active sessions not being properly counted #2709

Wolfe1 · 2022-03-02T20:34:06Z

Report

Been working on getting a selenium grid 4 deployment working in AKS using Keda to scale. Chrome and Firefox have been no problem but scaling Edge browser pods seems to have a bug.

Currently using the following scaled-object:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: selenium-edge-scaledobject
  namespace: legion-selenium-grid
  labels:
    deploymentName: selenium-edge-node-deployment
spec:
  minReplicaCount: 0
  maxReplicaCount: 80
  scaleTargetRef:
    name: selenium-edge-node-deployment
  triggers:
    - type: selenium-grid
      metadata:
        url: 'https://selenium-grid-qe-tools.accruentsystems.com/graphql'
        browserName: 'MicrosoftEdge'

New sessions in the queue appear as 'MicrosoftEdge' and scale up just fine:

However, taking a look at the active sessions I noticed that the browserName on the active sessions becomes 'msedge':

Thus when it comes time for Keda to check for a scale-down event it sees that there are no active sessions and proceeds scale to 0.

Expected Behavior

Keda detects that Edge pods are still in use and scales down gradually as needed.

Actual Behavior

When scaling down Keda will remove all Edge pods regardless of current active sessions.

Steps to Reproduce the Problem

Deploy selenium grid and Edge deployments
Apply scaled-object for edge (see description)
Can be done in a variety of ways but an easy way: Run a single long (10 minute +) test on selenium grid. Grid will scale up to allow your test to run but will eventually scale to 0 before your test can finish.

Logs from KEDA operator

example

KEDA Version

2.6.1

Kubernetes Version

1.21

Platform

Microsoft Azure

Scaler Details

Selenium Grid

Anything else?

I am not in any way familiar with Go and am having issues deploying a local change to AKS but I believe all that really needs done here is a specific check for "MicrosoftEdge" as a browserName and then simply use the session-specific naming of 'msedge' in that situation.

Probably something like this (though im sure there is a more elegant way to put it). Around line 225 in selenium_grid_scaler:

var sessions = seleniumResponse.Data.SessionsInfo.Sessions
	for _, session := range sessions {
		var capability = capability{}
		var browserCapName = ""
		if browserName == "MicrosoftEdge" {
			browserCapName = "msedge"
		} else {
			browserCapName = browserName
		}
		if err := json.Unmarshal([]byte(session.Capabilities), &capability); err == nil {
			if capability.BrowserName == browserCapName {
				if strings.HasPrefix(capability.BrowserVersion, browserVersion) {
					count++
				} else if browserVersion == DefaultBrowserVersion {
					count++
				}
			}
		} else {
			seleniumGridLog.Error(err, fmt.Sprintf("Error when unmarshaling sessions info: %s", err))
		}
	}

The text was updated successfully, but these errors were encountered:

Wolfe1 · 2022-03-08T14:59:00Z

Testing this a bit further and it seems that this limits the scale-up as well.

For example, take this testing scenario:

Scaled object allows up to 20 edge pods at once, normal scale-up behavior.
I have 10 Edge pods, each currently running a test.
I have 5 sessions for edge waiting in the session queue.

What we would expect to happen here is that we would scale up to 15 pods in order to allow the 5 tests in the queue to start. In actuality, nothing happens. I believe this is also because of the edge sessions having a different browserName when they are in an active session. Keda knows there are 10 pods up but does not see any active sessions under the name "MicrosoftEdge" so it assumes the sessions in the queue will be picked up soon and thus does not scale.

JorTurFer · 2022-03-09T11:03:32Z

Hi,
I'm not an expert but this seems an error in the upstream, I mean. KEDA recovers those values and applies the logic in base of them, but I guess that the consistency between them is part of the upstream responsibility.

Maybe I'm missing something but I think that if we apply that patch and then the upstream uses the same name in both cases, KEDA will fail again.

thoughts @kedacore/keda-core-contributors ?

Wolfe1 · 2022-03-09T16:23:18Z

@JorTurFer After talking with some contributors for selenium it seems that the edgeDriver accepts a few values as valid (having trouble finding any sort of a list though) when starting but will in the end self identify as 'msedge' every time:

It would seem in this case that the driver session would never report anything other than "msedge".

Modifying my session call to appear as "msedge" gets it in the session queue but my current hurdle is getting the Edge node to accept the "msedge" browserName stereotype as something to pick up on the node.

Still working on this so there may be a way to get this working as-is. Even though this is an upstream responsibility I think a small modification in one/two file(s) here would be much simpler than large code changes across multiple projects and tools.

Perhaps this could be an optional parameter such as sessionBrowserName that if set is used for session identification, otherwise ignored and browserName is used as normal? This would allow flexibility outside of this case for edge as well.

JorTurFer · 2022-03-10T11:38:34Z

I agree with that optional parameter. Maybe explaining it well in the docs we could fill the gap and don't be attached to future changes in their side because it's just remove the value for the optional parameter (but this edge case should be documented IMO)
Let's give some time to other @kedacore/keda-core-contributors for sharing their thoughts?

Wolfe1 · 2022-04-12T18:42:10Z

@JorTurFer Is there a tag needed for this to be discussed or have someone chime in?

zroubalik · 2022-04-13T05:57:04Z

An optional parameter seems like the best solution for this, together with a proper documentation. Anyone up for a contribution?

Wolfe1 · 2022-05-20T15:29:29Z

@zroubalik I have some time so taking a stab at this now.

Wolfe1 · 2022-05-23T18:45:30Z

I have a working solution for this but ran into a bit of a snag: #3061

Going to see about fixing that as well in this as it should be fairly simple.

Signed-off-by: Brandon Wolfe <[email protected]>

…and #3061 (#3062)

Wolfe1 added the bug Something isn't working label Mar 2, 2022

tomkerkhove added this to Roadmap - KEDA Core Mar 2, 2022

tomkerkhove moved this to Proposed in Roadmap - KEDA Core Mar 2, 2022

Wolfe1 mentioned this issue Mar 25, 2022

[🐛 Bug]: Unable to change node stereotype SeleniumHQ/docker-selenium#1523

Closed

Wolfe1 mentioned this issue May 23, 2022

Selenium Grid Scaler | Max Sessions implementation issue #3061

Closed

Wolfe1 added a commit to Wolfe1/keda that referenced this issue May 23, 2022

Changelog edits for kedacore#2709 and kedacore#3061

41f2871

Signed-off-by: Brandon Wolfe <[email protected]>

This was referenced May 23, 2022

sessionBrowserName doc information and example for Edge kedacore/keda-docs#764

Merged

Implementation of sessionBrowserName and a fix for maxSessions (#2709 and #3061 #3062

Merged

JorTurFer closed this as completed in #3062 May 25, 2022

JorTurFer pushed a commit that referenced this issue May 25, 2022

Implementation of sessionBrowserName and a fix for maxSessions (#2709 …

d5f857c

…and #3061 (#3062)

Repository owner moved this from Proposed to Ready To Ship in Roadmap - KEDA Core May 25, 2022

tomkerkhove moved this from Ready To Ship to Done in Roadmap - KEDA Core Aug 3, 2022

Wolfe1 mentioned this issue Sep 29, 2022

Selenium Grid Scaler | Multiple browser nodes are not scaling parallel #3620

Closed

Wolfe1 mentioned this issue Nov 15, 2022

Autoscaling selenium grid on kubernetes SeleniumHQ/docker-selenium#1714

Closed

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Selenium Gird Scaler | Edge active sessions not being properly counted #2709

Selenium Gird Scaler | Edge active sessions not being properly counted #2709

Wolfe1 commented Mar 2, 2022

Wolfe1 commented Mar 8, 2022

JorTurFer commented Mar 9, 2022

Wolfe1 commented Mar 9, 2022

JorTurFer commented Mar 10, 2022 •

edited

Loading

Wolfe1 commented Apr 12, 2022

zroubalik commented Apr 13, 2022 •

edited

Loading

Wolfe1 commented May 20, 2022

Wolfe1 commented May 23, 2022

Selenium Gird Scaler | Edge active sessions not being properly counted #2709

Selenium Gird Scaler | Edge active sessions not being properly counted #2709

Comments

Wolfe1 commented Mar 2, 2022

Report

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Logs from KEDA operator

KEDA Version

Kubernetes Version

Platform

Scaler Details

Anything else?

Wolfe1 commented Mar 8, 2022

JorTurFer commented Mar 9, 2022

Wolfe1 commented Mar 9, 2022

JorTurFer commented Mar 10, 2022 • edited Loading

Wolfe1 commented Apr 12, 2022

zroubalik commented Apr 13, 2022 • edited Loading

Wolfe1 commented May 20, 2022

Wolfe1 commented May 23, 2022

JorTurFer commented Mar 10, 2022 •

edited

Loading

zroubalik commented Apr 13, 2022 •

edited

Loading