
[🐛 Bug]: Worker node initiates session before video container is fully ready #2386

Closed
yonigolob1 opened this issue Sep 8, 2024 · 5 comments · Fixed by #2387

@yonigolob1

What happened?

When running a session on Selenium Grid, if the video container's image hasn't been pulled yet (for instance, because it's a new node), it takes longer to pull than the node worker's image because of its size.
As soon as the worker image is pulled by the node, that container starts and the session runs.
The issue is that the video container isn't ready yet, which results in recordings starting midway through the test or, even worse, the video container running indefinitely until it's manually terminated when the test is short enough to finish before the video container comes up.
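The race can be pictured with a simplified pod spec (the container names and image tags below are illustrative, not the exact manifests the chart renders). The kubelet pulls each container's image independently and starts each container as soon as its own image is available, so nothing synchronizes the browser container with the video sidecar:

```yaml
# Hypothetical sketch of a node pod with a video sidecar.
apiVersion: v1
kind: Pod
metadata:
  name: selenium-node-chrome
spec:
  containers:
    - name: browser          # smaller image, pulled quickly
      image: selenium/node-chrome:4.24.0-20240830
    - name: video            # larger image, pulled more slowly
      image: selenium/video:ffmpeg-7.0.2-20240830
```

With this layout, the browser container (and therefore the session) can start while the video image is still being pulled.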

Command used to start Selenium Grid with Docker (or Kubernetes)

## ========== Helm Values File ==========

browserNodesCommon: &browserNodesCommon
  resources:
    requests:
      memory: "12Gi"
      cpu: "4"
    limits:
      memory: "12Gi"
      cpu: "14"
  deploymentEnabled: false
  dshmVolumeSizeLimit: 1Gi
  terminationGracePeriodSeconds: 1200
  extraEnvironmentVariables:
    - name: SE_SCREEN_WIDTH
      value: "1920"
    - name: SE_SCREEN_HEIGHT
      value: "1080"
    - name: SE_VNC_NO_PASSWORD
      value: "1"

selenium-grid:
  enabled: true

  global:
    seleniumGrid:
      logLevel: "INFO"
      stdoutProbeLog: true
      structuredLogs: true
      revisionHistoryLimit: 3

  basicAuth:
    enabled: false

  isolateComponents: true

  serviceAccount:
    create: true
    nameOverride: "{{ .Release.Name }}"

  ingress:
    className: "traefik"
    annotations:
      traefik.ingress.kubernetes.io/router.tls: "true"
      traefik.ingress.kubernetes.io/router.entrypoints: websecure
      external-dns.alpha.kubernetes.io/ttl: "3600"
      dns.alpha.kubernetes.io/ingress-hostname-source: defined-hosts-only

  serverConfigMap:
    env:
      SE_SUPERVISORD_LOG_LEVEL: "info"

  components:
    # Configuration for router component
    router:
      readinessProbe:
        periodSeconds: 20
      livenessProbe:
        periodSeconds: 20
      resources:
        requests:
          memory: "4Gi"
          cpu: "12"
        limits:
          memory: "4Gi"

    # Configuration for distributor component
    distributor:
      readinessProbe:
        periodSeconds: 20
      livenessProbe:
        periodSeconds: 20
      resources:
        requests:
          memory: "6Gi"
          cpu: "12"
        limits:
          memory: "6Gi"

    # Configuration for Event Bus component
    eventBus:
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"

    # Configuration for Session Map component
    sessionMap:
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"

    # Configuration for Session Queue component
    sessionQueue:
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"

  autoscaling:
    enableWithExistingKEDA: true
    scalingType: job
    annotations:
      "helm.sh/hook": post-install,post-upgrade,post-rollback
      "helm.sh/hook-weight": "1"
    scaledOptions:
      minReplicaCount: 0
      maxReplicaCount: 1000
      pollingInterval: 3
    scaledJobOptions:
      scalingStrategy:
        strategy: default
      successfulJobsHistoryLimit: 0
      failedJobsHistoryLimit: 0
      jobTargetRef:
        parallelism: 1
        completions: 1
        backoffLimit: 0

  chromeNode: *browserNodesCommon
  firefoxNode: *browserNodesCommon
  edgeNode: *browserNodesCommon

  videoRecorder:
    enabled: true
    uploader:
      enabled: true
      destinationPrefix: gs://my-bucket/test

  uploaderConfigMap:
    secretFiles:
      upload.conf: |
        [gs]
        type = google cloud storage
        no_check_bucket = true
        bucket_policy_only = true

Relevant log output

N/A

Operating System

Kubernetes (1.30.2)

Docker Selenium version (image tag)

4.24.0-20240830

Selenium Grid chart version (chart version)

0.35.0


github-actions bot commented Sep 8, 2024

@yonigolob1, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@VietND96
Member

VietND96 commented Sep 8, 2024

Thanks for your report; this issue is valid. In the development environment, images are usually built and ready, so we didn't observe this.
I will provide a fix to ensure both images are pulled before starting the session.

@yonigolob1
Author

Thank you, much appreciated!

@VietND96
Member

VietND96 commented Sep 9, 2024

@yonigolob1, the fix is included in https://github.com/SeleniumHQ/docker-selenium/releases/tag/selenium-grid-0.35.2
The idea is to use initContainers to pre-pull images. Please try and share your feedback.
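For reference, the general shape of the pre-pull approach is sketched below. An initContainer that uses the video image and exits immediately forces the kubelet to pull that image before any main container starts, since all initContainers must complete first. The `initContainers` key and image tag here are illustrative assumptions; check the chart's values schema for the actual option names:

```yaml
# Sketch only: the key names below may not match the chart's real schema.
chromeNode:
  initContainers:
    - name: pre-puller-video
      # Same image as the video sidecar; the container does nothing but exit 0,
      # which guarantees the image is on the node before the session can start.
      image: selenium/video:ffmpeg-7.0.2-20240830
      command: ["sh", "-c", "true"]
```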

@VietND96 VietND96 added this to the 4.25.0 milestone Sep 22, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 24, 2024