fix(ci): wait for code-server in healthcheck#2744
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce flakiness/timeouts in the UI Selenium CI job by ensuring the test container is only considered “ready” once both Selenium Grid and code-server are responsive, and by adding bounded waiting + shorter Selenium timeouts in the Python fixtures.
Changes:
- Update the container healthcheck to validate both
:4444(Selenium) and:8080(code-server), and extend healthcheck timing tolerances. - Fix the UI fixture to stop force-recreating the container every run and replace an unbounded health loop with a deadline-based wait.
- Configure Selenium WebDriver page-load and script timeouts to fail faster.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/ui/fixtures/ui_fixtures.py | Makes container startup waiting bounded/configurable and sets WebDriver timeouts. |
| docker-compose.yml | Expands the container healthcheck to include code-server and adjusts healthcheck timing; publishes port 8080. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 5 minutes and 16 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (6)
📝 WalkthroughWalkthroughExpose an additional service port and tighten Docker healthchecks for the Selenium VSCode service; replace an infinite container-health polling loop with bounded timeout-based polling and explicit failures in test fixtures; and add a download timeout configuration to the test runner config. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/ui/fixtures/ui_fixtures.py (1)
114-119: Apply the same fail-fast timeouts tonew_browsertoo.
browser_setupnow gets the shorter page/script timeouts, butnew_browserat Lines 152-155 still uses Selenium's defaults. Keeping driver setup split like this makes the two fixtures drift again.♻️ Minimal follow-up
try: driver = WebDriver( command_executor="http://localhost:4444/wd/hub", options=options ) + driver.set_page_load_timeout(120) + driver.set_script_timeout(60) driver.maximize_window() yield driver, "https://stage.ai.ansible.redhat.com/login", None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/ui/fixtures/ui_fixtures.py` around lines 114 - 119, The new_browser fixture still uses Selenium defaults; after it constructs the WebDriver instance (the driver created in the new_browser fixture) call driver.set_page_load_timeout(120) and driver.set_script_timeout(60) to match browser_setup's timeouts; locate the WebDriver instantiation in new_browser and add these two timeout calls immediately after driver creation (or factor both fixtures to share a helper that applies these timeouts) so both fixtures fail fast consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Line 16: Replace the hard-coded published port mapping "8080:8080" with a
loopback-bound, configurable mapping so the service only listens on localhost
and the host port can be changed; e.g. change the port mapping string to use an
env var and bind to 127.0.0.1 like "127.0.0.1:${CODE_SERVER_PORT:-8080}:8080"
(replace the existing "8080:8080" entry), and update any documentation or env
examples to include CODE_SERVER_PORT if needed.
---
Nitpick comments:
In `@test/ui/fixtures/ui_fixtures.py`:
- Around line 114-119: The new_browser fixture still uses Selenium defaults;
after it constructs the WebDriver instance (the driver created in the
new_browser fixture) call driver.set_page_load_timeout(120) and
driver.set_script_timeout(60) to match browser_setup's timeouts; locate the
WebDriver instantiation in new_browser and add these two timeout calls
immediately after driver creation (or factor both fixtures to share a helper
that applies these timeouts) so both fixtures fail fast consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 13fe9cc5-8d32-4c6e-9033-f44d1fa42cd1
📒 Files selected for processing (2)
docker-compose.ymltest/ui/fixtures/ui_fixtures.py
7e173fb to
79cdc7a
Compare
❌ 5 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
79cdc7a to
9b16937
Compare
9b16937 to
213694d
Compare
cf010c3 to
461e33e
Compare
461e33e to
ef7daff
Compare
The compose healthcheck only validated Selenium Grid (:4444), so the container was marked healthy before code-server (:8080) finished starting. Tests hit the default 5-minute page-load timeout, burning ~6 min per test until the 30-min job limit. - Healthcheck now verifies both :4444 and :8080 inside the container so `podman healthcheck run` blocks until both services are ready. - Remove accidental `or True` that forced container recreate. - Replace unbounded `while True` with a deadline-based loop (300 s default, configurable via UI_CONTAINER_HEALTH_TIMEOUT). - Set page-load and script timeouts on WebDriver for faster failure when code-server is slow. - Expose port 8080 for local debugging. related: ansible#2580 Made-with: Cursor
ef7daff to
f7aace5
Compare
Problem
test (linux)fails with 71% rate (#2580). Three independent issues:Selenium UI tests (healthcheck): Every test raises
TimeoutExceptionbecause the container healthcheck only validates Selenium Grid on:4444— code-server on:8080is still starting when tests begin, causing a 5-minute page-load timeout per test.Selenium UI tests (command palette):
vscode_run_command()uses an XPath click on the command center element to open the command palette. The element is found but clicking it fails silently in code-server, causing a hidden 180s timeout (6 retries x 30sWebDriverWait).E2E tests:
@vscode/test-electrontimes out after 15s while resolving the VS Code version from the CDN. Thedownload.timeoutconfig option was not being forwarded to the underlying download call due to a bug in@vscode/test-cli(download.timeout not passed to downloadAndUnzipVSCode in installDependentExtensions microsoft/vscode-test-cli#106).Fix
Selenium UI tests — healthcheck
curl -sf --max-time 3 http://localhost:4444 && curl -sf --max-time 3 http://localhost:8080.retries(10 → 30) andstart_period(10s → 30s) to accommodate code-server startup time.or Truethat forced container teardown/recreate on every run.while Truehealth loop with a deadline-based loop (360s default, configurable viaUI_CONTAINER_HEALTH_TIMEOUT) with progressive logging.page_load_timeout(120)andscript_timeout(60)on WebDriver so tests fail fast instead of hanging 5+ minutes.127.0.0.1:8080for local debugging.Selenium UI tests — command palette
ensure_vscode_ready()andvscode_run_command()for CI visibility.E2E download timeout
download: { timeout: 60_000 }to.vscode-test.mjs.pnpm patchto@vscode/test-cli@0.0.12to fix a bug whereinstallDependentExtensions()calleddownloadAndUnzipVSCode()with positional args, dropping the timeout option (filed upstream as download.timeout not passed to downloadAndUnzipVSCode in installDependentExtensions microsoft/vscode-test-cli#106).related: #2580