Skip to content

Conversation

@dmitry-tokarev-nv
Copy link
Contributor

@dmitry-tokarev-nv dmitry-tokarev-nv commented Sep 26, 2025

Overview:

add new blocking checks for dynamo deploy testing

closes: OPS-1247, DEP-439, OPS-1158, OPS-1278, OPS-959

Summary by CodeRabbit

  • Chores
    • Updated CI/CD infrastructure to expand backend testing coverage across multiple deployment frameworks.

@dmitry-tokarev-nv dmitry-tokarev-nv requested a review from a team as a code owner September 26, 2025 04:33
@dmitry-tokarev-nv dmitry-tokarev-nv changed the title e2e tests feature: e2e tests Sep 26, 2025
@dmitry-tokarev-nv dmitry-tokarev-nv marked this pull request as draft September 26, 2025 04:34
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 26, 2025

Walkthrough

Introduces a new composite GitHub Action to tag and push Docker images with conditional AWS ECR/Azure ACR flows, and updates a workflow to push images and run Kubernetes-based end-to-end tests across a framework matrix, including cluster setup, Helm deployment of the platform, and cleanup.

Changes

Cohort / File(s) Summary
Composite action: Docker tag & push
.github/actions/docker-tag-push/action.yml
Adds a composite action with inputs (local_image, push_tag, AWS/ECR and Azure/ACR flags/creds), sets up Docker Buildx, conditionally logs into ECR/ACR, tags and pushes images, and exposes image_tag output.
Workflow: Image push + E2E tests
.github/workflows/container-validation-backends.yml
Adds a docker-tag-push step for trtllm and a new e2e-tests job using a matrix (vllm, trtllm, sglang). Sets up tools, kube context, namespace, secrets, deploys via Helm, waits for readiness, and performs cleanup.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant WF as GitHub Workflow
  participant ACT as docker-tag-push (Composite Action)
  participant ECR as AWS ECR
  participant ACR as Azure ACR
  participant DK as Docker Engine

  WF->>ACT: Invoke with inputs (local_image, push_tag, flags/creds)
  ACT->>DK: Setup Buildx
  alt AWS push enabled
    ACT->>ECR: Login (aws ecr get-login-password)
    ACT->>DK: docker tag local -> ECR repo:tag
    ACT->>ECR: docker push repo:tag
  end
  alt Azure push enabled
    ACT->>ACR: Login
    ACT->>DK: docker tag local -> ACR repo:tag
    ACT->>ACR: docker push repo:tag
  end
  ACT-->>WF: Output image_tag
Loading
sequenceDiagram
  autonumber
  participant CI as container-validation-backends.yml
  participant GH as GitHub Runner
  participant K8S as Kubernetes Cluster
  participant H as Helm
  participant REG as Registry (ECR/ACR)

  CI->>GH: Prepare environment (tools, creds)
  CI->>GH: Setup kubeconfig, create namespace
  CI->>REG: Push image via docker-tag-push
  CI->>H: helm repo add/update
  CI->>H: helm install platform (operator images)
  H->>K8S: Create resources (Deployments/CRDs)
  CI->>K8S: Wait for readiness
  note over CI,K8S: Run e2e across matrix: vllm, trtllm, sglang
  CI->>K8S: List resources, delete CRs
  CI->>H: helm uninstall
  CI->>K8S: Delete namespace
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paws and spin with glee,
New tags set sail across the sea,
E2E winds in Kubernetes blow,
Helm charts rise, deployments grow—
When tests are done, I tidy the glade,
Hop-hop! Clean tracks where pushes were made. 🐇🚀

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description provides an Overview and lists related issue codes, but it does not follow the repository’s required template sections: the Details and Where should the reviewer start sections are left empty, and the Related Issues entries are not formatted according to the template’s expectations. Please complete the repository’s PR description template by filling in the Details section with a summary of the changes, specifying which files reviewers should focus on, and formatting each Related Issue on its own line using the appropriate “closes/fixes/resolves” keyword and issue reference.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Title Check ✅ Passed The PR title "feat: add e2e dynamo deploy tests" directly aligns with the primary changes in the changeset. The raw summary indicates that the main addition is a comprehensive e2e-tests job in the container-validation-backends.yml file, which includes multi-framework end-to-end testing with Kubernetes deployment, provisioning, and cleanup logic. While the changeset also includes a supporting docker-tag-push action, the title appropriately focuses on the primary and most significant change. The title is concise, clear, and specific enough that a teammate scanning the history would immediately understand that this PR adds end-to-end deployment tests for the dynamo platform.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (5)
.github/actions/docker-tag-push/action.yml (1)

6-16: Input naming/description clarity.

  • push_tag description mentions ECR but it’s used for both ECR/ACR; reword.
  • aks_push actually controls ACR push; consider rename to acr_push for clarity (optional).
   push_tag:
-    description: 'Target Tag for ECR'
+    description: 'Target repository:tag (used for ECR/ACR)'
     required: false
@@
-  aks_push:
-    description: 'Push to AKS Boolean'
+  aks_push:
+    description: 'Push to Azure ACR Boolean'
.github/workflows/container-validation-backends.yml (4)

127-135: Unknown runner label.

actionlint flags cpu-amd-m5-2xlarge. If this is a self-hosted runner, include self-hosted and your custom label, e.g.:

-    runs-on: cpu-amd-m5-2xlarge
+    runs-on: [self-hosted, linux, x64, cpu-amd-m5-2xlarge]

126-133: Gate e2e job on code changes (consistency with matrix jobs).

Align with changed-files gating to avoid unnecessary runs.

-  e2e-tests:
-    runs-on: cpu-amd-m5-2xlarge
+  e2e-tests:
+    needs: changed-files
+    if: needs.changed-files.outputs.has_code_changes == 'true'
+    runs-on: cpu-amd-m5-2xlarge

228-233: YAML lint: excessive blank lines.

Reduce blank lines per yamllint warning.


239-253: Cleanup sleep of 300s slows CI.

If not strictly required, consider reducing or guarding behind a debug flag.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bb7490 and 3a695da.

📒 Files selected for processing (2)
  • .github/actions/docker-tag-push/action.yml (1 hunks)
  • .github/workflows/container-validation-backends.yml (1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/container-validation-backends.yml

127-127: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


211-211: undefined variable "secret". available variables are "env", "github", "inputs", "job", "matrix", "needs", "runner", "secrets", "steps", "strategy", "vars"

(expression)

🪛 YAMLlint (1.37.1)
.github/workflows/container-validation-backends.yml

[warning] 228-228: too many blank lines (4 > 2)

(empty-lines)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3243/merge) by dmitry-tokarev-nv.
.github/workflows/container-validation-backends.yml

[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace). Files were modified by this hook.

.github/actions/docker-tag-push/action.yml

[error] 1-1: Trailing whitespace check failed. The hook modified this file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
@nv-anants nv-anants changed the title feat: e2e tests feat: add e2e dynamo deploy tests Oct 16, 2025
@nv-anants
Copy link
Contributor

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a695da and a36d11a.

📒 Files selected for processing (1)
  • .github/workflows/container-validation-backends.yml (2 hunks)
🧰 Additional context used
🪛 actionlint (1.7.8)
.github/workflows/container-validation-backends.yml

299-299: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


518-518: "steps" section is missing in job "deploy-test-sglang"

(syntax-check)


519-519: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


536-536: "steps" section must be sequence node but got alias node with "" tag

(syntax-check)


538-538: "steps" section is missing in job "deploy-test-trtllm"

(syntax-check)


539-539: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


558-558: "steps" section must be sequence node but got alias node with "" tag

(syntax-check)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: sglang
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: Build and Test - dynamo

@alec-flowers
Copy link
Contributor

Can we start with not-required and then enable it after a day or two of data to understand any problems?

Signed-off-by: Anant Sharma <[email protected]>
@nv-anants
Copy link
Contributor

Can we start with not-required and then enable it after a day or two of data to understand any problems?

updated to non blocking

@nv-anants nv-anants merged commit 5b9ae14 into main Oct 17, 2025
46 of 53 checks passed
@nv-anants nv-anants deleted the dtokarev-e2e branch October 17, 2025 20:21
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
Co-authored-by: mohammedabdulwahhab <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
nv-kmcgill53 pushed a commit that referenced this pull request Oct 23, 2025
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: mohammedabdulwahhab <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
Co-authored-by: mohammedabdulwahhab <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants