-
Couldn't load subscription status.
- Fork 659
feat: add e2e dynamo deploy tests #3243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dmitry Tokarev <[email protected]>
WalkthroughIntroduces a new composite GitHub Action to tag and push Docker images with conditional AWS ECR/Azure ACR flows, and updates a workflow to push images and run Kubernetes-based end-to-end tests across a framework matrix, including cluster setup, Helm deployment of the platform, and cleanup. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant WF as GitHub Workflow
participant ACT as docker-tag-push (Composite Action)
participant ECR as AWS ECR
participant ACR as Azure ACR
participant DK as Docker Engine
WF->>ACT: Invoke with inputs (local_image, push_tag, flags/creds)
ACT->>DK: Setup Buildx
alt AWS push enabled
ACT->>ECR: Login (aws ecr get-login-password)
ACT->>DK: docker tag local -> ECR repo:tag
ACT->>ECR: docker push repo:tag
end
alt Azure push enabled
ACT->>ACR: Login
ACT->>DK: docker tag local -> ACR repo:tag
ACT->>ACR: docker push repo:tag
end
ACT-->>WF: Output image_tag
sequenceDiagram
autonumber
participant CI as container-validation-backends.yml
participant GH as GitHub Runner
participant K8S as Kubernetes Cluster
participant H as Helm
participant REG as Registry (ECR/ACR)
CI->>GH: Prepare environment (tools, creds)
CI->>GH: Setup kubeconfig, create namespace
CI->>REG: Push image via docker-tag-push
CI->>H: helm repo add/update
CI->>H: helm install platform (operator images)
H->>K8S: Create resources (Deployments/CRDs)
CI->>K8S: Wait for readiness
note over CI,K8S: Run e2e across matrix: vllm, trtllm, sglang
CI->>K8S: List resources, delete CRs
CI->>H: helm uninstall
CI->>K8S: Delete namespace
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (5)
.github/actions/docker-tag-push/action.yml (1)
6-16: Input naming/description clarity.
- push_tag description mentions ECR but it’s used for both ECR/ACR; reword.
- aks_push actually controls ACR push; consider rename to acr_push for clarity (optional).
push_tag: - description: 'Target Tag for ECR' + description: 'Target repository:tag (used for ECR/ACR)' required: false @@ - aks_push: - description: 'Push to AKS Boolean' + aks_push: + description: 'Push to Azure ACR Boolean'.github/workflows/container-validation-backends.yml (4)
127-135: Unknown runner label.actionlint flags cpu-amd-m5-2xlarge. If this is a self-hosted runner, include self-hosted and your custom label, e.g.:
- runs-on: cpu-amd-m5-2xlarge + runs-on: [self-hosted, linux, x64, cpu-amd-m5-2xlarge]
126-133: Gate e2e job on code changes (consistency with matrix jobs).Align with changed-files gating to avoid unnecessary runs.
- e2e-tests: - runs-on: cpu-amd-m5-2xlarge + e2e-tests: + needs: changed-files + if: needs.changed-files.outputs.has_code_changes == 'true' + runs-on: cpu-amd-m5-2xlarge
228-233: YAML lint: excessive blank lines.Reduce blank lines per yamllint warning.
239-253: Cleanup sleep of 300s slows CI.If not strictly required, consider reducing or guarding behind a debug flag.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/actions/docker-tag-push/action.yml(1 hunks).github/workflows/container-validation-backends.yml(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/container-validation-backends.yml
127-127: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
211-211: undefined variable "secret". available variables are "env", "github", "inputs", "job", "matrix", "needs", "runner", "secrets", "steps", "strategy", "vars"
(expression)
🪛 YAMLlint (1.37.1)
.github/workflows/container-validation-backends.yml
[warning] 228-228: too many blank lines (4 > 2)
(empty-lines)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3243/merge) by dmitry-tokarev-nv.
.github/workflows/container-validation-backends.yml
[error] 1-1: Trailing whitespace detected and fixed by pre-commit hook (trailing-whitespace). Files were modified by this hook.
.github/actions/docker-tag-push/action.yml
[error] 1-1: Trailing whitespace check failed. The hook modified this file.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dmitry Tokarev <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/container-validation-backends.yml(2 hunks)
🧰 Additional context used
🪛 actionlint (1.7.8)
.github/workflows/container-validation-backends.yml
299-299: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
518-518: "steps" section is missing in job "deploy-test-sglang"
(syntax-check)
519-519: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
536-536: "steps" section must be sequence node but got alias node with "" tag
(syntax-check)
538-538: "steps" section is missing in job "deploy-test-trtllm"
(syntax-check)
539-539: label "cpu-amd-m5-2xlarge" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
558-558: "steps" section must be sequence node but got alias node with "" tag
(syntax-check)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: sglang
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: Build and Test - dynamo
|
Can we start with not-required and then enable it after a day or two of data to understand any problems? |
Signed-off-by: Anant Sharma <[email protected]>
updated to non blocking |
Signed-off-by: Dmitry Tokarev <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: mohammedabdulwahhab <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: mohammedabdulwahhab <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]>
Signed-off-by: Dmitry Tokarev <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: mohammedabdulwahhab <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: mohammedabdulwahhab <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]>
Overview:
add new blocking checks for dynamo deploy testing
closes: OPS-1247, DEP-439, OPS-1158, OPS-1278, OPS-959
Summary by CodeRabbit