Staging to main: Merge multiple Dockerfiles into a single one #2189
Merge multiple Dockerfiles into a single one
Signed-off-by: Simon Zhao <[email protected]>
Co-authored-by: Miguel Fierro <[email protected]>
I checked, and the GPU tests that were failing are now working: https://github.com/recommenders-team/recommenders/actions/runs/11831613396
// https://github.com/devcontainers/features/blob/main/src/anaconda/devcontainer-feature.json
"ghcr.io/devcontainers/features/anaconda:1": {
"version": "2024.06-1"
"build": {
Some of the GPU tests failed. Looking at the AzureML workspace, I see this error: Provisioning error
The specified Virtual Machine size in low priority is currently out of capacity. Please retry later, try reducing the Virtual Machine size or number of instances, try using dedicated VMs to improve chances of capacity allocations, or try deploying to a different region.
I'll try running the tests again; if they still fail, I'll check whether we need new VMs.
I found that the VM size is specified in tests/ci/azureml_tests/submit_groupwise_azureml_pytest.py. Maybe we need to use a different one, because that code was written almost 3 years ago.
The VMs are chosen from the portal. It finally worked; I think it was a transient error.
Merging.
@SimonYansenZhao the process automatically cancels after 6h. I was trying to find where this limit is defined so we can reduce it, but I don't know where we set this value. Do you know?
Just found this: https://stackoverflow.com/a/59076067. Also, https://docs.github.com/en/actions/writing-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes says the default timeout is 360 minutes.
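For reference, a minimal sketch of how the limit could be lowered with `timeout-minutes`. The workflow name, job name, runner, test command, and the 120/90 values are placeholders I made up for illustration, not what this repository's workflows actually contain:

```yaml
# Hypothetical workflow: every name and value here is illustrative only.
name: gpu-tests-example
on: workflow_dispatch

jobs:
  gpu-tests:
    runs-on: ubuntu-latest
    # Job-level limit: cancel the whole job after 2 hours
    # instead of the 360-minute default.
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: pytest tests
        # Optional step-level limit for this step only.
        timeout-minutes: 90
```

If we go this way, the job-level value is probably the one to set, since it caps the total run time no matter which step hangs.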
Checklist:
Commits are signed, e.g. git commit -s -m "your commit message".
This PR is being made to the staging branch AND NOT TO the main branch.