[CI] Better utilization of CI resources and other CI improvements #5891
Comments
Thanks for surfacing and outlining the issues. Would a build be triggered by anyone who comments
@terrytangyuan I think only committers should have the right to start tests.
The second option sounds better to me. Even though lint checks are fast, it could still be a problem if a contributor pushes many commits.
Actually, the two options can go together. Lint checks can run in GitHub Actions and would not be subject to the EC2 quota, so we can let contributors run as many lint checks as they'd like. More substantial tests should require a committer's approval when they are run a second time.
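The combined policy can be sketched as a small decision function. This is only an illustration under one reading of the comment above: lint jobs always run, the first run of the substantial tests is automatic, and any later run needs a committer's approval. The names `JobRequest`, `may_run`, `LINT`, and `FULL` are hypothetical and do not come from the actual CI configuration.

```python
from dataclasses import dataclass

# Hypothetical job categories, for illustration only.
LINT = "lint"   # cheap checks that run on GitHub Actions
FULL = "full"   # substantial tests that consume EC2 quota


@dataclass
class JobRequest:
    kind: str                            # LINT or FULL
    prior_runs: int                      # earlier runs of this job kind on the PR
    approved_by_committer: bool = False  # has a committer approved this run?


def may_run(req: JobRequest) -> bool:
    """Return True if the job may start under the assumed policy."""
    if req.kind == LINT:
        # Lint checks are not subject to the EC2 quota, so never gate them.
        return True
    if req.prior_runs == 0:
        # Assumption: the first run of the substantial tests is automatic.
        return True
    # Second and later runs need a committer's approval.
    return req.approved_by_committer


if __name__ == "__main__":
    print(may_run(JobRequest(LINT, prior_runs=10)))
    print(may_run(JobRequest(FULL, prior_runs=1)))
```

If the first run should also be gated, as the later manual-approval setup suggests, the `prior_runs == 0` branch would simply be dropped.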
Got it. Yeah, all checks on GitHub Actions can always run.
I raised the daily limit to 50 USD, to allow more jobs to run each day. We don't want to slow down our development speed too much. I'm really hoping to reduce the cost of each test job, so that we can bring the daily rate back down.
Summary of expenses this year, per OS:
Windows jobs cost a whopping 64.5% of the total, while Linux jobs cost only 28.0%.
@dmlc/xgboost-committer Good news: #5904 cuts the cost of the Windows test pipeline by up to 66%.
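As a rough back-of-the-envelope check, the two figures quoted in this thread can be combined to bound the effect on the total bill. The sketch below assumes the 66% saving applies uniformly to all Windows spending and that the Windows share stays at 64.5%; neither assumption is confirmed in the thread, and `overall_saving` is a hypothetical helper name.

```python
def overall_saving(windows_share: float, windows_saving: float) -> float:
    """Fraction of the total CI bill saved when only the Windows part shrinks."""
    return windows_share * windows_saving


# Figures from the thread: Windows is 64.5% of spending, #5904 saves up to 66%.
saving = overall_saving(0.645, 0.66)
print(f"Upper bound on total CI saving: {saving:.1%}")
```

Under these assumptions the total bill would drop by at most roughly 43%.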
I just came up with a more robust method: change the permissions of the Jenkins manager node. Ordinarily, it is given the right to launch new EC2 instances via an IAM policy. To restrict provisioning of new instances, it suffices to attach the following policy JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:RunInstances",
        "ec2:StartInstances"
      ],
      "Effect": "Deny",
      "Resource": "*"
    }
  ]
}
This Deny policy will override the pre-existing permission, and the manager will no longer be able to launch any new EC2 worker instances. (It can, however, terminate existing EC2 instances.)
Update: The Cost Watcher Lambda function now also controls whether the Jenkins manager (master) can launch EC2 workers or not: hcho3/xgboost-devops@e7402fe
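To illustrate why attaching the Deny policy is enough, here is a toy model of the "explicit Deny overrides Allow" rule. It is not the real IAM evaluation engine: it ignores `Resource`, conditions, and case-insensitive action matching. The `allow_all_ec2` policy is a hypothetical stand-in for the manager's pre-existing permission, and `is_allowed` is an illustrative helper.

```python
import json
from fnmatch import fnmatchcase

# The Deny policy from the comment above.
deny_launch = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["ec2:RunInstances", "ec2:StartInstances"],
            "Effect": "Deny",
            "Resource": "*",
        }
    ],
}

# Hypothetical stand-in for the manager's pre-existing permission.
allow_all_ec2 = {
    "Version": "2012-10-17",
    "Statement": [{"Action": ["ec2:*"], "Effect": "Allow", "Resource": "*"}],
}


def is_allowed(action: str, policies: list) -> bool:
    """Toy evaluation: any matching Deny wins; otherwise a matching Allow is needed."""
    allowed = False
    for policy in policies:
        for stmt in policy["Statement"]:
            if any(fnmatchcase(action, pattern) for pattern in stmt["Action"]):
                if stmt["Effect"] == "Deny":
                    return False  # explicit Deny overrides every Allow
                allowed = True
    return allowed  # no matching statement means implicit deny


if __name__ == "__main__":
    attached = [allow_all_ec2, deny_launch]
    print(json.dumps(deny_launch, indent=2))
    print(is_allowed("ec2:RunInstances", attached))        # launching is blocked
    print(is_allowed("ec2:TerminateInstances", attached))  # terminating still works
```

In this model, detaching `deny_launch` restores the ability to launch workers, which is the switch a budget watcher would flip.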
#5904 has been very effective in reducing the cost of the Windows CI pipeline.
Great work!
There were two mistakes that slowed down GPU tests to ~ 40 minutes:
Now the GPU test suite completes in 15 minutes.
Completed in #8142. We now require manual approval to run tests on pull requests.
The runaway cloud cost of our Jenkins CI server has been a pressing issue (#5176). The server is hosted on AWS, which charges by the hour. #5884 finally created a mechanism to enforce a daily budget via throttling.
We now have a dashboard page to keep track of daily spending: https://xgboost-ci.net/dashboard/
Now it is time to extract savings and ensure that we spend our limited CI resources where they matter.
State of the CI: The free credits from AWS ran out this month, so we now have to start drawing from the Open Collective account, which currently has 10531.16 USD. If we limit ourselves to spending 33 USD per day, the balance will last 287 days.
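The runway estimate is a simple division of the balance by the daily cap. The sketch below uses the balance and the 33 USD cap quoted above, plus the 50 USD cap mentioned in the comments; `runway_days` is a hypothetical helper. Plain division of the quoted balance by 33 USD gives about 319 days, so the 287-day figure may reflect a different balance or fees not stated here.

```python
def runway_days(balance_usd: float, daily_cap_usd: float) -> float:
    """Days until the balance is exhausted if the daily cap is fully spent."""
    if daily_cap_usd <= 0:
        raise ValueError("daily cap must be positive")
    return balance_usd / daily_cap_usd


balance = 10531.16  # Open Collective balance quoted in the issue, in USD

for cap in (33, 50):
    # int() truncates to the number of full days covered.
    print(f"{cap} USD/day -> {int(runway_days(balance, cap))} full days")
```

Raising the cap from 33 to 50 USD per day shortens the runway by roughly a third, which is why lowering the per-job cost matters.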
run tests). Right now, tests run automatically, and there are many cases where automatically starting tests is wasteful.

Other CI improvements, outside of Jenkins