-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Redesign Jenkinsfiles #12000
Redesign Jenkinsfiles #12000
Changes from 6 commits
52a2694
19f25f1
aff116a
6c39cd8
34d288d
b2bea2a
33d838f
fc4c2e8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,34 +22,18 @@ | |
|
||
// timeout in minutes | ||
total_timeout = 300 | ||
git_timeout = 15 | ||
// assign any caught errors here | ||
err = null | ||
|
||
// initialize source codes | ||
def init_git() { | ||
deleteDir() | ||
retry(5) { | ||
try { | ||
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of | ||
// retries as this will increase the amount of requests and worsen the throttling | ||
timeout(time: git_timeout, unit: 'MINUTES') { | ||
checkout scm | ||
sh 'git submodule update --init --recursive' | ||
sh 'git clean -x -d -f' | ||
} | ||
} catch (exc) { | ||
deleteDir() | ||
error "Failed to fetch source codes with ${exc}" | ||
sleep 2 | ||
} | ||
} | ||
node('restricted-mxnetlinux-cpu') { | ||
// Loading the utilities requires a node context unfortunately | ||
checkout scm | ||
utils = load('ci/Jenkinsfile_utils.groovy') | ||
} | ||
utils.assign_node_labels(linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu') | ||
|
||
|
||
try { | ||
utils.main_wrapper( | ||
handler: { | ||
stage("Docker cache build & publish") { | ||
node('restricted-mxnetlinux-cpu') { | ||
node(NODE_LINUX_CPU) { | ||
ws('workspace/docker_cache') { | ||
timeout(time: total_timeout, unit: 'MINUTES') { | ||
init_git() | ||
|
@@ -58,24 +42,11 @@ try { | |
} | ||
} | ||
} | ||
|
||
// set build status to success at the end | ||
currentBuild.result = "SUCCESS" | ||
} catch (caughtError) { | ||
node("restricted-mxnetlinux-cpu") { | ||
sh "echo caught ${caughtError}" | ||
err = caughtError | ||
currentBuild.result = "FAILURE" | ||
} | ||
} finally { | ||
node("restricted-mxnetlinux-cpu") { | ||
// Only send email if master failed | ||
if (currentBuild.result == "FAILURE") { | ||
emailext body: 'Generating the Docker Cache has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[DOCKER CACHE FAILED] Run ${BUILD_NUMBER}', to: '${EMAIL}' | ||
} | ||
// Remember to rethrow so the build is marked as failing | ||
if (err) { | ||
throw err | ||
} | ||
, | ||
failure_handler: | ||
{ | ||
if (currentBuild.result == "FAILURE") { | ||
emailext body: 'Generating the Docker Cache has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[DOCKER CACHE FAILED] Run ${BUILD_NUMBER}', to: '${EMAIL}' | ||
} | ||
} | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. newline |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
// -*- mode: groovy -*- | ||
|
||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
// initialize source codes | ||
def init_git() { | ||
deleteDir() | ||
retry(5) { | ||
try { | ||
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of | ||
// retries as this will increase the amount of requests and worsen the throttling | ||
timeout(time: 15, unit: 'MINUTES') { | ||
checkout scm | ||
sh 'git submodule update --init --recursive' | ||
sh 'git clean -d -f' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was -x -d -f before, but it would be better it would be -xdff There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
} | ||
} catch (exc) { | ||
deleteDir() | ||
error "Failed to fetch source codes with ${exc}" | ||
sleep 2 | ||
} | ||
} | ||
} | ||
|
||
def init_git_win() { | ||
deleteDir() | ||
retry(5) { | ||
try { | ||
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of | ||
// retries as this will increase the amount of requests and worsen the throttling | ||
timeout(time: 15, unit: 'MINUTES') { | ||
checkout scm | ||
bat 'git submodule update --init --recursive' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this check if there's an error? I found that bat doesn't. powershell does to some degree. Why don't we use PS? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know. The scope of this PR is restructuring the Jenkinsfiles and not a refactor of what we do in there. |
||
bat 'git clean -d -f' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here as well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
} | ||
} catch (exc) { | ||
deleteDir() | ||
error "Failed to fetch source codes with ${exc}" | ||
sleep 2 | ||
} | ||
} | ||
} | ||
|
||
// pack libraries for later use | ||
def pack_lib(name, libs=mx_lib) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. from where is the default value? this was confusing, but not it's more, every file defines its own defaults? I would rather make it explicit for every case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch! I'll make it explicit There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
sh """ | ||
echo "Packing ${libs} into ${name}" | ||
echo ${libs} | sed -e 's/,/ /g' | xargs md5sum | ||
""" | ||
stash includes: libs, name: name | ||
} | ||
|
||
// unpack libraries saved before | ||
def unpack_lib(name, libs=mx_lib) { | ||
unstash name | ||
sh """ | ||
echo "Unpacked ${libs} from ${name}" | ||
echo ${libs} | sed -e 's/,/ /g' | xargs md5sum | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did we ever had problems that the files were not what we expected? Not sure who introduced this md5sum step. It adds complexity to stash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can't recall having any issues with that. But let's move that to a later discussion |
||
""" | ||
} | ||
|
||
def publish_test_coverage() { | ||
// Fall back to our own copy of the bash helper if it failed to download the public version | ||
sh '(curl --retry 10 -s https://codecov.io/bash | bash -s -) || (curl --retry 10 -s https://s3-us-west-2.amazonaws.com/mxnet-ci-prod-slave-data/codecov-bash.txt | bash -s -)' | ||
} | ||
|
||
def collect_test_results_unix(original_file_name, new_file_name) { | ||
if (fileExists(original_file_name)) { | ||
// Rename file to make it distinguishable. Unfortunately, it's not possible to get STAGE_NAME in a parallel stage | ||
// Thus, we have to pick a name manually and rename the files so that they can be stored separately. | ||
sh 'cp ' + original_file_name + ' ' + new_file_name | ||
archiveArtifacts artifacts: new_file_name | ||
} | ||
} | ||
|
||
def collect_test_results_windows(original_file_name, new_file_name) { | ||
// Rename file to make it distinguishable. Unfortunately, it's not possible to get STAGE_NAME in a parallel stage | ||
// Thus, we have to pick a name manually and rename the files so that they can be stored separately. | ||
if (fileExists(original_file_name)) { | ||
bat 'xcopy ' + original_file_name + ' ' + new_file_name + '*' | ||
archiveArtifacts artifacts: new_file_name | ||
} | ||
} | ||
|
||
|
||
def docker_run(platform, function_name, use_nvidia, shared_mem = '500m') { | ||
def command = "ci/build.py --docker-registry ${env.DOCKER_CACHE_REGISTRY} %USE_NVIDIA% --platform %PLATFORM% --docker-build-retries 3 --shm-size %SHARED_MEM% /work/runtime_functions.sh %FUNCTION_NAME%" | ||
command = command.replaceAll('%USE_NVIDIA%', use_nvidia ? '--nvidiadocker' : '') | ||
command = command.replaceAll('%PLATFORM%', platform) | ||
command = command.replaceAll('%FUNCTION_NAME%', function_name) | ||
command = command.replaceAll('%SHARED_MEM%', shared_mem) | ||
|
||
sh command | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. whitespace |
||
|
||
|
||
def assign_node_labels(args) { | ||
NODE_LINUX_CPU = args.linux_cpu | ||
NODE_LINUX_GPU = args.linux_gpu | ||
NODE_LINUX_GPU_P3 = args.linux_gpu_p3 | ||
NODE_WINDOWS_CPU = args.windows_cpu | ||
NODE_WINDOWS_GPU = args.windows_gpu | ||
} | ||
|
||
// assign any caught errors here | ||
err = null | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is the variable outside the function? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Copy & paste -> moved it inside |
||
def main_wrapper(args) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this main_wrapper? Can we add a documentation string? is not clear without greping around. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Documentation added There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👀🤔 |
||
// hander: Core logic | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. spelling. Plus if we're going to have two handlers I'd give them both descriptive names. i.e. failure_handler, build_handler. I'm also not sure about the name handler, since we're only handling errors in one case and not responding to explicit events. Maybe build and handle_error? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, renamed |
||
// failure_handler: Failure handler | ||
|
||
try { | ||
// Call actual handler | ||
args['handler']() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could make the strings constants in this function (as you have in the other Jenkinsfile) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since these strings are only used once, does it make sense to make them constants? It would then be along the lines of
Given that situation, do you think it adds value? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. People often do, but I particularly don't find it super valuable. I think it's a good idea to extract constants when it's used twice, let's do that. |
||
|
||
// set build status to success at the end | ||
currentBuild.result = "SUCCESS" | ||
} catch (caughtError) { | ||
node(NODE_LINUX_CPU) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we clarify this? It looks strange, why call bash to do echo? We can print from groovy right? Why match this particular node? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question |
||
sh "echo caught ${caughtError}" | ||
err = caughtError | ||
currentBuild.result = "FAILURE" | ||
} | ||
} finally { | ||
node(NODE_LINUX_CPU) { | ||
// Call failure handler | ||
args['failure_handler']() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure why this is in a finally. Why not call this from the catch? Then we wouldn't need the check in each failure_handler to see if it's result failed. We also wouldn't have to store the err and then re-throw it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good point. In the past, we had nested functions which were expected to continue although they had a failure and then exit at a later point in time. I don't think this applies anymore. Could we save this up for a refactor? I'm trying not to make any behavioural modifications here. |
||
|
||
// Remember to rethrow so the build is marked as failing | ||
if (err) { | ||
throw err | ||
} | ||
} | ||
} | ||
} | ||
return this | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. newline at end There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,32 +22,18 @@ | |
|
||
// timeout in minutes | ||
max_time = 60 | ||
// assign any caught errors here | ||
err = null | ||
|
||
// initialize source code | ||
def init_git() { | ||
deleteDir() | ||
retry(5) { | ||
try { | ||
// Make sure wait long enough for api.github.com request quota. Important: Don't increase the amount of | ||
// retries as this will increase the amount of requests and worsen the throttling | ||
timeout(time: 15, unit: 'MINUTES') { | ||
checkout scm | ||
sh 'git submodule update --init --recursive' | ||
sh 'git clean -d -f' | ||
} | ||
} catch (exc) { | ||
deleteDir() | ||
error "Failed to fetch source codes with ${exc}" | ||
sleep 2 | ||
} | ||
} | ||
node('restricted-mxnetlinux-cpu') { | ||
// Loading the utilities requires a node context unfortunately | ||
checkout scm | ||
utils = load('ci/Jenkinsfile_utils.groovy') | ||
} | ||
utils.assign_node_labels(linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu') | ||
|
||
try { | ||
utils.main_wrapper( | ||
handler: { | ||
stage('Build Docs') { | ||
node('restricted-mxnetlinux-cpu') { | ||
node(NODE_LINUX_CPU) { | ||
ws('workspace/docs') { | ||
init_git() | ||
timeout(time: max_time, unit: 'MINUTES') { | ||
|
@@ -58,24 +44,10 @@ try { | |
} | ||
} | ||
} | ||
|
||
// set build status to success at the end | ||
currentBuild.result = "SUCCESS" | ||
} catch (caughtError) { | ||
node("restricted-mxnetlinux-cpu") { | ||
sh "echo caught ${caughtError}" | ||
err = caughtError | ||
currentBuild.result = "FAILURE" | ||
} | ||
} finally { | ||
node("restricted-mxnetlinux-cpu") { | ||
// Only send email if master failed | ||
if (currentBuild.result == "FAILURE") { | ||
emailext body: 'Generating the website has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[WEBSITE FAILED] Build ${BUILD_NUMBER}', to: '${EMAIL}' | ||
} | ||
// Remember to rethrow so the build is marked as failing | ||
if (err) { | ||
throw err | ||
} | ||
, | ||
failure_handler: { | ||
if (currentBuild.result == "FAILURE") { | ||
emailext body: 'Generating the website has failed. Please view the build at ${BUILD_URL}', replyTo: '${EMAIL}', subject: '[WEBSITE FAILED] Build ${BUILD_NUMBER}', to: '${EMAIL}' | ||
} | ||
} | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. newline There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is inconsistent with the constant labels style from the other files, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I missed replacing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's explained below:
We can not use the constants at that point because the utility to load a constant has not been loaded at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a node to define some constants? Could we have a separate file for them and load it before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes. It's not possible to load other groovy scripts without being in a node context