Commit 059a7ae

[ci] Kill hanged docker build process to avoid build timeout issue. (sonic-net#13726)
Why I did it
Docker build occasionally hangs, and it hangs at different steps, so this looks like a bug in the Docker daemon.

How I did it
Start a background daemon process that scans every 120 seconds for docker build processes that have been running longer than DOCKER_BUILD_TIMEOUT (3600 seconds, i.e. 1 hour) and kills them.

How to verify it
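The "start a daemon process" step can be sketched as a background subshell whose PID is saved for later cleanup. This is only a minimal sketch: `start_watchdog` and its arguments are hypothetical names, not part of the commit, and it overwrites the PID file instead of appending to it so that cleanup sees exactly one PID.

```shell
# Hypothetical helper: run "$@" every $1 seconds in a background subshell
# and record the subshell's PID in the file named by $2.
start_watchdog() {
    local interval="$1"
    local pidfile="$2"
    shift 2
    (
        while true; do
            sleep "$interval"
            "$@"
        done
    ) &
    # $! is the PID of the most recently started background job
    echo "$!" > "$pidfile"
}
```

A call such as `start_watchdog 120 /tmp/watchdog.pid some_check` (both the path and `some_check` are placeholders) would run the check every two minutes until the recorded PID is killed.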
1 parent 5b64d82 commit 059a7ae

4 files changed, +34 -0 lines changed


.azure-pipelines/azure-pipelines-build.yml

+1
@@ -109,6 +109,7 @@ jobs:
 
       buildSteps:
         - template: template-skipvstest.yml
+        - template: template-daemon.yml
         - bash: |
             set -ex
             if [ $(GROUP_NAME) == vs ]; then

.azure-pipelines/cleanup.yml

+7
@@ -1,5 +1,11 @@
 steps:
 - script: |
+    set -x
+    # kill daemon process
+    ps $(cat /tmp/azp_daemon_kill_docker_pid)
+    sudo kill $(cat /tmp/azp_daemon_kill_docker_pid)
+    rm /tmp/azp_daemon_kill_docker_pid
+
     if sudo [ -f /var/run/march/docker.pid ] ; then
       pid=`sudo cat /var/run/march/docker.pid` ; sudo kill $pid
     fi
@@ -11,4 +17,5 @@ steps:
       pid=`sudo cat dockerfs/var/run/docker.pid` ; sudo kill $pid
     fi
     sudo rm -rf $(ls -A1)
+  condition: always()
   displayName: "Clean Workspace"
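The cleanup step above assumes the PID file exists and that the recorded process is still alive. A more defensive variant might look like the sketch below. `stop_from_pidfile` is a hypothetical helper, not part of the commit, and it uses plain `kill` where the pipeline uses `sudo kill`.

```shell
# Hypothetical helper: kill every PID listed in the file named by $1,
# skipping PIDs that no longer exist, then remove the file.
stop_from_pidfile() {
    local pidfile="$1"
    local pid
    # Nothing to do if the daemon step never ran
    [ -f "$pidfile" ] || return 0
    while read -r pid; do
        [ -n "$pid" ] || continue
        # kill -0 sends no signal; it only checks that the process exists
        if kill -0 "$pid" 2>/dev/null; then
            kill "$pid"
        fi
    done < "$pidfile"
    rm -f "$pidfile"
}
```

Reading the file line by line also copes with a file that holds several PIDs, which can happen when the daemon step appends with `>>` more than once on the same agent.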

.azure-pipelines/template-daemon.yml

+24
@@ -0,0 +1,24 @@
+steps:
+- bash: |
+    (
+    while true
+    do
+      sleep 120
+      now=$(date +%s)
+      pids=$(ps -C docker -o pid,etime,args | grep "docker build" | cut -d" " -f1)
+      for pid in $pids
+      do
+        start=$(date --date="$(ls -dl /proc/$pid --time-style full-iso | awk '{print$6,$7}')" +%s)
+        time_s=$(($now-$start))
+        if [[ $time_s -gt $(DOCKER_BUILD_TIMEOUT) ]]; then
+          echo =========== $(date +%F%T) $time_s &>> target/daemon.log
+          ps $pid &>> target/daemon.log
+          sudo kill $pid
+        fi
+      done
+    done
+    ) &
+    daemon_pid=$!
+    ps $daemon_pid
+    echo $daemon_pid >> /tmp/azp_daemon_kill_docker_pid
+  displayName: start daemon to kill hang docker
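The PID selection in this template relies on `cut -d" " -f1`, which may yield an empty field when `ps` pads the PID column with leading spaces. One alternative is to let `awk` split the fields and to read the elapsed time in seconds directly from `ps`. The sketch below assumes the procps-ng `etimes` output column is available. `select_stale_builds` is a hypothetical name, not part of the commit.

```shell
# Hypothetical helper: read "PID ELAPSED_SECONDS COMMAND..." lines on stdin
# and print the PIDs of "docker build" commands older than $1 seconds.
select_stale_builds() {
    awk -v limit="$1" '
        # $1 = pid, $2 = elapsed seconds; awk ignores leading padding
        ($2 + 0 > limit + 0) && index($0, "docker build") { print $1 }
    '
}

# Possible usage against live processes (header suppressed with "="):
#   ps -C docker -o pid=,etimes=,args= | select_stale_builds 3600
```

Keeping the filter separate from the `ps` call makes it possible to check the selection logic against fixed input without any docker process running.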

.azure-pipelines/template-variables.yml

+2
@@ -4,3 +4,5 @@ variables:
   SONIC_SLAVE_DOCKER_DRIVER: 'overlay2'
   SONIC_BUILD_RETRY_COUNT: 3
   SONIC_BUILD_RETRY_INTERVAL: 600
+  DOCKER_BUILDKIT: 0
+  DOCKER_BUILD_TIMEOUT: 3600
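The daemon template consumes `DOCKER_BUILD_TIMEOUT` through the `$(DOCKER_BUILD_TIMEOUT)` macro, which Azure Pipelines substitutes before the script runs. Pipeline variables are also exposed to scripts as environment variables, so the comparison could be written as in the sketch below. `exceeds_timeout` is a hypothetical helper, and the fallback of 3600 simply mirrors the value set here.

```shell
# Hypothetical helper: succeed when a process that started at $1 (epoch
# seconds) has run longer than DOCKER_BUILD_TIMEOUT seconds as of $2
# (defaults to the current time).
exceeds_timeout() {
    local start="$1"
    local now="${2:-$(date +%s)}"
    # Fall back to one hour when the variable is not set
    local limit="${DOCKER_BUILD_TIMEOUT:-3600}"
    [ $((now - start)) -gt "$limit" ]
}
```

With the default limit, a process that started at second 1000 is over the limit at second 4601 but not at second 4600.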
