Fix issue with docker not restarting after instance stop/start (#19)
* Fix bug in setup.py (not sure where this came from)

* Rework packer script for clarity and for bug identified as issue 15

* Modify utility scripts to use `--output text` instead of JSON
rappdw authored and mikekwright committed Oct 10, 2018
1 parent ed5634b commit 689961d
Showing 9 changed files with 107 additions and 38 deletions.
58 changes: 57 additions & 1 deletion issues/15 - Docker not running after stop/notes.md
@@ -13,4 +13,60 @@ Failed to enable unit: File /etc/systemd/system/multi-user.target.wants/docker.s
Need to look at what exactly stop/start does

According to [AWS Docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html), the instance
performs a normal shutdown.

Steps to Diagnose

1) `create-dock -m test`
2) `ssh-dock test`
3) verify docker is running
4) `date >timestamp.log`
5) `sudo journalctl -u docker.service >docker.log`
6) `exit`
7) `stop-dock test`
8) `start-dock test`
9) `ssh-dock test`
10) verify docker is not running
11) `date >timestamp.restart.log`
12) `sudo journalctl -u docker.service >docker.reboot.log`
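
One further check that may help during these steps is whether the `multi-user.target.wants` entry for Docker is still a symlink. The sketch below is illustrative only: `link_kind` is a hypothetical helper, not part of the dock scripts, and its default path is an assumption based on the unit path discussed in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the dock scripts): report what kind of
# filesystem entry sits at a systemd "wants" path.
link_kind() {
    # Assumed default: the multi-user.target.wants entry for docker.service.
    local path="${1:-/etc/systemd/system/multi-user.target.wants/docker.service}"
    if [ -L "$path" ]; then
        # Test -L first, because -f would follow the symlink to its target.
        echo "symlink"
    elif [ -f "$path" ]; then
        echo "regular file"
    elif [ -e "$path" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Running `link_kind` with no arguments on the instance, before the stop and again after the start, shows whether the entry is a symlink in both cases.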


# On Jeremy's system
```
root@ip-10-93-135-93:/home/ubuntu# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://docs.docker.com
```
Output from `journalctl -u docker`:
```
Oct 10 13:23:35 ip-10-93-135-93 systemd[1]: Started Docker Application Container Engine.
Oct 10 13:23:36 ip-10-93-135-93 dockerd[16334]: http: TLS handshake error from 10.92.8.39:52312: remote error: tls: bad certificate
Oct 10 13:23:36 ip-10-93-135-93 dockerd[16334]: http: TLS handshake error from 10.92.8.39:52313: remote error: tls: bad certificate
Oct 10 13:35:11 ip-10-93-135-93 systemd[1]: Stopping Docker Application Container Engine...
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.152596710Z" level=info msg="Processing signal 'terminated'"
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.154355239Z" level=info msg="stopping event stream following graceful shutdown" error="
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.154381095Z" level=info msg="stopping healthcheck following graceful shutdown" module=l
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.154438958Z" level=info msg="stopping event stream following graceful shutdown" error="
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.154853556Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4204532c0,
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.154876801Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4204532c0,
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.155033817Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42003a330,
Oct 10 13:35:11 ip-10-93-135-93 dockerd[16334]: time="2018-10-10T13:35:11.155051869Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42003a330,
Oct 10 13:35:12 ip-10-93-135-93 systemd[1]: Stopped Docker Application Container Engine.
```
Current time is: Wed Oct 10 13:45:33 UTC 2018

## Resolution

The problem was in the `packer/configure-docker-v1.sh` script, which contained the line:

```
sudo sed -i 's"dockerd\ -H\ fd://"dockerd"g' /etc/systemd/system/multi-user.target.wants/docker.service
```

`/etc/systemd/system/multi-user.target.wants/docker.service` is a symlink to `/lib/systemd/system/docker.service`.
Running `sed -i` against the symlink replaced it with a regular file, so `systemd` no longer recognized
the service as enabled and did not start Docker after the instance was stopped and started.
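
The effect can be reproduced without touching a real unit file. The following is a minimal sketch, assuming GNU sed (whose documented default for `-i` is to break a symlink rather than follow it). The scratch file names and the one-line stand-in unit file are hypothetical, and the substitution is a simplified form of the one in the script.

```shell
#!/usr/bin/env bash
# Reproduce the symlink breakage in a scratch directory (assumes GNU sed).
set -eu

tmp=$(mktemp -d)

# Hypothetical stand-ins for /lib/systemd/system/docker.service and the
# multi-user.target.wants symlink that points at it.
unit="$tmp/docker.service"
link="$tmp/wants-docker.service"

printf 'ExecStart=/usr/bin/dockerd -H fd://\n' > "$unit"
ln -s "$unit" "$link"

# Case 1: edit in place through the symlink, as the old script did.
sed -i 's|dockerd -H fd://|dockerd|g' "$link"
if [ -L "$link" ]; then after_link_edit="symlink"; else after_link_edit="regular file"; fi
target_after_link_edit=$(cat "$unit")

# Case 2: restore the symlink and edit the target file instead.
rm "$link"
ln -s "$unit" "$link"
sed -i 's|dockerd -H fd://|dockerd|g' "$unit"
if [ -L "$link" ]; then after_target_edit="symlink"; else after_target_edit="regular file"; fi
target_after_target_edit=$(cat "$unit")

echo "after editing via the link:   link is a $after_link_edit"
echo "after editing the target:     link is a $after_target_edit"

rm -rf "$tmp"
```

Under that assumption, the first case should leave a regular file where the symlink was and the target untouched, while the second keeps the symlink and changes the target. This matches the fix in this commit, which points `sed` at `/lib/systemd/system/docker.service` directly.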
6 changes: 6 additions & 0 deletions packer/configure-docker-v1.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash
sudo usermod -aG docker ubuntu
sudo systemctl stop docker
sudo sed -i 's"dockerd\ -H\ fd://"dockerd"g' /lib/systemd/system/docker.service
sudo systemctl daemon-reload
sudo systemctl start docker
10 changes: 0 additions & 10 deletions packer/docker-setup-v1.sh

This file was deleted.

34 changes: 14 additions & 20 deletions packer/resero-labs-nvidia-docker.packer
@@ -33,40 +33,34 @@
],
"post-processors": [],
"provisioners": [
{
"type": "file",
"source": "setup-v1.sh",
"destination": "/home/ubuntu/setup-v1.sh"
},
{
"type": "shell",
"inline": [
"sleep 30",
"sudo apt-get update",
"sudo apt-get install -y gcc make",
"wget -P /tmp http://us.download.nvidia.com/tesla/396.44/NVIDIA-Linux-x86_64-396.44.run",
"chmod +x /tmp/NVIDIA-Linux-x86_64-396.44.run",
"sudo /tmp/NVIDIA-Linux-x86_64-396.44.run -silent",
"sudo apt-get update",
"sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common",
"curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -",
"curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -",
"distribution=$(. /etc/os-release;echo $ID$VERSION_ID)",
"sudo add-apt-repository 'deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable'",
"curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list",
"sudo apt-get update",
"sudo apt-get install -y docker-ce=18.06.0~ce~3-0~ubuntu",
"sudo apt-get install -y nvidia-docker2"
"sudo /home/ubuntu/setup-v1.sh",
"rm /home/ubuntu/setup-v1.sh"
]
},
{
"type": "file",
"source": "docker-setup-v1.sh",
"destination": "~/docker-setup-v1.sh"
"source": "configure-docker-v1.sh",
"destination": "/home/ubuntu/configure-docker-v1.sh"
},
{
"type": "shell",
"inline": ["sudo ~/docker-setup-v1.sh"]
"inline": [
"sudo /home/ubuntu/configure-docker-v1.sh",
"rm /home/ubuntu/configure-docker-v1.sh"
]
},
{
"type": "file",
"source": "etc-rc.local",
"destination": "~/rc.local"
"destination": "/home/ubuntu/rc.local"
},
{
"type": "shell",
23 changes: 23 additions & 0 deletions packer/setup-v1.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

# wait just a bit to allow everything to settle down
sleep 30

# update apt and install dependencies
sudo apt-get update
sudo apt-get install -y gcc make apt-transport-https ca-certificates curl software-properties-common

# get the latest nvidia drivers and install them
wget -P /tmp http://us.download.nvidia.com/tesla/396.44/NVIDIA-Linux-x86_64-396.44.run
chmod +x /tmp/NVIDIA-Linux-x86_64-396.44.run
sudo /tmp/NVIDIA-Linux-x86_64-396.44.run -silent

# now get docker and nvidia-docker
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
sudo add-apt-repository 'deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable'
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y docker-ce=18.06.0~ce~3-0~ubuntu
sudo apt-get install -y nvidia-docker2
2 changes: 1 addition & 1 deletion scripts/destroy-dock
@@ -76,5 +76,5 @@ if [ -n "$f" ]; then
fi

if [ -n "$INSTANCE_ID" ]; then
aws ec2 terminate-instances --instance-ids "${INSTANCE_ID}"
aws ec2 terminate-instances --instance-ids "${INSTANCE_ID}" --output text
fi
4 changes: 2 additions & 2 deletions scripts/start-dock
@@ -62,7 +62,7 @@ INSTANCE_ID=$(get_instance_id $DOCK_IP)

if [ -n "$INSTANCE_ID" ]; then
echo "Starting instance..."
aws ec2 start-instances --instance-ids "${INSTANCE_ID}"
aws ec2 start-instances --instance-ids "${INSTANCE_ID}" --output text
echo "Waiting for instance to start..."
aws ec2 wait system-status-ok --instance-ids $INSTANCE_ID
aws ec2 wait system-status-ok --instance-ids $INSTANCE_ID --output text
fi
2 changes: 1 addition & 1 deletion scripts/stop-dock
@@ -77,5 +77,5 @@ if [ "$RESPONSE" != "y" ] && [ "$RESPONSE" != "h" ]; then
fi

if [ -n "$INSTANCE_ID" ]; then
aws ec2 stop-instances --instance-ids "${INSTANCE_ID}"
aws ec2 stop-instances --instance-ids "${INSTANCE_ID}" --output text
fi
6 changes: 3 additions & 3 deletions setup.py
@@ -44,16 +44,16 @@
packages=find_packages(exclude=['tests*']),
license="MIT License",
python_requires='>=3.6',
classifiers=(
classifiers=[
'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
'Natural Language :: English',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
),
],
install_requires=[
'aws',
'awscli',
'boto3'
],
extras_require={
