Swarm mode modifications #73

Open

wants to merge 1 commit into master

21 changes: 21 additions & 0 deletions README.md
@@ -51,6 +51,27 @@ services:
- INCLUDE_LOGS=true
```

#### Running in Docker Swarm
To run the backup service in Docker Swarm, you should configure it as a global service. This ensures that the backup process runs on each node in the Swarm and backs up all matching containers on the local node.
Owner: Nit: A lot of this YAML is duplicated from the above example. Perhaps it would be better to link out to the Docker documentation for making a service "global"?

```yml
version: "3.9" # Use version 3.9 to support Swarm deploy settings

services:
backup:
image: ghcr.io/realorangeone/db-auto-backup:latest
restart: unless-stopped
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./backups:/var/backups
environment:
- SUCCESS_HOOK_URL=https://hc-ping.com/1234
- INCLUDE_LOGS=true
deploy:
mode: global # Run one instance of the service on each node
restart_policy:
condition: on-failure # Restart on failure to ensure resilience
```

### Oneshot

You may want to use this container to run backups just once, rather than on a schedule. To achieve this, set `$SCHEDULE` to an empty string, and the backup will run just once. This may be useful in conjunction with an external scheduler.
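As an illustration of that behaviour, here is a minimal sketch of the oneshot pattern; the helper below and its environment handling are assumptions for illustration, not the script's actual entrypoint.

```python
# Minimal sketch of the oneshot behaviour described above: with SCHEDULE set to
# an empty string the backup runs once instead of being driven by the schedule.
# run_backup stands in for the script's backup job; this is not the real entrypoint.
import os
from datetime import datetime
from typing import Callable


def run_once_or_schedule(run_backup: Callable[[datetime], None]) -> None:
    schedule = os.environ.get("SCHEDULE", "")
    if schedule:
        # Normal mode: hand off to the cron scheduler, which calls run_backup
        # according to SCHEDULE.
        ...
    else:
        # Oneshot mode: run a single backup immediately and return.
        run_backup(datetime.now())
```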
172 changes: 128 additions & 44 deletions db-auto-backup.py
@@ -160,70 +160,154 @@ def get_backup_provider(container_names: Iterable[str]) -> Optional[BackupProvid


def get_container_names(container: Container) -> Iterable[str]:
"""
Extract names for a container from image tags or fallback to container name.
"""
names = set()
for tag in container.image.tags:
registry, image = docker.auth.resolve_repository_name(tag)
image, tag_name = image.split(":", 1)
names.add(image)

if container.image.tags:
for tag in container.image.tags:
image_name = tag.split(":")[0].split("@")[0]
image_name = image_name.split("/")[-1]
Comment on lines +170 to +171
Owner: Question: Why are you removing resolve_repository_name here? Was it not working correctly? (See the sketch after this function.)

names.add(image_name)

if not names and container.attrs.get("Config", {}).get("Image"):
image_name = container.attrs["Config"]["Image"].split(":")[0].split("@")[0]
image_name = image_name.split("/")[-1]
names.add(image_name)

if not names and container.name:
names.add(container.name)

return names
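For context on the question above, a standalone sketch of the behavioural difference between the removed resolve_repository_name call and the new manual splitting, using the project's own image reference as the example:

```python
# Sketch comparing the two name-extraction approaches discussed in the review.
import docker.auth

tag = "ghcr.io/realorangeone/db-auto-backup:latest"

# Removed approach: resolve_repository_name only splits off the registry, so the
# namespace is kept in the derived name.
registry, image = docker.auth.resolve_repository_name(tag)
old_name = image.split(":", 1)[0]  # "realorangeone/db-auto-backup"

# New approach: also drops any digest and the namespace, keeping only the final
# path segment.
new_name = tag.split(":")[0].split("@")[0].split("/")[-1]  # "db-auto-backup"

print(registry, old_name, new_name)  # ghcr.io realorangeone/db-auto-backup db-auto-backup
```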

def is_swarm_mode() -> bool:
docker_client = docker.from_env()
info = docker_client.info()
return info.get("Swarm", {}).get("LocalNodeState") == "active"

@pycron.cron(SCHEDULE)
def backup(now: datetime) -> None:
print("Starting backup...")

def get_local_node_id() -> str:
Owner: Question: Does the new logic need to happen only in swarm mode? Could checking for the container config image name be done all the time instead? It feels like it's a purely additive change.

docker_client = docker.from_env()
containers = docker_client.containers.list()
info = docker_client.info()
return info["Swarm"]["NodeID"]

backed_up_containers = []

print(f"Found {len(containers)} containers.")
def get_local_node_tasks() -> list:
docker_client = docker.from_env()
Owner: Issue: This code creates a lot of extra connections to the docker API - could a client be passed around instead? (See the sketch after this function.)

local_node_id = get_local_node_id()
services = docker_client.services.list()

for container in containers:
container_names = get_container_names(container)
backup_provider = get_backup_provider(container_names)
if backup_provider is None:
continue
local_tasks = []
for service in services:
tasks = service.tasks()
for task in tasks:
if task["NodeID"] == local_node_id and task["Status"]["State"] == "running":
local_tasks.append(task)

backup_file = (
BACKUP_DIR
/ f"{container.name}.{backup_provider.file_extension}{get_compressed_file_extension(COMPRESSION)}"
)
backup_temp_file_path = BACKUP_DIR / temp_backup_file_name()
return local_tasks
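A rough sketch of the refactor the comment above suggests: create one client and pass it into the helpers instead of calling docker.from_env() in each one. The signatures are illustrative, not the PR's code:

```python
# Sketch of the review suggestion: a single Docker client, created once by the
# caller and passed into the helpers, rather than one connection per helper.
from docker import DockerClient


def get_local_node_id(docker_client: DockerClient) -> str:
    return docker_client.info()["Swarm"]["NodeID"]


def get_local_node_tasks(docker_client: DockerClient) -> list:
    local_node_id = get_local_node_id(docker_client)
    local_tasks = []
    for service in docker_client.services.list():
        for task in service.tasks():
            if task["NodeID"] == local_node_id and task["Status"]["State"] == "running":
                local_tasks.append(task)
    return local_tasks


# Usage: the backup job creates the client once and reuses it everywhere.
# docker_client = docker.from_env()
# tasks = get_local_node_tasks(docker_client)
```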

backup_command = backup_provider.backup_method(container)
_, output = container.exec_run(backup_command, stream=True, demux=True)

with open_file_compressed(
backup_temp_file_path, COMPRESSION
) as backup_temp_file:
with tqdm.wrapattr(
backup_temp_file,
method="write",
desc=container.name,
disable=not SHOW_PROGRESS,
) as f:
for stdout, _ in output:
if stdout is None:
continue
f.write(stdout)
def create_backup_file_name(container: Container, backup_provider: BackupProvider) -> Path:
"""
Create a backup file name with a timestamp prefix and the container name.
"""
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
Owner: Issue: This is a very big change. Backups are intentionally not timestamped. Please remove this. (See the sketch after this function.)

container_name = container.name
return BACKUP_DIR / f"{timestamp}_{container_name}.{backup_provider.file_extension}{get_compressed_file_extension(COMPRESSION)}"
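For reference against the comment above, a sketch of the non-timestamped naming the review asks to keep, matching the removed lines earlier in this diff; it relies on the module's existing constants and helpers:

```python
# Sketch of the naming the review asks to keep: container name only, no
# timestamp prefix. Relies on the module's existing BACKUP_DIR, COMPRESSION,
# and get_compressed_file_extension.
def create_backup_file_name(container: Container, backup_provider: BackupProvider) -> Path:
    return BACKUP_DIR / (
        f"{container.name}.{backup_provider.file_extension}"
        f"{get_compressed_file_extension(COMPRESSION)}"
    )
```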

os.replace(backup_temp_file_path, backup_file)

if not SHOW_PROGRESS:
print(container.name)
@pycron.cron(SCHEDULE)
def backup(now: datetime) -> None:
print("Starting backup...")

backed_up_containers.append(container.name)
docker_client = docker.from_env()

duration = (datetime.now() - now).total_seconds()
print(
f"Backup of {len(backed_up_containers)} containers complete in {duration:.2f} seconds."
)
if is_swarm_mode():
print("Running in Swarm mode, adjusting container lookup...")
tasks = get_local_node_tasks()
backed_up_services = []

for task in tasks:
task_container_id = task['Status']['ContainerStatus']['ContainerID']
try:
container = docker_client.containers.get(task_container_id)
except docker.errors.NotFound:
continue

container_names = get_container_names(container)
backup_provider = get_backup_provider(container_names)

if backup_provider is None:
continue

backup_file = create_backup_file_name(container, backup_provider)
backup_temp_file_path = BACKUP_DIR / temp_backup_file_name()

backup_command = backup_provider.backup_method(container)
_, output = container.exec_run(backup_command, stream=True, demux=True)

with open_file_compressed(
backup_temp_file_path, COMPRESSION
) as backup_temp_file:
with tqdm.wrapattr(
backup_temp_file,
method="write",
desc=task["ServiceID"],
disable=not SHOW_PROGRESS,
) as f:
for stdout, _ in output:
if stdout is None:
continue
f.write(stdout)

os.replace(backup_temp_file_path, backup_file)
backed_up_services.append(container.name)

duration = (datetime.now() - now).total_seconds()
print(f"Backup of {len(backed_up_services)} services complete in {duration:.2f} seconds.")
else:
Owner: Question: What's the reason for the below duplication? Could the extra logic be handled in the existing single loop? (See the sketch after this loop.)

containers = docker_client.containers.list()
backed_up_containers = []

for container in containers:
container_names = get_container_names(container)
backup_provider = get_backup_provider(container_names)

if backup_provider is None:
continue

backup_file = create_backup_file_name(container, backup_provider)
backup_temp_file_path = BACKUP_DIR / temp_backup_file_name()

backup_command = backup_provider.backup_method(container)
_, output = container.exec_run(backup_command, stream=True, demux=True)

with open_file_compressed(
backup_temp_file_path, COMPRESSION
) as backup_temp_file:
with tqdm.wrapattr(
backup_temp_file,
method="write",
desc=container.name,
disable=not SHOW_PROGRESS,
) as f:
for stdout, _ in output:
if stdout is None:
continue
f.write(stdout)

os.replace(backup_temp_file_path, backup_file)
backed_up_containers.append(container.name)
duration = (datetime.now() - now).total_seconds()
print(
f"Backup of {len(backed_up_containers)} containers complete in {duration:.2f} seconds."
)
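
Regarding the duplication question above, a rough sketch of how the two branches might collapse into a single loop: resolve the list of containers up front (local tasks when in Swarm mode, otherwise all local containers) and feed it into the existing per-container backup logic. The helper name is illustrative, not the PR's code:

```python
# Sketch of the single-loop structure suggested in the review: only the container
# lookup differs between Swarm and non-Swarm mode; the backup loop stays shared.
# Relies on the module's is_swarm_mode and get_local_node_tasks helpers.
import docker


def resolve_containers_to_backup(docker_client) -> list:
    if not is_swarm_mode():
        return docker_client.containers.list()

    containers = []
    for task in get_local_node_tasks():
        container_id = task["Status"]["ContainerStatus"]["ContainerID"]
        try:
            containers.append(docker_client.containers.get(container_id))
        except docker.errors.NotFound:
            continue
    return containers


# backup() would then keep a single loop:
# for container in resolve_containers_to_backup(docker_client):
#     ...existing per-container backup logic...
```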

if success_hook_url := get_success_hook_url():
if INCLUDE_LOGS:
response = requests.post(
success_hook_url, data="\n".join(backed_up_containers)
success_hook_url, data="\n".join(backed_up_containers or backed_up_services)
)
else:
response = requests.get(success_hook_url)