Skip to content

Commit

Permalink
feat: isolate libreoffice instances
Browse files Browse the repository at this point in the history
After creating a load test, I learned that if the libreoffice main broker locks
up, all subsequent unoconv will fail.
  • Loading branch information
Jean-Louis Fuchs committed Jul 12, 2022
1 parent 8ec932e commit 9e2db65
Show file tree
Hide file tree
Showing 16 changed files with 159 additions and 67 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,9 @@ jobs:
docker-compose up -d --build
- name: Lint the code
run: |
docker-compose exec -T document-merge-service ./manage.py makemigrations --check --dry-run --no-input
docker-compose exec -T document-merge-service poetry run python ./manage.py makemigrations --check --dry-run --no-input
- name: Run pytest
run: docker-compose exec -T document-merge-service pytest --no-cov-on-fail --cov --create-db -vv
run: docker-compose exec -T document-merge-service poetry run pytest --no-cov-on-fail --cov --create-db -vv
black:
runs-on: ubuntu-latest
steps:
Expand Down
3 changes: 2 additions & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ you can use common python tooling for formatting, linting, testing
etc.

```bash
poetry shell
# linting
flake8
# format code
Expand Down Expand Up @@ -65,8 +66,8 @@ Pre commit hooks is an additional option instead of executing checks in your edi
First create a virtualenv with the tool of your choice before running below commands:

```bash
poetry shell
pip install pre-commit
pip install -r requirements-dev.txt -U
pre-commit install --hook=pre-commit
pre-commit install --hook=commit-msg
```
Expand Down
31 changes: 12 additions & 19 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -5,38 +5,31 @@ WORKDIR /app
ARG UID=901

RUN wget -q https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -P /usr/local/bin \
&& chmod +x /usr/local/bin/wait-for-it.sh \
&& mkdir -p /app /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static \
&& useradd -u $UID -r document-merge-service --create-home \
&& mkdir /home/document-merge-service/.config \
&& chmod -R 770 /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static /home/document-merge-service \
&& apt-get update && apt-get install -y --no-install-recommends unoconv libreoffice-writer && rm -rf /var/lib/apt/lists/* \
# All project specific folders need to be accessible by newly created user but also for unknown users (when UID is set manually). Such users are in group root.
&& chown -R document-merge-service:root /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static /home/document-merge-service
&& chmod +x /usr/local/bin/wait-for-it.sh \
&& mkdir -p /app /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static \
&& useradd -u $UID -r document-merge-service --create-home \
&& mkdir /home/document-merge-service/.config \
&& chmod -R 770 /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static /home/document-merge-service \
&& apt-get update && apt-get install -y --no-install-recommends util-linux unoconv libreoffice-writer && rm -rf /var/lib/apt/lists/* \
# All project specific folders need to be accessible by newly created user but also for unknown users (when UID is set manually). Such users are in group root.
&& chown -R document-merge-service:root /var/lib/document-merge-service/data /var/lib/document-merge-service/media /var/www/static /home/document-merge-service
RUN pip install poetry

# Needs to be set for users with manually set UID
ENV HOME=/home/document-merge-service
USER document-merge-service

ENV PYTHONUNBUFFERED=1
ENV DJANGO_SETTINGS_MODULE document_merge_service.settings
ENV APP_HOME=/app
ENV UWSGI_INI /app/uwsgi.ini
ENV STATIC_ROOT /var/www/static
ENV MEDIA_ROOT /var/lib/document-merge-service/media

ARG ENV=docker
COPY pyproject.toml poetry.lock $APP_HOME/
RUN pip install poetry
RUN poetry config virtualenvs.create false \
&& poetry install $([ "$ENV" = "dev" ] || echo "--no-dev") --no-interaction --no-ansi
RUN poetry install $([ "$ENV" = "dev" ] || echo "--no-dev") --no-interaction --no-ansi
COPY . $APP_HOME

RUN ENV=docker ./manage.py collectstatic --noinput \
&& chown -R document-merge-service:root /var/www/static \
&& chmod 770 -R /var/www/static \
&& cp /bin/sleep /bin/dms_test_sleep

USER document-merge-service
EXPOSE 8000

CMD /bin/sh -c "./manage.py migrate && uwsgi"
CMD /bin/sh -c "poetry run python ./manage.py migrate && poetry run python uwsgi"
7 changes: 4 additions & 3 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,14 @@ start: ## Start the development server
@docker-compose up -d --build

test: ## Test the project
@docker-compose exec document-merge-service sh -c "black --check . && flake8 && mypy document_merge_service && pytest --no-cov-on-fail --cov --create-db"
#docker-compose exec document-merge-service poetry run sh -c "black --check . && flake8 && mypy document_merge_service && pytest --no-cov-on-fail --cov --create-db"
docker-compose exec document-merge-service poetry run sh -c "pytest --no-cov-on-fail --cov --create-db --pdb"

shell: ## Shell into document merge service
@docker-compose exec document-merge-service bash
@docker-compose exec document-merge-service poetry shell

format: ## Format python code with black
@docker-compose exec document-merge-service black .
@docker-compose exec document-merge-service poetry run black .

dmypy: ## Run mypy locally (starts a deamon for performance)
dmypy run document_merge_service
2 changes: 1 addition & 1 deletion docker-compose.override.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ services:
[
"/bin/sh",
"-c",
"./manage.py migrate && ./manage.py runserver 0.0.0.0:8000",
"poetry run python ./manage.py migrate && poetry run python ./manage.py runserver 0.0.0.0:8000"
]
environment:
- ENV=dev
5 changes: 5 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,11 @@
version: "3.4"
services:
document-merge-service:
privileged: true
# CAP_SYS_ADMIN wound be enabled for the container to unshare,
# but not on github CI
# cap_add:
# - CAP_SYS_ADMIN
image: adfinissygroup/document-merge-service
ports:
- "8000:8000"
Expand Down
Binary file added document_merge_service/api/data/loadtest/1.doc
Binary file not shown.
Binary file added document_merge_service/api/data/loadtest/2.docx
Binary file not shown.
Binary file added document_merge_service/api/data/loadtest/3.docx
Binary file not shown.
Binary file added document_merge_service/api/data/loadtest/4.docx
Binary file not shown.
71 changes: 60 additions & 11 deletions document_merge_service/api/tests/test_unoconv.py
Original file line number Diff line number Diff line change
@@ -1,11 +1,16 @@
import os
import random
import shutil
import sys
from multiprocessing.pool import ThreadPool
from pathlib import Path
from subprocess import TimeoutExpired, run
from time import sleep

import pytest
from psutil import process_iter

from ..unoconv import run_fork_safe
from ..unoconv import Unoconv, run_fork_safe


def test_timeout():
Expand All @@ -23,24 +28,68 @@ def kill_zombies(): # pragma: no cover
x.wait()


def kill_dms_sleep(bin): # pragma: no cover
def kill_dms_sleep(dms_test_bin): # pragma: no cover
found = False
kill_zombies()
for x in process_iter(["name"]):
if bin == x.name():
if dms_test_bin.name == x.name():
found = True
x.kill()
x.wait()
return found


@pytest.mark.skip
def test_fork(): # pragma: no cover
bin = "dms_test_sleep"
kill_dms_sleep(bin)
shell_cmd = f"{bin} infinity & disown"
def test_fork(dms_test_bin): # pragma: no cover
kill_dms_sleep(dms_test_bin)
shell_cmd = f"{dms_test_bin} infinity & disown"
run(["/bin/bash", "-c", shell_cmd])
sleep(0.2)
assert kill_dms_sleep(bin)
sleep(0.5)
assert kill_dms_sleep(dms_test_bin)
run_fork_safe(["/bin/bash", "-c", shell_cmd])
assert not kill_dms_sleep(bin)
sleep(0.5)
assert not kill_dms_sleep(dms_test_bin)


def run_fork_load(test_file):
unoconv = Unoconv("/usr/bin/python3", shutil.which("unoconv"))
return unoconv.process(test_file, "pdf")


def try_fork_load(arg):
n, test_file = arg
if n < 10:
# slowly start load test
sleep(0.05 * (10 - n))
try:
result = run_fork_load(test_file)
return result
except Exception as e: # pragma: no cover
return e


def test_fork_load(capsys):
count = 8
base = Path(__file__).parent.parent.absolute()
data = Path(base, "data")
load = Path(data, "loadtest")
test_files = []
test_files += [Path(data, "docx-mailmerge.docx")] * count
test_files += [Path(load, "1.doc")] * count
test_files += [Path(load, "2.docx")] * count
test_files += [Path(load, "3.docx")] * count
test_files += [Path(load, "4.docx")] * count
random.shuffle(test_files)
pool = ThreadPool(8)
with capsys.disabled():
sys.stdout.write(" Loadtest: ")
sys.stdout.flush()
for result in pool.imap(try_fork_load, enumerate(test_files)):
with capsys.disabled():
sys.stdout.write(".")
sys.stdout.flush()
if isinstance(result, Exception): # pragma: no cover
raise result
elif not result.stdout.startswith(b"%PDF"): # pragma: no cover
raise ValueError(result)
with capsys.disabled():
sys.stdout.write("done")
59 changes: 50 additions & 9 deletions document_merge_service/api/unoconv.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
from collections import namedtuple
from mimetypes import guess_type
from subprocess import PIPE, CalledProcessError, CompletedProcess, Popen, TimeoutExpired
from uuid import uuid4

from django.core.exceptions import ImproperlyConfigured

UnoconvResult = namedtuple(
"UnoconvResult", ["stdout", "stderr", "returncode", "content_type"]
Expand All @@ -12,9 +15,10 @@
# in testing 2 seconds is enough
_min_timeout = 2

# terminate_then_kill() takes 1 second in the worst case, so we have to use two seconds,
# terminate_then_kill() takes 1 second in the worst case, so we have to use three seconds,
# or the timeout won't be triggered before harakiri.
_ahead_of_harakiri = 2
# Increased to 3 seconds to be safe, we have reports of orphan soffice.bin in production.
_ahead_of_harakiri = 3


def get_default_timeout():
Expand Down Expand Up @@ -106,13 +110,38 @@ def run_fork_safe(
return CompletedProcess(process.args, retcode, stdout, stderr)


def run(cmd):
return run_fork_safe(
[str(arg) for arg in cmd],
def run(cmd, unshare=True):
# Run libreoffice in isolation. If the main broker of libreoffice locks up,
# all following calls to unoconv will hang as well. By unsharing the mount namespace,
# we get a copy of all the mounts, but we can change them without affecting the
# original. So unoconv will never connect to a main broker, and allways start a
# new instance. Then run_fork_safe will terminate the process and all the children
# of unoconv.
#
# I think masking /tmp is enough, but we can use this technique for other paths if
# needed.
if unshare:
shell = [
"unshare",
"--map-root-user",
"--ipc",
"--mount",
"sh",
]
cmd = f"""
mount -t tmpfs tmpfs /tmp
exec {cmd}
""".strip()
else: # pragma: no cover
shell = ["sh"]
ret = run_fork_safe(
shell,
stdout=PIPE,
stderr=PIPE,
timeout=_default_timeout,
input=cmd.encode("utf-8"),
)
return ret


class Unoconv:
Expand All @@ -123,10 +152,13 @@ def __init__(self, pythonpath, unoconvpath):
:param pythonpath: str() - path to the python interpreter
:param unoconvpath: str() - path to the unoconv binary
"""
self.cmd = [pythonpath, unoconvpath]
self.cmd = f"{pythonpath} {unoconvpath}"

def get_formats(self):
p = run(self.cmd + ["--show"])
from django.conf import settings

cmd = f"{self.cmd} --show"
p = run(cmd)
if not p.returncode == 0: # pragma: no cover
raise Exception("Failed to fetch the formats from unoconv!")

Expand All @@ -137,7 +169,14 @@ def get_formats(self):
if match:
formats.append(match.group("format"))

return set(formats)
formats = set(formats)
not_supported = set(settings.UNOCONV_ALLOWED_TYPES) - formats
if not_supported:
raise ImproperlyConfigured(
f"Unoconv doesn't support types {', '.join(not_supported)}."
)

return formats

def process(self, filename, convert):
"""
Expand All @@ -148,7 +187,9 @@ def process(self, filename, convert):
:return: UnoconvResult()
"""
# unoconv MUST be running with the same python version as libreoffice
p = run(self.cmd + ["--format", convert, "--stdout", filename])
pipe = str(uuid4())
cmd = f"{self.cmd} --timeout 10 --pipe {pipe} --format {convert} --stdout '{filename}'"
p = run(cmd)
stdout = p.stdout
if not p.returncode == 0: # pragma: no cover
stdout = f"unoconv returncode: {p.returncode}"
Expand Down
5 changes: 4 additions & 1 deletion document_merge_service/api/views.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import mimetypes
from pathlib import Path
from tempfile import NamedTemporaryFile

import jinja2
Expand Down Expand Up @@ -66,7 +67,9 @@ def merge(self, request, pk=None):
convert = serializer.data.get("convert")

if convert:
with NamedTemporaryFile("wb") as tmp:
dir = Path(settings.MEDIA_ROOT, "__convert__")
dir.mkdir(parents=True, exist_ok=True)
with NamedTemporaryFile("wb", dir=dir) as tmp:
tmp.write(response.content)
unoconv = Unoconv(
pythonpath=settings.UNOCONV_PYTHON,
Expand Down
13 changes: 13 additions & 0 deletions document_merge_service/conftest.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import importlib
import inspect
import shutil
from io import BytesIO
from pathlib import Path

import pytest
from django.core.cache import cache
Expand Down Expand Up @@ -89,3 +91,14 @@ def make_template(placeholder):
return template

return make_template


@pytest.fixture
def dms_test_bin():
sleep_path = Path(shutil.which("sleep"))
test_path = Path(Path(__file__).parent.absolute(), "tmpb5nw53v5")
with test_path.open("wb") as f, open(sleep_path, "rb") as g:
f.write(g.read())
test_path.chmod(0o755)
yield test_path
test_path.unlink()
17 changes: 0 additions & 17 deletions document_merge_service/settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,6 @@
import environ
from django.core.exceptions import ImproperlyConfigured

from .api.unoconv import Unoconv

env = environ.Env()
django_root = environ.Path(__file__) - 2

Expand Down Expand Up @@ -161,21 +159,6 @@ def parse_admins(admins):
UNOCONV_PATH = env.str("UNOCONV_PATH", default="/usr/bin/unoconv")


def get_unoconv_formats():
uno = Unoconv(pythonpath=UNOCONV_PYTHON, unoconvpath=UNOCONV_PATH)
formats = uno.get_formats()
not_supported = set(UNOCONV_ALLOWED_TYPES) - formats

if not_supported:
raise ImproperlyConfigured(
f"Unoconv doesn't support types {', '.join(not_supported)}."
)

return formats


UNOCONV_FORMATS = get_unoconv_formats()

# Jinja2
DOCXTEMPLATE_JINJA_EXTENSIONS = env.list(
"DOCXTEMPLATE_JINJA_EXTENSIONS", default=default([])
Expand Down
Loading

0 comments on commit 9e2db65

Please sign in to comment.