LOTemplate is document generator used to create documents programatically (ODT, DOCX, PDF) from a template and a json file.
flowchart LR
template["Template<br/>(DOCX or ODT)"]
json["Data<br/>(JSON)"]
lotemplate["LO Template<br/>(accessible by API or CLI)"]
generatedFile["Generated File<br/>(PDF, DOCX, ODT, RTF,...)"]
template --> lotemplate
json --> lotemplate
lotemplate --> generatedFile
What makes this tool different from others are the following features :
- The templates are in DOCX or ODT (Word or Libre Office) format
- Template can have complex structures (variables, loop, conditions, counters, html,...)
- The tool can scan the template to extract the variables sheet
- The tool can be called by an API, a CLI or a python module.
The tool is written in Python and use a real LibreOffice headless to fill the templates.
Create a docker-compose.yml
version: '3'
services:
lotemplate:
image: probesys38/lotemplate:v1.5.0
volumes:
- lotemplate-uploads:/app/uploads
environment:
- SECRET_KEY=lopassword
command: "gunicorn -w 4 -b 0.0.0.0:8000 app:app"
run the service
docker-compose up -d
# creation of a directory
curl -X PUT -H 'secret_key: lopassword' -H 'directory: test_dir1' http://localhost:8000/
# {"directory":"test_dir1","message":"Successfully created"}
Let's imagine we have an file basic_test.odt (created by libreoffice) like this :
Test document
let’s see if the tag $my_tag is replaced and this $other_tag is detected.
[if $my_tag == foo]My tag is foo[endif]
[if $my_tag != foo]My tag is not foo[endif]
Upload this file to lotemplate
# upload a template
curl -X PUT -H 'secret_key: lopassword' -F file=@/tmp/basic_test.odt http://localhost:8000/test_dir1
# {"file":"basic_test.odt","message":"Successfully uploaded","variables":{"my_tag":{"type":"text","value":""},"other_tag":{"type":"text","value":""}}}
# generate a file titi.odt from a template and a json content
curl -X POST \
-H 'secret_key: lopassword' \
-H 'Content-Type: application/json' \
-d '[{"name":"my_file.odt","variables":{"my_tag":{"type":"text","value":"foo"},"other_tag":{"type":"text","value":"bar"}}}]' \
--output titi.odt http://localhost:8000/test_dir1/basic_test.odt
After the operation, you get the file titi.odt with this content :
Test document
let’s see if the tag foo is replaced and this bar is detected.
My tag is foo
- Installation
- Basic Usage
- Template syntax and examples
- Supported formats
- Doc for developpers of lotemplate
- Unsolvable problems
- External documentations
- Versions
For Docker use of the API, you can skip this step.
- LibreOffice (the console-line version will be enough)
- python3.8 or higher
- python3-uno
- some python packages specified in requirement.txt that you can install with
pip install -r requirements.txt
.Flask
andWerkzeug
are optional, as they are used only for the API.
# on debian bookworm, you can use these commands
apt update
apt -y -t install bash python3 python3-uno python3-pip libreoffice-nogui
pip install -r requirements.txt
Run the following command on your server :
python3 -m flask run
or simply
flask run
or, for Docker deployment:
docker-compose up
# creation of a directory
curl -X PUT -H 'secret_key: my_secret_key' -H 'directory: test_dir1' http://lotemplate:8000/
# {"directory":"test_dir1","message":"Successfully created"}
curl -X PUT -H 'secret_key: my_secret_key' -H 'directory: test_dir2' http://lotemplate:8000/
# {"directory":"test_dir2","message":"Successfully created"}
# look at the created directories
curl -X GET -H 'secret_key: my_secret_key' http://lotemplate:8000/
# ["test_dir2","test_dir1"]
# delete a directory (and it's content
curl -X DELETE -H 'secret_key: my_secret_key' http://lotemplate:8000/test_dir2
# {"directory":"test_dir2","message":"The directory and all his content has been deleted"}
# look at the directories
curl -X GET -H 'secret_key: my_secret_key' http://lotemplate:8000/
# ["test_dir1"]
Let's imagine we have an odt file (created by libreoffice) like this :
Test document
let’s see if the tag $my_tag is replaced and this $other_tag is detected.
Upload this file to lotemplate
# upload a template
curl -X PUT -H 'secret_key: my_secret_key' -F file=@/tmp/basic_test.odt http://lotemplate:8000/test_dir1
{"file":"basic_test.odt","message":"Successfully uploaded","variables":{"my_tag":{"type":"text","value":""},"other_tag":{"type":"text","value":""}}}
# analyse an existing file and get variables
curl -X GET -H 'secret_key: my_secret_key' http://lotemplate:8000/test_dir1/basic_test.odt
# {"file":"basic_test.odt","message":"Successfully scanned","variables":{"my_tag":{"type":"text","value":""},"other_tag":{"type":"text","value":""}}}
# generate a file titi.odt from a template and a json content
curl -X POST -H 'secret_key: my_secret_key' -H 'Content-Type: application/json' -d '[{"name":"my_file.odt","variables":{"my_tag":{"type":"text","value":"foo"},"other_tag":{"type":"text","value":"bar"}}}]' --output titi.odt http://lotemplate:8000/test_dir1/basic_test.odt
After the operation, you get the file titi.odt with this content :
Test document
let’s see if the tag foo is replaced and this bar is detected.
Then use the following routes :
all routes take a secret key in the header, key secret_key
, that correspond to the secret key configured in the
.env file. If no secret key is configured, the secret key isn't required at request.
-
/
PUT
: take a directory name in the headers, key 'directory'. Creates a directory with the specified nameGET
: returns the list of existing directories
-
/<directory>
: directory correspond to an existing directoryGET
: returns a list of existing templates within the directory, with their scanned variablesPUT
: take a file in the body, key 'file'. Uploads the given file in the directory, and returns the saved file name and its scanned variablesDELETE
: deletes the specified directory, and all its contentsPATCH
: take a name in the headers, key 'name'. Rename the directory with the specified name.
-
/<directory>/<file>
: directory correspond to an existing directory, and file to an existing file within the directoryGET
: returns the file and the scanned variables of the fileDELETE
: deletes the specified filePATCH
: take a file in the body, key 'file'. replace the existing file with the given file. returns the file and the scanned variables of the filePOST
: take a json in the raw body. fills the template with the values given in the json. returns the filled document(s).
-
/<directory>/<file>/download
: directory correspond to an existing directory, and file to an existing file within the directoryGET
: returns the original template file, as it was sent
you may wish to deploy the API on your server. Here's how to do it - but don't forget that you should have soffice installed on the server
You can also change the flask options - like port and ip - in the .flaskenv file.
If you're deploying the app with Docker, port and ip are editable in the Dockerfile.
You can also specify the host and port used to run and connect to soffice as command line arguments,
or in a config file (config.yml
/config.ini
/config
or specified via --config).
Run the script with the following arguments :
usage: lotemplate_cli.py [-h] [--json_file JSON_FILE [JSON_FILE ...]]
[--json JSON [JSON ...]] [--output OUTPUT]
[--config CONFIG] [--host HOST] [--port PORT]
[--scan] [--force_replacement] template_file
positional arguments:
template_file Template file to scan or fill
optional arguments:
-h, --help show this help message and exit
--json_file JSON_FILE [JSON_FILE ...], -jf JSON_FILE [JSON_FILE ...]
Json files that must fill the template, if any
--json JSON [JSON ...], -j JSON [JSON ...]
Json strings that must fill the template, if any
--output OUTPUT, -o OUTPUT
Names of the filled files, if the template should
be filled. supported formats: pdf, html, docx, png, odt
--config CONFIG, -c CONFIG
Configuration file path
--host HOST Host address to use for the libreoffice connection
--port PORT Port to use for the libreoffice connexion
--scan, -s Specify if the program should just scan the template
and return the information, or fill it.
--force_replacement, -f
Specify if the program should ignore the scan's result
Args that start with '--' (e.g. --json) can also be set in a config file
(config.yml
/config.ini
/config
or specified via --config). Config file syntax allows: key=value,
flag=true, stuff=[a,b,c] (for details, see syntax).
If an arg is specified in more than one place, then commandline values
override config file values which override defaults.
All the specified files can be local or network-based.
Get a file to fill with the --scan
argument, and fill the fields you want. Add elements to the list
of an array to dynamically add rows. Then pass the file, and the completed json file(s) (using --json_file
)
to fill it.
Put $variable
in the document is enough to add the variable 'variable'.
A variable name is only composed by chars, letters or underscores.
You can also use "function variables". It is exactly the same as simple variables but with a syntax that allows you a more flexible variable name :
examples :
# simple variable
$my_var
$MyVar99_2020
# basic function variable
$my_var(firstName)
$my_var(address.city)
# you have to escape parenthesis inside the parameter in the
# variable name with a backslash
$my_var(firstName\(robert\))
Then in the json, function variable are working exactly like simple variables.
{
"my_var": {
"type": "text",
"value": "my value"
},
"MyVar99_2020": {
"type": "text",
"value": "my value"
},
"my_var(firstName)": {
"type": "text",
"value": "my value"
},
"my_var(address.city)": {
"type": "text",
"value": "my value"
},
"my_var(firstName\\(robert\\))": {
"type": "text",
"value": "my value"
}
}
Html variables are exactly like text variables, but the html is interpreted when the variable is replaced in the document.
To declare a variable as an html variable, we only have to change the type in the json and send "html" instead of "text".
{
"my_var": {
"type": "html",
"value": "my value with <strong>html formatting</strong>"
}
}
Limitation : Html is not interpreted into "shape content". For example for a text associated to a rectangle inserted into the document.
Add any image in the document, and put in the title of the alt text of the image (properties) '$' followed by the desired name ('$image' for example to add the image 'image')
You can add an unknown number of rows to the array but only on the last line. Add the dynamic variables in the last row of the table, exactly like text variables, but with a '&' instead of a the '$'
You can use if statement in order to display or to hide a part of your document.
There is many operators :
==
: test if the var is equal to the value (case-insensitive)!=
: test if the var is not equal to the value (case-insensitive)===
: test if the var is equal to the value (case-sensitive)!==
: test if the var is not equal to the value (case-sensitive)IS_EMPTY
: check if the var is empty (empty means empty or only spaces, tabs or newlines)IS_NOT_EMPTY
: check if the var is not empty (empty means empty or only spaces, tabs or newlines)CONTAINS
: check if the var contains the value (case-insensitive)NOT_CONTAINS
: check if the var does not contain the value (case-insensitive)
[if $my_var == my_value]
$my_var equals my_value (case insensitive)
[endif]
[if $my_var === my_value]
$my_var equals my_value (case sensitive)
[endif]
[if $my_var != my_value]
$my_var does not equals my_value (case insensitive)
[endif]
[if $my_var !== my_value]
$my_var does not equals my_value (case sensitive)
[endif]
[if $my_var CONTAINS y_VAl]
This part will be displayed if $my_var contains y_VAl (case insensitive)
[endif]
[if $my_var NOT_CONTAINS dlfksqjqm]
This part will be displayed if $my_var does not contain dlfksqjqm
[endif]
[if $my_var IS_EMPTY]
This part will be displayed if my_var is empty (empty means empty or only spaces, tabs or newlines)
[endif]
[if $my_var IS_NOT_EMPTY]
This part will be displayed if my_var is empty (empty means empty or only spaces, tabs or newlines)
[endif]
You can put an if statement inside another if statement
[if $foo == my_value]
[if $bar == my_value2]
here you have your document
[endif]
[if $bar == my_value3]
here you have your document
[endif]
[endif]
You can use for statement in order to display a part of your document multiple times.
WARNING : the for system loses the formating of the template. If you want a specific formating, you have to put it in an HTML statement.
You have to send an array in the json file with a dict inside the array
{
"tutu": {
"type": "array",
"value": [
{
"firstName": "perso 1",
"lastName": "string 1",
"address": {
"street1": "8 rue de la paix",
"street2": "",
"zip": "75008",
"city": "Paris",
"state": "Ile de France"
}
},
{
"firstName": "perso 2",
"lastName": "lastname with < and >",
"address": {
"street1": "12 avenue Jean Jaurès",
"street2": "",
"zip": "38000",
"city": "Grenoble",
"state": "Isère"
}
}
]
}
}
Then in your template file you can use the for like this :
Tests of for statements
[for $tutu]
Associate number [forindex]
first name : [foritem firstName]
last name escaped by default [foritem lastName]
last name escaped html [foritem lastName escape_html]
last name not escaped [foritem lastName raw]
Address :
[foritem address.street1]
[foritem address.zip] [foritem address.city]
[endfor]
[forindex]
: this is a counter beginning at 0 indicating the iteration count.[foritem firstName]
: variable firstName of the current iteration.[foritem lastName escape_html]
: variable lastName of the current iteration escaped by html.[foritem lastName raw]
: variable lastName of the current iteration not escaped.[foritem address.street1]
: variable address.street1 of the current iteration when you have a hierarchy
Note : you can use if inside for statements
Here we display only the people living in Grenoble
[for $tutu]
[if [foritem address.city] == Grenoble]
first name : [foritem firstName]
last name : [foritem lastName]
Address :
[foritem address.street1]
[foritem address.zip] [foritem address.city]
[endif]
[endfor]
Here we display only the first element of the array
[for $tutu]
[if [forindex] == 0]
first name : [foritem firstName]
last name : [foritem lastName]
[endif]
[endfor]
Note : If you are using [forindex]
inside a variable name, the variable
is excluded from the parsing of the template. It allows you to create a
dynamic variable name inside a for loop. Ex : $my_var(people.[forindex].name)
is
excluded from the variable parsing.
You can use html statement in order to display a part of your document with a specific formating.
Here is some examples that ca be use inside an odt template
[html]
<table>
<tr>
<td>First Name</td>
<td>Last Name</td>
</tr>
<tr>
<td>First Name</td>
<td>Last Name</td>
</tr>
</table>
[endhtml]
Then all the html content is interpreted and pasted as html. It is then rendered as a formated text (a table in this example) inside the document.
You can also display a variable that contains an html content inside an html statement
[html]
$my_html_variable
[endhtml]
with the associated json :
{
"tutu": {
"type": "text",
"value": "my <strong>html formated</strong> content"
}
}
As the for statement removes formatting, you can use the html statement combined with the for statement to display a table with a specific formating.
Let's see an example with the following json :
{
"tutu": {
"type": "array",
"value": [
{
"firstName": "perso 1",
"lastName": "string 1",
"address": {
"street1": "8 rue de la paix",
"street2": "",
"zip": "75008",
"city": "Paris",
"state": "Ile de France"
}
},
{
"firstName": "perso 2",
"lastName": "lastname with < and >",
"address": {
"street1": "12 avenue Jean Jaurès",
"street2": "",
"zip": "38000",
"city": "Grenoble",
"state": "Isère"
}
}
]
}
}
and the odt template :
[html]
<table>
<tr>
<td>First Name</td>
<td>Last Name</td>
</tr>
[for $tutu]
<tr>
<td>[foritem firstName escape_html]</td>
<td>[foritem lastName escape_html]</td>
</tr>
[endfor]
</table>
[endhtml]
You can use counter statement in order to display values that will be incremented step by step in your document. It can be used for example to have a heading automatic numbering.
WARNING : generally you don't have to use this counter feature. You can use the automatic numbering of Word or Libre Office
In your odt, you can use :
chapter [counter chapter] : introduction
chapter [counter chapter] : context
[counter paragraph] : geopolitical context
[counter paragraph] : geographical context
[counter paragraph] : economical context
chapter [counter chapter] : analysis
[counter.reset paragraph]
[counter paragraph] : geopolitical analysis
[counter paragraph] : geographical analysis
[counter paragraph] : economical analysis
It will be transformed in :
chapter 1 : introduction
chapter 2 : context
1 : geopolitical context
2 : geographical context
3 : economical context
chapter 3 : analysis
1 : geopolitical analysis
2 : geographical analysis
3 : economical analysis
The possible syntaxe are :
[counter counter_name]
: increment the counter "counter_name" and display it[counter counter_name hidden]
: increment the counter "counter_name" without displaying it[counter.reset counter_name]
: reset the counter "counter_name" to 0[counter.last counter_name]
: display the last value of the counter "counter_name" without incrementing it[counter.format counter_name format_name]
: change the format of the counter[counter.format counter_name number]
: the counter is displayed as a number (default)[counter.format counter_name letter_lowercase]
: the counter is displayed as a letter (a, b, c, ...)[counter.format counter_name letter_uppercase]
: the counter is displayed as a letter (A, B, C, ...)
chapter [counter chapter] : introduction
chapter [counter chapter] : context
[counter.last chapter].[counter paragraph] : geopolitical context
[counter.last chapter].[counter paragraph] : geographical context
[counter.last chapter].[counter paragraph] : economical context
chapter [counter chapter] : analysis
[counter.reset paragraph]
[counter.last chapter].[counter paragraph] : geopolitical analysis
[counter.last chapter].[counter paragraph] : geographical analysis
[counter.last chapter].[counter paragraph] : economical analysis
The result will be
chapter 1 : introduction
chapter 2 : context
2.1 : geopolitical context
2.2 : geographical context
2.3 : economical context
chapter 3 : analysis
3.1 : geopolitical analysis
3.2 : geographical analysis
3.3 : economical analysis
[counter.reset iterator]
[for $solutions]
Title : [foritem title]
Content : [foritem paragraph]
[counter iterator hidden]
[endfor]
we displayed [counter.last iterator] solutions
Format | ODT, OTT | HTML | DOC, DOCX | RTF | TXT | OTHER |
---|---|---|---|---|---|---|
text variables support | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
dynamic tables support | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
image variables support | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
odt, pdf, html, docx.
Other formats can be easily added by adding the format information in the dictionary formats
in
lotemplate/classes.py > Template > export().
Format information can be found on the unoconv repo.
You need to have docker and docker-compose installed and then run
make tests
for this we use fixuid (https://github.com/boxboat/fixuid)
you have to define two env variable MY_UID and MY_GID with your uid and gid copy docker-compose.override.yml.example to docker-compose.override.yml
export MY_UID=$(id -u)
export MY_GID=$(id -g)
cp docker-compose.override.yml.example docker-compose.override.yml
docker-compose up
The error UnoException
happens frequently and
unpredictably, and this error stops the soffice processus
(please note that the API try to re-launch the process by itself). This error, particularly annoying, is unfortunately
impossible to fix, since it can be caused by multiples soffice (LibreOffice) bugs.
Here is a non-exhaustive list of cases that can cause this bug :
- The soffice process was simply closed after the connection is established.
- The
.~lock.[FILENAME].odt#
file is present in the folder where the document is open. This file is created when the file is currently edited via libreoffice, and deleted when the programs in which it is edited are closed. The program try to avoid this error by deleting this file at document opening. - The first line of the document is occupied by a table or another dynamic element (just jump a line, it will solve the problem)
- The background of document is an image, and is overlaid by many text fields
- The document is an invalid file (e.g: the file is an image), and the bridge crashes instead of return the proper error.
The amount of memory used by soffice can increase with its use, even when open files are properly closed (which is the case). Again, this is a bug in LibreOffice/soffice that has existed for years.
For trying to fix these problems, you can try:
- Use the most recent stable release of LibreOffice (less memory, more stable, fewer crashes)
- JODConverter wiki for list formats compatibles with LibreOffice
- The unoconv source code, written in python with PyUNO
- Unoconv source code for list formats - and properties - compatible with LibreOffice for export
- OpenOffice Python Bridge information and code exemples
- com.sun.star Java API docs (On which pyuno is based - but is not identical)
- Java LibreOffice Programming Book
- Deploying Flask
- Flask documentation - quickstart
- Flask documentation - upload
- v1.6.1 : 2024-04-12 : bugfix
- fix the issue #34 : too many endif buggy
- v1.6.0 : 2024-04-11
- allow put variables inside headers and footers
- fix a bug when a variable is both inside the text content and inside a table (it should not arrive, but it is fixed)
- a new unit test system based on PDF converted to text in order to test contents that are not converted to text with a simple saveAs
- v1.5.2 : 2024-02-24 : Better README
- Rewrite for a betterdocker DockerFile without bug
- v1.5.1 : 2024-02-16 : Better README
- Rewriting of the README file
- v1.5.0 : 2024-02-12 : syntax error detection
- add syntax error detection in if statements
- add syntax error detection in for statements
- come back to default libreoffice of Debian Bookworm (removed backports, incompatibility)
- v1.4.1 : 2023-11-20 : micro-feature for counter and fix possible bug
- use counters for counting elements of a list
- fix possible bug with reset and last.
- v1.4.0, 2023-11-17 : counters
- add a counter system inside templates
- add better scan for if statement. Raises an error if there is too many endif in the template.
- speedup html statement replacement and scanning
- speedup for statement replacement and scanning
- tests of for scanning
- internal : add scan testing inside content unit tests
- v1.3.0, 2023-11-16 :
- major refactoring. No evolution for the user.
- new unit tests on tables and images
- no BC Break (theoretically)
- v1.2.8, 2023-09-01 :
- fix bug in TextShape var replacement
- v1.2.7, 2023-08-30 :
- Upgrade to debian bookworm slim
- v1.2.6, 2023-08-30 :
- new comparators for if statements : ===, !==, CONTAINS, NOT_CONTAINS
- variables of type "html" are now supported and copied as HTML
- v1.2.5, 2023-07-17 : temporary fix for detecting endhtml and endfor
- v1.2.4, 2023-07-09 : fix major bug in if statement scanning
- v1.2.3, 2023-07-07 : no endif detection, performance improvement in if statement
- v1.2.2, 2023-06-09 : bugfix html statement scan missing
- v1.2.1, 2023-06-05 : little fix for CI
- v1.2.0, 2023-06-04 : if statements inside for
- v1.1.0, 2023-05-23 : recursive if statement
- v1.0.1, 2023-05-05 : workaround, fix in html formatting
- v1.0.0, 2023-05-03 : if statement, for statement, html statement
- not numbered : about may 2022 : first version
- Possibly to add dynamic images in tables
- another way to make image variables that would be compatible with Microsoft Word and maybe other formats (example : set the variable name in the 'alternative text' field)
- key system for each institution for security
- handle bulleted lists using table like variables
- use variable formatting instead of the one of the character before