added export as HTML or PDF

dcuartielles · dcuartielles · commit fbedc77a0d30 · 2022-02-28T23:41:05.000+01:00
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Bash Site Generator
 
-The **Bahs Site Generator** is a piece of software to help in the creation of courses which will be rendered as **Markdown** files, including an index, etc.
+The **Bash Site Generator** is a piece of software to help in the creation of courses which will be rendered as **Markdown** files, including an index, etc.
 
 The generator is a collection of scripts that can be executed step by step to:
 
diff --git a/build/config/config.conf b/build/config/config.conf
@@ -21,6 +21,12 @@ NUM_PAGES=0
 VERBOSE=0
 DEFAULT_TEMPLATE="basic1"
 
+## Export
+TYPE_FILE="HTML"
+EXPORT_FOLDER="export"
+TEMP_EXPORT_FILE="export.md"
+PDF_EXPORT_FILE="site.pdf"
+
 ## Data handling
 SEPARATOR='¤'
 PARAMETER_SEPARATOR=','
@@ -45,3 +51,4 @@ FIELD_NAMES=("OBJECTIVES" "INTRODUCTION" "PARTS" "VIDEO" "DESCRIPTION" "CIRCUIT"
 FIELD_TYPES=("text" "text" "text" "video" "text" "image" "code" "text" "license")
 ##FIELD_PROPERTIES=("true,text" "true,text" "true,text" "true,video" "true,text" "true,image,local" "true,code,C,true" "true,text" "true,license")
 DEFAULT_CONTENT=("${EMPTY_STR}" "${EMPTY_STR}" "${EMPTY_STR}" "${EMPTY_VID}" "${EMPTY_STR}" "img.png" "CodeListing" "${EMPTY_STR}" "${LICENSE_EMBED}")
+EXPORT_TYPES=("HTML" "PDF")
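In *export_site.sh*, Step 1 validates `TYPE_FILE` against this new `EXPORT_TYPES` array through `elementInWhere`, a helper from *utils.sh* that is not part of this commit. Assuming that helper prints the position of the match and signals success through its exit status (which is how the script's `$?` test uses it), an equivalent check could look like the following sketch; `element_in` is a hypothetical name, not the project's function.

```bash
#!/bin/bash
# Hypothetical stand-in for elementInWhere (utils.sh is not shown in this diff):
# prints the index of $1 among the remaining arguments and returns 0,
# or prints nothing and returns 1 when there is no match.
element_in() {
  local needle="$1"
  shift
  local i=0
  local item
  for item in "$@"; do
    if [[ "$item" == "$needle" ]]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

EXPORT_TYPES=("HTML" "PDF")

# Usage mirroring Step 1 of export_site.sh
TYPE_FILE="PDF"
if typeIndex=$(element_in "$TYPE_FILE" "${EXPORT_TYPES[@]}"); then
  echo "$TYPE_FILE is supported (index $typeIndex)"
else
  echo "$TYPE_FILE file format not supported" >&2
fi
```

In this sketch the comparison is case-sensitive, which agrees with the later `[[ ${TYPE_FILE} == "HTML" ]]` tests in the script: a value such as `html` is rejected up front instead of passing validation and then matching neither export branch.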
diff --git a/build/export_site.sh b/build/export_site.sh
@@ -0,0 +1,168 @@
+#!/bin/bash
+
+## 20220225 This export script takes the rendered website and uses pandoc
+## to convert it to other formats such as HTML or PDF
+
+## The possible parameters are:
+## ** -l: locale (default en)
+## ** -t: type of file (default HTML)
+## ** -r: render folder (default .. a.k.a. root)
+## ** -p: pandoc parameters
+## ** -e: export folder (default export)
+
+## Load utils file
+UTILS_PATH='./utils.sh'
+
+## Include the functions
+source "${UTILS_PATH}"
+
+## Load config file (https://wiki.bash-hackers.org/howto/conffile#secure_it)
+CONFIG_PATH='./config/config.conf'
+
+## Check for malformed config instructions
+configReturn=$(checkConfigFile)
+
+## Otherwise go on and source it:
+source "${CONFIG_PATH}"
+
+while getopts ":l:t:r:p:e:" opt; do
+  case $opt in
+    l) LOCALE="$OPTARG"
+    ;;
+    t) TYPE_FILE="$OPTARG"
+    ;;
+    r) RENDER_FOLDER="$OPTARG"
+    ;;
+    p) PANDOC_PARAMETERS="$OPTARG"
+    ;;
+    e) EXPORT_FOLDER="$OPTARG"
+    ;;
+    \?) echo "Invalid option -$OPTARG" >&2
+    exit 1
+    ;;
+  esac
+
+  case $OPTARG in
+    -*) echo "Option $opt needs a valid argument"
+    exit 1
+    ;;
+  esac
+done
+
+## Folders used
+LOCALE_FOLDER=${LOCALE}
+CURRENT_FOLDER="${RENDER_FOLDER}/${SITE_FOLDER}/${LOCALE_FOLDER}"
+CURRENT_EXPORT_FOLDER="${EXPORT_FOLDER}/${SITE_FOLDER}/${LOCALE_FOLDER}"
+IMAGE_FOLDER="${RENDER_FOLDER}/${IMG_FOLDER}"
+
+## Data and index files
+INDEX_FILE="${CURRENT_FOLDER}/${SITE_INDEX}"
+
+## Step 1: check the type of file to work with, exit if not in the list
+typeExists=$(elementInWhere "${TYPE_FILE}" "${EXPORT_TYPES[@]}")
+[[ $? -ne 0 ]] && { echo "$TYPE_FILE file format not supported"; exit 99; }
+
+## Step 1.1: Copy the render folder to a temporary one to ease the work
+## But first, create the folder if it doesn't exist
+[ ! -d "${TEMP_FOLDER}" ] && mkdir "${TEMP_FOLDER}"
+[ ! -d "${TEMP_FOLDER}/${SITE_FOLDER}" ] && mkdir "${TEMP_FOLDER}/${SITE_FOLDER}"
+cp -R "${CURRENT_FOLDER}" "${TEMP_FOLDER}/${SITE_FOLDER}"
+cp -R "${IMAGE_FOLDER}" "${TEMP_FOLDER}"
+
+## While at it, create the export folder and add the images there
+[ ! -d "${EXPORT_FOLDER}" ] && mkdir "${EXPORT_FOLDER}"
+[ ! -d "${EXPORT_FOLDER}/${SITE_FOLDER}" ] && mkdir "${EXPORT_FOLDER}/${SITE_FOLDER}"
+[ ! -d "${CURRENT_EXPORT_FOLDER}" ] && mkdir "${CURRENT_EXPORT_FOLDER}"
+cp -R "${IMAGE_FOLDER}" "${EXPORT_FOLDER}"
+
+## Step 2: if HTML, search for all Markdown files and convert them one by one
+if [[ ${TYPE_FILE} == "HTML" ]]; then
+  ## List all of the Markdown files into an array
+  markdownFiles=$(find "${TEMP_FOLDER}" -type f -name "*.md")
+  readarray -t MARKDOWN_FILES <<<"$markdownFiles"
+
+  ## Iterate through the array
+  for fileIndex in "${!MARKDOWN_FILES[@]}"
+  do :
+    echo -e "** Converting: ${MARKDOWN_FILES[$fileIndex]}"
+
+    ## Extract file name without extension
+    filename=$(basename -- "${MARKDOWN_FILES[$fileIndex]}")
+    filename="${filename%.*}"
+    ## extension="${filename##*.}"
+    ##echo -e "** Filename: ${filename}"
+
+    ## Extract the path to the file
+    fileFolder=$(dirname "${MARKDOWN_FILES[$fileIndex]}")
+
+    ## Remove the first folder block
+    fileFolder=${fileFolder#"$TEMP_FOLDER"}
+
+    ## Add the export folder name
+    fileFolder=$EXPORT_FOLDER/$fileFolder
+
+    ## Fix the index block to link to HTML files
+    ## TODO: this will not allow me to include links to ANY Markdown Files
+    ##       It could be a good idea to rethink this in the future
+    ##if [[ "${filename}.md" == "${SITE_INDEX}" ]]; then
+      sed -i "s/\.md)/.html)/g" "${MARKDOWN_FILES[$fileIndex]}"
+    ##fi
+
+    ## Create the page folder, this trick allows two levels of depth
+    ## such as the created by the generator, having more depth would
+    ## require a different subfolder generator capable of detecting how
+    ## many subfolders to create along the way
+    [ ! -d "${fileFolder}" ] && mkdir "${fileFolder}"
+    ##[ ! -d "${CURRENT_EXPORT_FOLDER}/${filename}" ] && mkdir ${CURRENT_EXPORT_FOLDER}/${filename}
+
+    pandoc -s "${MARKDOWN_FILES[$fileIndex]}" --metadata pagetitle="${filename}" -o "${fileFolder}/${filename}.html"
+    ##pandoc ${MARKDOWN_FILES[$fileIndex]} -o ${CURRENT_EXPORT_FOLDER}/${filename}/${filename}.html
+  done
+fi
+
+## Step 3: if PDF, remove the indexing part of all files and compose them together
+if [[ ${TYPE_FILE} == "PDF" ]]; then
+  ## List all of the Markdown files into an array
+  markdownFiles=$(find "${TEMP_FOLDER}" -type f -name "*.md")
+  readarray -t MARKDOWN_FILES <<<"$markdownFiles"
+
+  ## Iterate through the array
+  for fileIndex in "${!MARKDOWN_FILES[@]}"
+  do :
+    ## Work only if not dealing with the index file, which should be
+    ## left out of the PDF generation
+
+    ## Extract file name without extension
+    filename=$(basename -- "${MARKDOWN_FILES[$fileIndex]}")
+    filename="${filename%.*}"
+
+    if [[ "${filename}.md" != "${SITE_INDEX}" ]]; then
+      echo -e "** Adding: ${MARKDOWN_FILES[$fileIndex]}"
+
+      ## Append the file at the bottom of the temp file
+      cat "${MARKDOWN_FILES[$fileIndex]}" >> "${TEMP_FOLDER}/${TEMP_EXPORT_FILE}"
+    fi
+  done
+
+  ## Remove all of the index blocks
+  ## TODO: fix this to work with other locales by using information from the
+  ##       template.csv file instead of having it hardcoded in here
+  ##sed -z -i 's/## Index\(.*\)##/##/g' "${TEMP_FOLDER}/${TEMP_EXPORT_FILE}"
+  sed -i '/## Index/,/##/{//!d}' "${TEMP_FOLDER}/${TEMP_EXPORT_FILE}"
+  sed -z -i 's/## Index//g' "${TEMP_FOLDER}/${TEMP_EXPORT_FILE}"
+
+  ## TODO: deal with the video blocks by searching for a screenshot
+
+  pandoc "${TEMP_FOLDER}/${TEMP_EXPORT_FILE}" --pdf-engine=xelatex -o "${EXPORT_FOLDER}/${PDF_EXPORT_FILE}"
+
+  ## Open the file for inspection
+  xdg-open "${EXPORT_FOLDER}/${PDF_EXPORT_FILE}"
+fi
+
+## XXX
+
+## We finished, delete the temporary folder
+[ -d "${TEMP_FOLDER}" ] && rm -fR "${TEMP_FOLDER}"
+
+## Done
+echo -e "** Done"
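Step 2 of *export_site.sh* does two pieces of string handling per file before calling *pandoc*: it rewrites Markdown link targets so they end in `.html`, and it maps a path under `TEMP_FOLDER` to the matching folder under `EXPORT_FOLDER`. The sketch below isolates both as small functions so they can be checked without running *pandoc*. The function names and folder values are hypothetical; the `sed` expression escapes the dot, so only a literal `.md)` is replaced rather than any character followed by `md)`.

```bash
#!/bin/bash
# Hypothetical helpers isolating the string handling of Step 2 in export_site.sh.

# Rewrite Markdown link targets "(....md)" to "(....html)".
# The dot is escaped, so "xmd)" is left alone.
rewrite_links() {
  sed 's/\.md)/.html)/g'
}

# Map a Markdown file under the temp folder to its HTML output path under
# the export folder, e.g. tmp/site/en/page.md -> export/site/en/page.html
html_output_path() {
  local mdFile="$1" tempFolder="$2" exportFolder="$3"
  local name folder
  name=$(basename -- "$mdFile")
  name="${name%.*}"                   # drop the extension
  folder=$(dirname -- "$mdFile")
  folder="${folder#"$tempFolder"}"    # strip the temp folder prefix
  echo "${exportFolder}${folder}/${name}.html"
}

# Example run with made-up folder names
echo "See [part 1](part1/part1.md) and [the index](README.md)." | rewrite_links
html_output_path "tmp/site/en/part1/part1.md" "tmp" "export"
```

The first call prints `See [part 1](part1/part1.html) and [the index](README.html).` and the second prints `export/site/en/part1/part1.html`. As the script's own comment notes, a single `mkdir` only creates one missing level; `mkdir -p` creates every missing parent, which would lift the two-levels-of-depth limitation if deeper folder structures were ever needed.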
diff --git a/build/render_site.sh b/build/render_site.sh
@@ -48,8 +48,6 @@ while getopts ":l:r:c:f:" opt; do
   esac
 done
 
-
-
 ## Folders used
 LOCALE_FOLDER=${LOCALE}
 CURRENT_FOLDER="${SITE_FOLDER}/${LOCALE_FOLDER}"
diff --git a/docs/BUILD.md b/docs/BUILD.md
@@ -1,5 +1,11 @@
 # Build the project
 
+## Install dependencies
+
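+If you want to export your site to any format other than Markdown, you will need *pandoc*, the universal document converter. It can convert the generator's output into HTML, PDF, LaTeX, and more. The PDF export calls *pandoc* with the *xelatex* engine, so you will also need a TeX distribution that provides it (on Debian-based systems, the *texlive-xetex* package).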
+
+`sudo apt install pandoc`
+
 ## Build the templates
 
 Prior to creating a course, you need to define the templates you will be using. The call for the template generation goes as follows:
@@ -47,18 +53,34 @@ You need to add your content to the *en/pages.csv* file. Make sure you include t
 
 Once the templates have been created, render the markdown of the site by simply calling:
 
-`./render_site.sh en .. config pages.csv`
+`./render_site.sh -l en -r .. -c config -f pages.csv`
 
 inside the *build* folder. It will create all of the folders, *MD*, and code files based on templates. From there you will have the opportunity of modifying the content once more. Use the editor of your choice.
 
 The parameters for the *render_site.sh* script are:
 
-* locale: en, es, etc.
-* name of the folder with templates and the like, typically *config*
-* name of the *CSV* file containing the course
+* l, locale: en, es, etc.
+* r, render: folder where the site will be rendered into
+* c, config: name of the folder with templates and the like, typically *config*
+* f, file: name of the *CSV* file containing the course
 
 You will have to call it once per locale, which also means you should need the actual *CSV* file for the corresponding language.
 
+## Export your site to other formats
+
+Because the site is built as a collection of Markdown files, how you export it depends on the output you expect from the generator. For an HTML site, you will have to convert each *\*.md* file separately, index included. For a print-ready PDF, you might have to download all of the images, compose everything into a single file, and then call *pandoc* with the options that generate a *TOC* (table of contents).
+
+The export script wraps the call to *pandoc* after doing some preparatory work on the files. Call it from inside the *build* folder as follows:
+
+`./export_site.sh -t HTML -r ..`
+
+The parameters for the *export_site.sh* script are:
+
+* l, locale: en, es, etc.
+* t, type of output: HTML, PDF
+* r, render: folder where the site was rendered into
+* p, pandoc parameters: a string containing the non-default settings for *pandoc*
+
 ## Note
 
 * The *CSV* separator in use is the **¤** symbol in order to have commas within the text
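For the PDF route described in the export section of *BUILD.md*, the script concatenates every page except the index into one Markdown file and then strips the `## Index` blocks with `sed`. A minimal sketch of that composition step follows, assuming GNU `find`, `sort` and `sed`, and using *README.md* as a stand-in for `SITE_INDEX` (defined outside this diff, so that value is an assumption). The file list is sorted so pages land in a predictable order; that is an addition, since the script keeps whatever order `find` returns.

```bash
#!/bin/bash
# Sketch of the PDF composition step of export_site.sh:
# 1) concatenate every page except the index, in sorted path order,
# 2) delete the lines inside each "## Index" block, then the heading itself.
# Assumes GNU find/sort/sed; outFile must live outside srcFolder so it is
# not picked up by find while it is being written.

compose_pdf_source() {
  local srcFolder="$1" outFile="$2" siteIndex="$3"
  : > "$outFile"    # start from an empty output file
  find "$srcFolder" -type f -name '*.md' ! -name "$siteIndex" -print0 \
    | sort -z \
    | while IFS= read -r -d '' mdFile; do
        cat "$mdFile" >> "$outFile"
      done
  # Equivalent to the two sed edits the script applies to the combined file
  sed -i '/## Index/,/##/{//!d}' "$outFile"
  sed -i 's/## Index//g' "$outFile"
}

# Example with a throwaway folder tree; README.md plays the role of SITE_INDEX
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/site/en/part1"
printf '# Home\n\n## Index\n\n* [Part 1](part1/part1.md)\n\n## About\n\nHello\n' \
  > "$work/site/en/README.md"
printf '# Part 1\n\n## Index\n\n* [Home](../README.md)\n\n## Objectives\n\nLearn things\n' \
  > "$work/site/en/part1/part1.md"
compose_pdf_source "$work/site" "$work/export.md" "README.md"
cat "$work/export.md"
```

The combined file can then go through the same call the script makes, `pandoc export.md --pdf-engine=xelatex -o site.pdf`, with `--toc` added if a table of contents is wanted.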
diff --git a/docs/LOGS.md b/docs/LOGS.md
@@ -6,6 +6,37 @@ Read here the full design logs, day by day, for the Bash Site Creator from the d
 
 See what happened in the second month of the year.
 
+### 20220227: added PDF export to export_site
+
+* included the *PDF* export feature
+* TODO: include offline images for both *HTML* and *PDF* exports
+* TODO: include a screen capture of each video in PDFs, plus a clickable URL
+
+### 20220226: added index correction to export_site in HTML
+
+* *export_site.sh* now goes through all of the files and rewrites their links to point to *HTML* files instead of Markdown ones; as a side effect, it is no longer possible to link to files with the *\*.md* extension
+* TODO: create a nice CSS for this, maybe extract the one from GitHub Pages
+
+### 20220225: scripting pandoc
+
+* *pandoc* is now officially a dependency of this project, so I have updated *BUILD.md* to include a dependencies section
+* created *export_site.sh*, a script that wraps the *pandoc* calls after making all the needed preparations to the Markdown files
+* worked out the basic operations for exporting as *HTML*
+
+### 20220224: testing pandoc
+
+* *pandoc* has some serious superpowers: it can use your own CSS, it can generate the TOC in a PDF, etc.
+* this also means that, after some testing, I will need a tool that searches for materials and produces a preliminary file to be sent to *pandoc*. For example, I cannot put a video in a PDF, so I will have to extract a screenshot of the video and add the clickable URL as a caption. As another example, I might have to download remote images locally with *curl* before producing a compressed HTML bundle with all of the assets for offline distribution of the content (which is one of my goals)
+* PDF files should be created from a single large Markdown file whose contents are sorted so that *pandoc* can create the TOC
+* rendering HTML will require writing my own tool that finds all *\*.md* files in a folder structure and renders each of them as HTML
+
+### 20220223: research on exporting
+
+* I mainly want two kinds of exports: HTML (to make publishing independent of GitHub) and PDF (because a lot of people use it to distribute materials to their students). I narrowed it down to two possible solutions: *cmark* and *pandoc*
+* [*cmark*](https://github.com/commonmark/cmark) is the reference implementation of CommonMark, written in C; it is among the fastest tools for converting Markdown to HTML and is considered the standard to benchmark your own against. The issue is that it does not produce PDFs
+* [*pandoc*](https://pandoc.org/) is the Swiss Army knife of format conversion on the command line: it handles not only HTML and PDF, but also LaTeX, AsciiDoc, DOCX, etc.
+* I think I have to sacrifice performance for versatility, mainly because otherwise I would have to write my own PDF export tool. The disadvantage is obvious: I will need Linux to run my tool (which I do anyway, since I made things in *bash* ¯\\_(ツ)\_/¯)
+
 ### 20220222 (or 22022022): pre-alpha Twosday release
 
 * this is the very first release, pre-alpha 0.0.1 of the **Bash Site Generator**, it has been two months of work to make it to this point with small developments day after day