## Purpose
- Build a program that generates a Table of Contents file covering all of a folder's files (recursively)
## Principles
- Generated links can be configured to work with other source-code-hosting sites (e.g. BitBucket) or other URL formats for section headers / line numbers
- The Table of Contents output can be rendered using different formats (e.g. Markdown, HTML, etc.)
    - Markdown
        - Nested levels could be rendered via nested bullet points
    - HTML
        - Nested levels could be rendered via nested bullet points or using nested `<details><summary> ... </details>` HTML tags
## Outcome 1
- When running the program, the user can easily configure which folders/files to include/exclude
    - For folders, we can use a blacklist filter: if a folder name matches an excluded folder (e.g. `output/`, `.spago`, etc.), we'll skip it and all of its contents.
    - For files, we can use a whitelist filter: if a file's extension matches a specific one, we should include it. For our purposes, we're only concerned about Markdown, PureScript, and JavaScript files (e.g. `*.md`, `*.purs`, and `*.js`). A sketch of this configuration appears after this list.
- The program outputs or overwrites the ToC file with content (see below for an example). When a URL is clicked, it will open the corresponding page for this repo on GitHub...
    - Why GitHub and not something else? Because this repo is hosted on GitHub.
    - Thus, the links that are generated will need to work with GitHub's link specifications:
        - For Markdown files, this will use the regular Markdown anchor format (i.e. `file-name.md#section-header`).
        - For PS files (e.g. the Syntax folder), it will highlight the corresponding section's header by referring to its line in that file (i.e. `file-name.purs#L4`). A sketch of how these URLs could be assembled appears after the example below.
- ...using a tag.
    - The ToC should only be updated when a new release is made. Thus, the `development` branch's ToC file will always be outdated. Likewise, the `latestRelease` branch will be correct only for one release. When a new major release is made, previous releases would no longer refer to the correct pages if files' names or locations had changed, leading to broken links.
    - Thus, the links should refer to a tag on GitHub rather than a branch name or a commit hash, so that each release always refers to itself as the version to use for finding files.
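As referenced above, here is a minimal sketch of what the folder/file filtering configuration could look like. The record fields, helper names, and `defaultConfig` values are assumptions for illustration, not the program's actual API:

```purescript
module ToC.FilterSketch where

import Prelude
import Data.Foldable (any, elem)
import Data.String (drop, length) as String

-- Hypothetical configuration record for the include/exclude rules described above.
type FilterConfig =
  { excludedFolders :: Array String     -- blacklist: skip these directories entirely
  , includedExtensions :: Array String  -- whitelist: only render files with these extensions
  }

defaultConfig :: FilterConfig
defaultConfig =
  { excludedFolders: [ "output", ".spago" ]
  , includedExtensions: [ ".md", ".purs", ".js" ]
  }

-- Skip a directory when its name appears in the blacklist.
skipDirectory :: FilterConfig -> String -> Boolean
skipDirectory config dirName = dirName `elem` config.excludedFolders

-- Include a file only when its name ends with a whitelisted extension.
includeFile :: FilterConfig -> String -> Boolean
includeFile config fileName =
  any (\ext -> endsWith ext fileName) config.includedExtensions
  where
  endsWith suffix str =
    String.drop (String.length str - String.length suffix) str == suffix
```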
The format of the ToC should look like this:
An example using the current project setup:
- ReadMe
- 01-Tool-Comparisons
- 02-CLI-Options
- 03-New-Projects-From-Start-To-Finish.md
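To make the link rules concrete, here is a hedged sketch of how the GitHub URLs could be assembled. The record shape matches the `GitHubRepo` type in the table below; the function names are assumptions, while `blob/<ref>/<path>` is GitHub's standard file-URL layout:

```purescript
module ToC.LinkSketch where

import Prelude

type GitHubRepo = { username :: String, project :: String, ref :: String }

-- Root URL for files in the repo at a specific reference (branch, tag, or commit).
rootUrl :: GitHubRepo -> String
rootUrl repo =
  "https://github.com/" <> repo.username <> "/" <> repo.project <> "/blob/" <> repo.ref

-- Markdown file link: append the section header's anchor.
markdownHeaderUrl :: GitHubRepo -> String -> String -> String
markdownHeaderUrl repo pathToFile anchor =
  rootUrl repo <> "/" <> pathToFile <> anchor            -- e.g. ".../file-name.md#section-header"

-- Source file link: highlight the header by referring to its line number.
sourceLineUrl :: GitHubRepo -> String -> Int -> String
sourceLineUrl repo pathToFile line =
  rootUrl repo <> "/" <> pathToFile <> "#L" <> show line -- e.g. ".../file-name.purs#L4"
```

Passing a tag (rather than a branch name or commit hash) as the `ref` is what keeps each release's links stable even after files move on the `development` branch.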
We'll first define the types we'll be using throughout our program. When I thought about this particular 'domain', I came up with these concepts:
| Type Name | Meaning | Implementation |
| - | - | - |
| `HeaderInfo` | All the info needed to render one Markdown header: the header's level, the header's text as a human-readable format (e.g. "This is a header"), and the header's text as an anchor-link to be appended to the file's name (e.g. "#this-is-a-header"). | `{ level :: Int, text :: String, anchor :: String }` |
| `Tree HeaderInfo` | Stores a single level-2 header and all of its nested headers | -- |
| `List (Tree HeaderInfo)` | Stores a file's possibly-empty list of level-2 header trees | -- |
| -- | -- | -- |
| `WebUrl` | A `String` that represents a website URL | `type WebUrl = String` |
| `FilePath` | A `String` that represents a file path | `type FilePath = String` |
| `UriPath` | An efficient top-down way to store both the current path's file system path and its corresponding URL path | `type UriPath = { fs :: FilePath, url :: WebUrl }` |
| -- | -- | -- |
| `GitHubRepo` | The data we need to be able to produce the root URL to the corresponding GitHub repo at a specific reference (branch, tag, commit) | `{ username :: String, project :: String, ref :: String }` |
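Written out as PureScript declarations, the table above might look roughly like this. The module name and the two extra aliases are only for readability here, and `Data.Tree` is assumed to come from a rose-tree library such as `purescript-tree`:

```purescript
module ToC.Domain.Types where

import Data.List (List)
import Data.Tree (Tree) -- assumption: a rose-tree type from a library such as `purescript-tree`

-- All the info needed to render one Markdown header.
type HeaderInfo = { level :: Int, text :: String, anchor :: String }

-- One level-2 header and all of its nested headers; a file has a possibly-empty list of these.
-- (These aliases are named here only for readability; the table uses the types directly.)
type HeaderTree = Tree HeaderInfo
type FileHeaders = List (Tree HeaderInfo)

type WebUrl = String
type FilePath = String

-- Both the current path's file-system path and its corresponding URL path.
type UriPath = { fs :: FilePath, url :: WebUrl }

-- The data needed to produce the root URL for the GitHub repo at a specific reference.
type GitHubRepo = { username :: String, project :: String, ref :: String }
```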
Our program consists of six stages:
- Recursively walk the file system, starting at the root directory.
    - In the first level, only include the top-level directories.
    - In the levels that follow, only include the directories and files we specify.
- When we reach a file that should be included, render it by...
    - verifying that its corresponding URL returns a 200 HTTP code
    - attempting to parse the file for any headers
    - rendering the file and its headers using the validated URL
    - returning to the parent directory
- When a directory's paths have all been rendered, render it by...
    - rendering all of its child paths' content
    - rendering the directory and appending the child paths' rendered content
    - returning to the parent directory
- When a top-level directory's paths have all been rendered, render it by...
    - rendering all of its child paths' content
    - rendering the top-level directory and appending the child paths' rendered content
    - returning to the root directory
- When all top-level directories have been rendered, add the "table of contents" section that comes before their content.
- Write the final content to the output file.
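The following is a deliberately simplified sketch of that walk using `Node.FS.Aff` directly: it skips the filtering, URL verification, and header parsing described above and renders every path as a plain nested bullet. The function names and the output layout are placeholders:

```purescript
module ToC.WalkSketch where

import Prelude
import Data.Foldable (fold)
import Data.Monoid (power)
import Data.Traversable (traverse)
import Effect.Aff (Aff)
import Node.Encoding (Encoding(..))
import Node.FS.Aff as FS
import Node.FS.Stats as Stats
import Node.Path (FilePath)

-- Render one path as a bullet; for directories, recurse and append the children's output.
renderPath :: Int -> FilePath -> FilePath -> Aff String
renderPath depth parentDir name = do
  let
    fullPath = parentDir <> "/" <> name
    bullet = power "    " depth <> "- " <> name <> "\n"
  stats <- FS.stat fullPath
  if Stats.isDirectory stats
    then do
      children <- FS.readdir fullPath
      rendered <- traverse (renderPath (depth + 1) fullPath) children
      pure (bullet <> fold rendered)
    else pure bullet

-- Walk the root directory's top-level entries, prepend the ToC heading, and write the file.
writeToc :: FilePath -> FilePath -> Aff Unit
writeToc rootDir outputFile = do
  topLevel <- FS.readdir rootDir
  rendered <- traverse (renderPath 0 rootDir) topLevel
  FS.writeTextFile UTF8 outputFile ("# Table of Contents\n\n" <> fold rendered)
```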
The following image is the general flow of the program:
In addition to the above, we'll also have a parser and a renderer. Both will be provided via the `Env` type, so that we can easily use a different parser approach or a different output type.
If I implemented a parser that parses Markdown headers in a nested manner, it would get very complicated and would likely not make as much sense to newer people. It would also take more time and thought to write. So, to keep things simple, we'll parse content using a two-step approach:
- parse a file for all of its headers using a line-by-line approach
- iterate through the list of headers and nest them correctly using a fold.
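Here is a hedged sketch of the first step: extracting headers line by line. The anchor generation is intentionally naive (lowercase the text and replace spaces with hyphens), whereas GitHub's real slug rules handle more cases, and the fold that nests the flat list into `Tree HeaderInfo` values is omitted:

```purescript
module ToC.ParserSketch where

import Prelude
import Data.List (List)
import Data.List as List
import Data.Maybe (Maybe(..))
import Data.String (Pattern(..), Replacement(..), codePointFromChar)
import Data.String as String

type HeaderInfo = { level :: Int, text :: String, anchor :: String }

-- Step 1: parse a single line; `Just` only when the line is a Markdown ATX header.
parseHeaderLine :: String -> Maybe HeaderInfo
parseHeaderLine line =
  let
    level = String.length (String.takeWhile (_ == codePointFromChar '#') line)
    text = String.trim (String.drop level line)
  in
    if level >= 1 && level <= 6 && text /= ""
      then Just { level, text, anchor: "#" <> toAnchor text }
      else Nothing
  where
  -- Naive anchor: lowercase the text and replace spaces with hyphens.
  toAnchor = String.replaceAll (Pattern " ") (Replacement "-") <<< String.toLower

-- Collect every header in a file's contents, in order of appearance.
parseFileHeaders :: String -> List HeaderInfo
parseFileHeaders contents =
  List.mapMaybe parseHeaderLine
    (List.fromFoldable (String.split (Pattern "\n") contents))
```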
For this project's goals, we'll only provide a Markdown renderer. However, we could also provide an HTML renderer.
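As a taste of what the Markdown renderer does, a single header could be rendered as an indented bullet whose link combines the file's validated URL with the header's anchor. The function name and four-space indentation are assumptions:

```purescript
module ToC.RendererSketch where

import Prelude
import Data.Monoid (power)

type HeaderInfo = { level :: Int, text :: String, anchor :: String }
type WebUrl = String

-- Render one header as an indented Markdown bullet,
-- e.g. `    - [This is a header](.../file-name.md#this-is-a-header)`.
renderHeader :: Int -> WebUrl -> HeaderInfo -> String
renderHeader depth fileUrl header =
  power "    " depth <> "- [" <> header.text <> "](" <> fileUrl <> header.anchor <> ")\n"
```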
The "effects" or "capabilities" we need to run this program are relatively simple:
- The capability to read a path on the file system and determine whether it is a file or a directory
- The capability to read a directory via its path and return back all of its children as file paths
- The capability to read a file's contents
- The capability to write content to a file
- The capability to log messages to the user to help them see what is going on
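In PureScript, such capabilities are commonly modeled as type classes, so the program's logic stays independent of how they are implemented. A hedged sketch, with class and member names that are assumptions rather than the project's actual API:

```purescript
module ToC.CapabilitySketch where

import Prelude
import Data.Maybe (Maybe)

type FilePath = String

data PathType = Dir | File

-- Read the file system: classify a path, list a directory, read a file's contents.
class Monad m <= ReadPath m where
  readPathType :: FilePath -> m (Maybe PathType)
  readDir :: FilePath -> m (Array FilePath)
  readFile :: FilePath -> m String

-- Write the final Table of Contents content to a file.
class Monad m <= WriteToFile m where
  writeToFile :: FilePath -> String -> m Unit

-- Tell the user what is going on.
class Monad m <= Logger m where
  log :: String -> m Unit
```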
The above capabilities require access to a file system on a computer. Implementing a CLI program via Node and Yargs makes this simple. We could implement a Halogen-based interface, but that would require a server-client architecture; that's too complex for now, though we may implement it in the future. Each capability maps onto a library as follows:
- The capability to read a path on the file system and determine whether it is a file or a directory: `Node.FS.Aff`
- The capability to read a directory via its path and return back all of its children as file paths: `Node.FS.Aff`
- The capability to read a file's contents: `Node.FS.Aff`
- The capability to write content to a file: `Node.FS.Aff`
- The capability to log messages to the user to help them see what is going on:
    - `Console.log` - just print things to the screen
    - `Node.FS.Aff` - log things to a file via `appendFile`
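To connect the two lists above, here is a hedged sketch of a Node-backed implementation: a simple `AppM` newtype over `Aff` whose instances delegate to `Node.FS.Aff` and `Effect.Console`. The module layout and instance names are assumptions, and it reuses the capability classes sketched earlier:

```purescript
module ToC.InfrastructureSketch where

import Prelude
import Data.Maybe (Maybe(..))
import Effect.Aff (Aff)
import Effect.Class (liftEffect)
import Effect.Console as Console
import Node.Encoding (Encoding(..))
import Node.FS.Aff as FS
import Node.FS.Stats as Stats
-- Assumed to be the hypothetical capability classes sketched earlier.
import ToC.CapabilitySketch (class Logger, class ReadPath, class WriteToFile, PathType(..))

newtype AppM a = AppM (Aff a)

derive newtype instance functorAppM :: Functor AppM
derive newtype instance applyAppM :: Apply AppM
derive newtype instance applicativeAppM :: Applicative AppM
derive newtype instance bindAppM :: Bind AppM
derive newtype instance monadAppM :: Monad AppM

instance readPathAppM :: ReadPath AppM where
  readPathType path = AppM do
    stats <- FS.stat path
    if Stats.isDirectory stats
      then pure (Just Dir)
      else if Stats.isFile stats
        then pure (Just File)
        else pure Nothing
  readDir path = AppM (FS.readdir path)
  readFile path = AppM (FS.readTextFile UTF8 path)

instance writeToFileAppM :: WriteToFile AppM where
  writeToFile path content = AppM (FS.writeTextFile UTF8 path content)

instance loggerAppM :: Logger AppM where
  log msg = AppM (liftEffect (Console.log msg))
```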
Since PureScript can be compiled to a non-JavaScript backend, I'll also attempt to structure the code so that it is backend-independent. If we want to switch to something other than Node, we will only need to re-implement the `AppM` and `Infrastructure` levels:
- Production: we'll use `Aff` to run the Node-based program
- Test: (still a work in progress)
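For the production path, the entry point could be as small as launching the `Aff`-based program. The root directory and output file below are placeholders, and `ToC.WalkSketch` refers to the simplified walk sketched earlier:

```purescript
module Main where

import Prelude
import Effect (Effect)
import Effect.Aff (launchAff_)
-- Hypothetical: the simplified walk from the earlier sketch.
import ToC.WalkSketch (writeToc)

main :: Effect Unit
main =
  -- Placeholder root directory and output file name.
  launchAff_ (writeToc "." "./table-of-contents.md")
```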