diff --git a/docs/modules/ROOT/pages/design-notes.adoc b/docs/modules/ROOT/pages/design-notes.adoc
index b8f5e7677..b12c481e7 100644
--- a/docs/modules/ROOT/pages/design-notes.adoc
+++ b/docs/modules/ROOT/pages/design-notes.adoc
@@ -1,50 +1,90 @@
 = Design Notes
 
-== AST Traversal
-
-During the AST traversal stage, the complete AST (generated by the clang frontend)
-is walked beginning with the root `TranslationUnitDecl` node. It is during this
-stage that USRs (universal symbol references) are generated and hashed with SHA1
-to form the 160 bit `SymbolID` for an entity. With the exception of built-in types,
-*all* entities referenced in the corpus will be traversed and be assigned a `SymbolID`;
-including those from the standard library. This is necessary to generate the
-full interface for user-defined types.
-
-== Bitcode
-
-AST traversal is performed in parallel on a per-translation-unit basis.
-To maximize the size of the code base MrDocs is capable of processing, `Info`
-types generated during traversal are serialized to a compressed bitcode representation.
-Once AST traversal is complete for all translation units, the bitcode is deserialized
-back into `Info` types, and then merged to form the corpus. The merging step is necessar
- as there may be multiple identical definitions of the same entity (e.g. for class types,
- templates, inline functions, etc), as well as functions declared in one translation
- unit & defined in another.
-
-== The Corpus
-
-After AST traversal and `Info` merging, the result is stored as a map of `Info`s
-indexed by their respective `SymbolID`s. Documentation generators may traverse
-this structure by calling `Corpus::traverse` with a `Corpus::Visitor` derived
-visitor and the `SymbolID` of the entity to visit (e.g. the global namespace).
-
-== Namespaces
-
-Namespaces do not have a source location.
-This is because there can be many namespaces.
-We probably don't want to store any javadocs for namespaces either.
-
-== Paths
-
-The AST visitor and metadata all use forward slashes to represent file
-pathnames, even on Windows. This is so the generated reference documentation
-does not vary based on the platform.
-
-== Exceptions
-
-Errors thrown by the program should always have type `Exception`. Objects
-of this type are capable of transporting an `Error` object. This is important
-for the scripting to work; exceptions are used to propagate errors from
-library code to scripts and back to the invoking code. For exceptional cases,
-these thrown exceptions should be uncaught. The tool installs an uncaught exception
-handler that prints a stack trace and exits the process immediately.
+== Why automate documentation?
+
+{cpp} API design is challenging.
+The moment you write a function signature or declare a class is when you are likely to have the most information you will ever have about what it is supposed to do.
+The more time passes before you write the documentation, the less likely you are to remember all the details that motivated the API in the first place.
+
+Because engineers, the best and the worst alike, are naturally lazy, the *temporal adjacency* of the {cpp} declaration to its documentation improves outcomes.
+For this reason, it is ideal to place the documentation as close as possible to the declaration.
+That is, the *spatial adjacency* of the {cpp} declaration to its documentation improves outcomes because it facilitates temporal adjacency.
+
+In summary, the automated extraction of reference documentation from {cpp} declarations with attached documentation comments is ideal because:
+
+* Temporally adjacent documentation is more comprehensive
+* Spatially adjacent documentation requires less effort to maintain
+* Causally connected documentation is more accurate
+
+From the perspective of engineers, however, the biggest advantage of automated documentation is that it provides a single source of truth for the API at a low cost.
+Yet {cpp} codebases are notoriously difficult to document automatically because they contain constructs where the code must diverge from the intended API, which represents the contract with the user.
+
+Tools like Doxygen typically require workarounds and manual intervention via preprocessor macros to get the documentation right.
+These workarounds are problematic because they effectively create two versions of the code: a well-formed version that is compiled and an ill-formed version that is documented.
+This undermines the single source of truth for the code.
+
+|===
+| | MrDocs | Other Automatic Tools | Manual | No Reference
+
+| No workarounds | ✅ | ❌ | ✅ | ❌
+| Nice for users | ✅ | ✅ | ✅ | ❌
+| Single Source of Truth | ✅ | ❌ | ❌ | ❌
+| Less Work for Developers | ✅ | ❌ | ❌ | ✅
+| Always up-to-date | ✅ | ✅ | ❌ | ❌
+|===
+
+* Without a reference, users are forced to read header files to understand the API.
+This is a problem because header files are not always the clearest description of the API.
+Developers familiar with https://cppreference.com[cppreference.com,window=_blank] will know that the documentation there is often more informative than the standard library header files.
+* A common alternative is to write the API reference manually.
+Developers duplicate the signatures in the documentation, which requires extra work.
+This strategy tends to work well for small libraries and allows the developer to directly express the contract with the user.
+However, because the single source of truth is lost, this approach becomes unmanageable as the codebase grows.
+When a declaration changes, developers often forget to update the docs, and the reference becomes outdated.
+* Automating the documentation therefore looks like the best solution.
+The causal connection between the declaration and the generated documentation improves outcomes.
+This is where developers run into difficulties with existing tools like Doxygen, which require workarounds and manual intervention to get the documentation right.
+These workarounds once again split the code into two versions, so the single source of truth is lost.
+* The philosophy behind MrDocs is to make these workarounds unnecessary, so that developers can maintain a single source of truth with minimal effort.
+With MrDocs, the documented {cpp} code should be the same as the code that is compiled.
+
+== Customization
+
+Once the documentation is extracted from the code, it must be formatted in a way that is useful to the user.
+This usually involves generating output files whose format and content match the user's needs.
+
+A limitation of existing tools like Doxygen is that their output formats are not very customizable.
+Doxygen supports minor customization of the output style, but custom content formats require much more complex workflows, such as generating XML files and writing secondary applications to process them.
+
+MrDocs attempts to support multiple output formats and customization options.
+Concretely, MrDocs attempts to provide three levels of customization:
+
+* At the first level, the user can use an existing generator and customize its templates and helper functions.
+* At the second level, the user can write a MrDocs plugin that defines a new generator.
+* At the third level, the user can use a generator for a structured file format and consume its output in a simpler secondary application or script.
+
+With these levels, developers can focus on the documentation content rather than on the tooling.
+In practice, developers rarely need new generators and are usually only interested in changing how an existing generator formats the output.
+Not having to write and maintain a secondary application just to customize the output via XML files can save these developers a great deal of time.
+
+== {cpp} Constructs
+
+To deal with the complexity of {cpp} constructs, MrDocs uses Clang's LibTooling API.
+That means MrDocs sees {cpp} code the way the compiler does: if clang can compile it, MrDocs can inspect it.
+
+Here are a few constructs MrDocs attempts to understand and document automatically from the metadata:
+
+* Overload sets
+* Private APIs
+* Unspecified Return Types
+* Concepts
+* Typedef / Aliases
+* Constants
+* Automatic Related Types
+* Macros
+* SFINAE
+* Hidden Base Classes
+* Hidden EBO
+* Niebloids
+* Coroutines
+