
Extension Refactor and Redesign

mmcky edited this page Jan 30, 2020 · 2 revisions

January 2020

An update on the refactor and redesign, documenting some of the decisions made.

We have two streams of Translators:

  1. Code Block (Sparse Translators) -> we are providing a SphinxSparseTranslator
  2. Notebook (SphinxTranslators)

JupyterIPYNBTranslator will be set up as the base Translator, producing a notebook that contains markdown and code-block elements. It acts as the parent class for the PDF and HTML translators, which override the methods required to produce notebooks suitable for website and PDF construction.
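
The parent/child relationship might look roughly like the sketch below. It uses plain Python classes rather than docutils or Sphinx so that it runs on its own. JupyterHTMLTranslator and JupyterPDFTranslator are the names used in the November 2019 plan further down this page; every method name here (add_markdown, add_code, visit_image) is hypothetical and chosen only for illustration.

```python
class JupyterIPYNBTranslator:
    """Base translator: collects markdown and code cells (hypothetical API)."""

    def __init__(self):
        self.cells = []

    def add_markdown(self, text):
        self.cells.append({"cell_type": "markdown", "source": text})

    def add_code(self, source):
        self.cells.append({"cell_type": "code", "source": source})

    def visit_image(self, uri):
        # Default: plain markdown image syntax, readable in any notebook viewer.
        self.add_markdown(f"![]({uri})")


class JupyterHTMLTranslator(JupyterIPYNBTranslator):
    def visit_image(self, uri):
        # Website builds might prefer raw HTML so a theme can style the image.
        self.add_markdown(f'<img src="{uri}">')


class JupyterPDFTranslator(JupyterIPYNBTranslator):
    def visit_image(self, uri):
        # PDF builds go through LaTeX, so emit a LaTeX include instead.
        self.add_markdown(f"\\includegraphics{{{uri}}}")
```

Only the methods that differ per output are overridden; code-block handling stays in the parent class.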

Code Execution is taken care of by a new execute builder. We have implemented a codetree to provide caching for code execution prior to the translation stage. All translators are able to get output attached to the notebook objects from the execute builder.
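
As an illustration of caching execution ahead of translation, here is a minimal sketch of a cache keyed on a hash of the code-block source. It is not the codetree implementation; ExecutionCache, get_or_run and fake_run are hypothetical names, and the "execution" is a stand-in function rather than a Jupyter kernel.

```python
import hashlib


class ExecutionCache:
    """Maps a hash of a code block's source to its stored outputs."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(source):
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def get_or_run(self, source, run):
        # Only call `run` when this exact source has not been seen before.
        k = self.key(source)
        if k not in self._store:
            self._store[k] = run(source)
        return self._store[k]


calls = []


def fake_run(source):
    # Stand-in for real execution: records the call and returns a fake output.
    calls.append(source)
    return [{"output_type": "stream", "text": "ok"}]


cache = ExecutionCache()
first = cache.get_or_run("print('ok')", fake_run)
second = cache.get_or_run("print('ok')", fake_run)  # served from the cache
```

A real cache would probably also need to hash the preceding code blocks in a document, since the result of one block can depend on state set up by earlier ones.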

November 2019

This page documents some of the decisions and ideas for the upcoming refactor and redesign of the extension. The aim is to apply the lessons learnt from past decisions and refactor the extension into a more logical design that will be easier to maintain in the long term, leading to the release of version 1.0.

Current Docs: https://sphinxcontrib-jupyter.readthedocs.io/en/latest/

Builders

  1. jupyter -> Targets readable Jupyter notebooks (with options to support more advanced features such as tables with html)
  2. jupyterhtml -> Targets the construction of a website (html pages, download notebooks, coverage badges, html themes)
  3. jupyterpdf -> Targets the construction of PDF files via LaTeX

Each builder will have its own entry point and will target a different folder in _build. Each builder will also have its own translator to keep the pathways logically separate. This refactoring will greatly reduce the number of options required in conf.py, as each builder is a specialised compilation pipeline. The emphasis will be on reducing required configuration (for example, we can enforce a theme that requires html and pdf templates to be in a specific location).

Discussion/Questions:

  1. Should we have jupytercoverage or jupytertest for coverage and execution testing, to support jupyterhtml and/or as a standalone tool for reporting execution errors?
  2. Jupyter is used as an intermediate format only because it provides an execution layer. One idea would be to rework code-block execution at the sphinx transform layer to remove the need to use Jupyter in this way. However, one benefit of using it as an intermediate layer is that we could easily support Jupyter sources, which is a big plus.

Jupyter Notebook Construction (jupyter)

The primary use case the jupyter builder should support is constructing readable notebooks with an emphasis on markdown inclusion. These can then be used to generate notebook sets for tutorials, lectures and courses.

| Option | Description |
| --- | --- |
| jupyter_conversion_mode | all or code |
| jupyter_static_path | Specify static file path |
| jupyter_language | Specify default programming language (python3) |
| jupyter_language_synonyms | parsing blocks for python3 and ipython |
| jupyter_solution_notebooks | Build solution notebooks that include tagged solutions for code-blocks |

or

# conf.py: dictionary form of the options above
jupyter = {
    "static_path": "<path for static folder>",
    "conversion_mode": "all",
    "language": "python3",
    "language_synonyms": ["ipython"],
}
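
If the dictionary form were adopted, the extension would need to merge user values with defaults and reject misspelt keys. A minimal sketch of that step follows, assuming the four keys shown above. resolve_options and JUPYTER_DEFAULTS are hypothetical names, and the defaults for static_path and language_synonyms are assumptions.

```python
JUPYTER_DEFAULTS = {
    "static_path": [],          # assumed default: no extra static folders
    "conversion_mode": "all",
    "language": "python3",
    "language_synonyms": [],    # assumed default: no synonyms
}


def resolve_options(user_options, defaults=JUPYTER_DEFAULTS):
    """Return defaults overlaid with user options, rejecting unknown keys."""
    unknown = set(user_options) - set(defaults)
    if unknown:
        raise ValueError(f"unknown jupyter options: {sorted(unknown)}")
    return {**defaults, **user_options}


options = resolve_options({"language_synonyms": ["ipython"]})
```

One argument for the dictionary form is that a typo in a key can be reported at start-up, whereas a misspelt flat option such as jupyter_langauge would be easy to miss.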

Notes:

  1. jupyter_static_path won't be needed if we build a library of static assets (see discussion below)
  2. Remove jupyter_header_block?
  3. Remove jupyter_language and infer? Perhaps combine with jupyter_kernel = python3
  4. jupyter_course_solutions differs from the current implementation, which drops solutions: that approach requires two runs of sphinx to produce the desired collections.

Should general options, such as author, be specified at this level?

| Option | Description |
| --- | --- |
| jupyter_author | |

Consider A: Lecture/Course Support

Redesign course/lecture support to be more specialised and useful, perhaps as a separate builder, jupyterlectures. Options such as jupyter_drop_solutions would not be required: if a solutions class is found, the builder would generate two sets of notebooks: (1) a base set for the lecture and (2) a solutions set that includes the solution cells.
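
The two-set idea can be sketched on plain notebook dictionaries in the nbformat v4 layout, where cell tags live under metadata.tags. The function name split_solutions and the tag name "solution" are assumptions for illustration; how the solutions class in the RST source maps onto a cell tag is not decided on this page.

```python
import copy


def split_solutions(notebook, tag="solution"):
    """Return (lecture, solutions) copies of a notebook dict.

    The lecture copy omits cells carrying `tag`; the solutions copy keeps all.
    """
    solutions = copy.deepcopy(notebook)
    lecture = copy.deepcopy(notebook)
    lecture["cells"] = [
        cell for cell in lecture["cells"]
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]
    return lecture, solutions


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Exercise 1"},
        {"cell_type": "code", "metadata": {"tags": ["solution"]}, "source": "x = 1"},
    ]
}
lecture, solutions = split_solutions(notebook)
```

Both sets come out of a single pass over the same notebook, which is the point of dropping the two-run approach.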

References

  1. https://sphinxcontrib-jupyter.readthedocs.io/en/latest/config-extension-notebooks.html

Website Construction (jupyterhtml)

The primary use case the jupyterhtml builder should support is the generation of websites, using Jupyter as an intermediate format.

| Option | Description |
| --- | --- |
| jupyterhtml_template | Specify conversion template |
| jupyterhtml_downloadnb | Generate download notebooks |
| jupyterhtml_downloadnb_urlpath | Specify online server for internet based assets |

or

jupyterhtml = {
    "theme": "minimal",
    "download_notebooks": True,  # or False
    "download_notebooks_urlpath": "<html path to server assets for images>",
}

Notes:

  1. Items like jupyter_generate_html and jupyter_make_site will not be required when this is a specialised build pathway.
  2. Template paths are not required if we enforce a theme structure for jupinx projects.

Writers

Update writers to use the Translator base class from sphinx.util.docutils: https://github.com/sphinx-doc/sphinx/blob/8c7faed6fcbc6b7d40f497698cb80fc10aee1ab3/sphinx/util/docutils.py#L429

We will have:

  1. JupyterCodeTranslator -> code-only
    • JupyterTranslator -> support for full ipynb representation of rst
    • JupyterHTMLTranslator -> html
    • JupyterPDFTranslator -> pdf

References

  1. https://sphinxcontrib-jupyter.readthedocs.io/en/latest/config-extension-html.html

PDF Construction (jupyterpdf)

The primary use case the jupyterpdf builder should support is the generation of PDF files, using Jupyter as an intermediate format. This includes individual PDF files for each RST file as well as a book-style PDF of the whole project.

| Option | Description |
| --- | --- |
| jupyterpdf_template | Specify path to conversion template |
| jupyterpdf_bibfile | Specify path to bibfile location |
| jupyterpdf_author | |
| jupyterpdf_urlpath | Specify urlpath for external links for externally hosted content |

or

jupyterpdf = {
    "template": "<path>",
    "bibfile": "<path>",
    "urlpath": "<path>",
}

Notes:

  1. Should jupyterpdf_author be handled as jupyter_author?
  2. It is a bit strange to require usage of theme here as none of the theme is useful except for the pdf conversion template. Perhaps the pdf template should be divorced from the html theme? Or at the very least we should specify pdf template path rather than a theme.

References

  1. https://sphinxcontrib-jupyter.readthedocs.io/en/latest/config-extension-pdf.html

Design Ideas

Different Builders: (Accepted)

Refactoring into the different builders will reduce the current confusion around setting options and their effect. Another approach we could consider is a pipeline option that selects a collection of options. Related issue: https://github.com/QuantEcon/sphinxcontrib-jupyter/issues/199

We will minimise code duplication across the different translators through inheritance from a base class (as all require code-block handling):

Translator Classes

`JupyterCode` 
  -> `JupyterNotebook`, `JupyterHTML`, `JupyterPDF`

Notebook Executor: (Develop)

We should write notebook execution as a utility that all classes can use to manage notebook execution in a consistent way. The utility will rely on dask[distributed]. We can then add cached notebook execution based on content changes. It would be nice if it can support:

  1. parsing output blocks for error handling and testing at the cellblock level.
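
Cell-level error reporting could start from the outputs stored in an executed notebook. In the nbformat v4 layout, a failed code cell carries an output with output_type "error" plus ename and evalue fields. The sketch below works on plain dictionaries in that layout; find_cell_errors is a hypothetical helper name, and the sample notebook is made up for the example.

```python
def find_cell_errors(notebook):
    """Return (cell_index, ename, evalue) for every error output found."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


executed = {
    "cells": [
        {"cell_type": "markdown", "source": "Intro"},
        {"cell_type": "code", "source": "1 + 1", "outputs": []},
        {
            "cell_type": "code",
            "source": "1 / 0",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": [],
                }
            ],
        },
    ]
}
errors = find_cell_errors(executed)
```

A coverage or test report could then be built from these tuples, one entry per failing cell.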

Static Asset Management: (Develop)

Management of static assets needs to be researched to see how we can integrate more closely with Sphinx's internal management of ref and uri objects in the document tree. We should make use of Sphinx as much as possible for managing these link types, catering to:

  1. flat structures
  2. nested folder structures
  3. ability to have local static folders for lecture series use case
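
The three layouts above mostly differ in how a link from a document to a static asset is written. As a minimal sketch, assuming Sphinx-style docnames (POSIX paths relative to the source directory, without a file suffix), the relative link can be computed with the standard library. static_uri is a hypothetical helper and "_static" is an assumed folder name.

```python
import posixpath


def static_uri(docname, asset, static_dir="_static"):
    """Return the path to `asset` relative to the folder containing `docname`."""
    doc_dir = posixpath.dirname(docname) or "."
    target = posixpath.join(static_dir, asset)
    return posixpath.relpath(target, start=doc_dir)


flat = static_uri("intro", "fig.png")
nested = static_uri("course/week1/intro", "fig.png")
local = static_uri("course/week1/intro", "fig.png", static_dir="course/_static")
```

Here flat is "_static/fig.png", nested is "../../_static/fig.png", and local, which models a static folder belonging to one lecture series, is "../_static/fig.png".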

Pandoc: (Consider)

Investigate greater use of pandoc directly for conversions. Is it useful to consider converting RST to Markdown via pandoc, as it may make IPYNB -> HTML conversion easier?

Installable Themes: (Consider)

If themes and templates were installable this would save on user configuration requirements in theme and/or templates. [Low Priority]

Experimental Ideas for Improvements

Execution:

Investigate transforms as an opportunity to develop an execution engine for each code-block. It might be useful to build a DAG representation of the various target types (i.e. jupyter and coverage notebooks) to greatly reduce the amount of computation required across the various pipelines. If this were possible, we could support code-block execution and then target html, latex and jupyter natively through their respective writers, with additional code to handle interfaces to executed code-blocks.
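
To make the DAG idea concrete, here is a small sketch using graphlib from the standard library (Python 3.9+). The step names and their dependencies are hypothetical. The point is only that several targets share one execute step, so it appears once in the plan however many targets are requested.

```python
from graphlib import TopologicalSorter

# Each key depends on the steps in its value set (all names are hypothetical).
PIPELINE = {
    "parse": set(),
    "execute": {"parse"},
    "jupyter": {"execute"},
    "coverage": {"execute"},
    "html": {"execute"},
    "latex": {"execute"},
}


def build_plan(targets, graph=PIPELINE):
    """Return the steps needed for `targets`, ordered so dependencies run first."""
    needed = set()
    stack = list(targets)
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(graph[step])
    subgraph = {step: graph[step] for step in needed}
    return list(TopologicalSorter(subgraph).static_order())


plan = build_plan(["html", "coverage"])
```

The resulting plan contains parse and execute once each, followed by html and coverage in either order, and leaves out the jupyter and latex steps that were not requested.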

http://docutils.sourceforge.net/docs/ref/transforms.html

Review collaborative opportunities with: https://jupyter-sphinx.readthedocs.io/en/latest/, https://github.com/jupyter/jupyter-sphinx